Senior Software Engineer

Vizient • Irving, TX

About The Position

In this role, you will design and build scalable backend systems, cloud infrastructure, and platform capabilities that power AI/ML products and applications. You will partner closely with AI/ML engineers, data scientists, and cross-functional teams to productionize LLM applications, RAG pipelines, and agentic AI workflows. You will use modern cloud technologies, infrastructure-as-code practices, and AI-assisted development tools to deliver reliable, secure, and maintainable solutions that accelerate innovation across the AI/ML team. You will also help establish engineering standards, reusable platform patterns, and operational best practices.

Requirements

5 or more years of relevant experience required.
Strong Python development experience, with expertise in building production backend services, APIs, distributed systems, and cloud-based applications required.
Hands-on experience with Pulumi, Terraform, or other infrastructure-as-code tools along with cloud platforms such as AWS, Azure, or GCP required.
Knowledge of observability tools, monitoring, logging, tracing, alerting frameworks, and production support practices.
Familiarity with ML/AI systems, model serving, LLM applications, inference pipelines, RAG workflows, data pipelines, or related AI platform technologies.
Strong analytical, troubleshooting, problem-solving, verbal communication, and written communication skills with the ability to collaborate across technical and business teams.
Ability to operate effectively in fast-paced, evolving environments with a high level of ownership, accountability, and adaptability.

Nice To Haves

Experience with Docker, CI/CD pipelines, infrastructure automation, container orchestration, Kubernetes, serverless architectures, and cloud deployment practices preferred.
Experience with Databricks, Azure AI Foundry, or similar AI/ML platform technologies preferred.

Responsibilities

Design, develop, and maintain backend services, APIs, and platform components that support AI/ML applications and distributed systems.
Build scalable cloud infrastructure using Pulumi and modern infrastructure-as-code practices while developing CI/CD pipelines, deployment workflows, and containerized cloud environments.
Support production deployment of ML models, LLM applications, RAG pipelines, and agentic AI systems while collaborating with AI/ML engineers to productionize model serving, inference pipelines, and data workflows.
Enhance observability through monitoring, logging, tracing, alerting, and incident management practices to improve operational reliability and system performance.
Implement engineering standards for testing, code quality, security, maintainability, scalability, latency optimization, and cost efficiency.
Define reusable platform patterns, developer tooling, and engineering workflows that improve developer productivity and operational consistency across the AI/ML team.
Evaluate emerging AI engineering trends, AI-assisted development tools, and modern software practices to drive continuous improvement and innovation.
Partner with product, security, data, and platform teams to deliver production-ready AI solutions while contributing to architectural discussions and long-term platform strategy.
Troubleshoot complex production issues, perform root cause analysis, and drive remediation efforts to improve system stability and reliability.
Mentor engineers through technical collaboration, code reviews, and knowledge sharing, and promote engineering best practices.