This posting is for proactive recruitment purposes and may be used to fill current openings or future vacancies within our organization.

Staff Software Engineer – AI (DevOps)

Are you ready to partner closely with product, architecture, and engineering teams to define needs and technical strategy, lead research & development within the project life cycle, provide technical analysis and design, and support operations staff in executing, testing, and rolling out solutions? You will combine Staff‑level software engineering with Lead‑level DevOps/Platform expertise: constantly looking to optimize systems and services for security, automation, reliability, and performance/availability, while ensuring solutions adhere to architecture standards and organizational values. You will also help development teams use AI safely and effectively in their SDLC (e.g., GitHub Actions and other MCPS tooling), and drive best practices in AI/ML Ops.

About the Role

In this role as a Staff Software Engineer – AI (DevOps), you will:

Architect and implement AI‑driven solutions using agentic AI patterns, including MCP server architectures, orchestration workflows, and agentic pipelines.
Design and operate scalable, secure, and cost‑efficient AI platforms on cloud infrastructure (Azure and/or AWS) with Kubernetes as the primary runtime.
Integrate LLMs, vector search, and retrieval‑augmented generation (RAG) patterns using services such as Azure AI Foundry and Azure AI Search.
Define and implement AI/ML Ops practices for model and pipeline lifecycle, including versioning, monitoring, evaluation, and governance.
Plan, deploy, and maintain critical business applications and AI services in production and non‑production cloud environments (Azure / AWS).
Design and implement appropriate environments for those applications and services; engineer robust release management procedures and provide production support.
Build and maintain CI/CD pipelines using MCPS tooling (e.g., Azure DevOps, Jenkins, GitHub Actions), including automation for building, testing, scanning, and deploying AI and non-AI workloads.
Design and maintain infrastructure‑as‑code (e.g., Terraform, Bicep, Ansible) for cloud, Kubernetes, networking, and platform services.
Develop and maintain agentic workflows that orchestrate tools, services, and data sources to support complex business processes.
Use AI tools within the development lifecycle (e.g., AI‑assisted code generation, GitHub Actions AI features, AI‑driven test generation and triage) to increase velocity while maintaining quality and compliance.
Collaborate with product and engineering teams to identify opportunities for AI automation in build, test, deployment, and operations workflows.
Drive improvements to processes and design enhancements to automation to continuously improve production environments (reliability, observability, performance, cost).
Perform daily system monitoring, verifying integrity and availability of services, reviewing system and application logs, and verifying completion of scheduled and automated tasks.
Perform ongoing performance tuning, infrastructure upgrades, and resource optimization as required.
Provide Tier II/Tier III support for incidents and requests from various constituencies; lead technical recovery for high‑severity incidents impacting AI platforms and services.
Establish and maintain monitoring, alerting, SLOs, and dashboards for AI services; contribute to disaster recovery planning and testing to ensure business continuity.
Partner with security and compliance teams to ensure AI platforms and pipelines meet TR security, privacy, and governance standards, including access controls, data protection, and auditability.
Provide leadership, technical support, user support, technical orientation, and technical education activities to project teams and staff across multiple locations.
Influence broader technology groups in adopting cloud, Kubernetes, and AI technologies, processes, and best practices.
Mentor and coach engineers (Dev, QA, DevOps, Data/ML) in modern DevOps, AI/ML Ops, and platform practices.
Maintain and contribute to our knowledge base and documentation, including runbooks, design docs, and standards.
Participate in and often lead technical design reviews, architecture decisions, and cross‑team initiatives.

Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
5,001-10,000 employees