Staff Software Engineer – AI (DevOps)

Thomson Reuters, Eagan, MN
Hybrid

About The Position

This posting is for proactive recruitment purposes and may be used to fill current openings or future vacancies within our organization.

Are you ready to partner closely with product, architecture, and engineering teams to define needs and technical strategy, lead research and development within the project life cycle, provide technical analysis and design, and support operations staff in executing, testing, and rolling out solutions? You will combine Staff‑level software engineering with Lead‑level DevOps/Platform expertise, constantly looking to optimize systems and services for security, automation, reliability, and performance/availability while ensuring that solutions adhere to architecture standards and organizational values. You will also help development teams use AI safely and effectively across their SDLC (e.g., GitHub Actions and other MCPS tooling) and drive best practices in AI/ML Ops.

Requirements

  • 8+ years of overall software engineering / DevOps / platform engineering experience, including 3+ years in a Lead‑level DevOps / Platform / SRE capacity, and 2+ years supporting AI‑driven solutions at enterprise scale.
  • Strong experience designing and operating solutions on cloud platforms (Azure and/or AWS), including core services such as compute, storage, networking, identity, and managed databases (e.g., RDS or Azure SQL), as well as services such as S3, CloudFront, and CloudFormation, or their Azure equivalents where applicable.
  • Hands‑on expertise with Kubernetes and containerization (Docker), including building and deploying containerized workloads at scale; experience with managed Kubernetes (e.g., AWS EKS and/or Azure AKS).
  • Deep knowledge and hands‑on experience with CI/CD and MCPS tools, including at least two of Azure DevOps (ADO), Jenkins, and GitHub Actions, with a track record of planning, building, and deploying cloud‑based solutions.
  • Experience implementing and supporting MCP server architectures, orchestration workflows, and agentic pipelines in production environments.
  • Demonstrated experience with AI/ML Ops concepts and tooling (e.g., model/pipeline versioning, evaluation, monitoring, rollout/rollback strategies).
  • Strong scripting and programming skills, preferably in Python, Bash, and/or PowerShell; ability to build automation, tools, and integrations.
  • Practical experience with Infrastructure as Code (e.g., Terraform, Bicep, Ansible) for provisioning and managing cloud and Kubernetes resources.
  • Experience with Azure AI Foundry and Azure AI Search, or similar AI platform and vector search technologies.
  • Solid understanding of Git, branching strategies, and GitOps principles and tools.
  • Proven experience owning and operating continuous delivery / continuous deployment pipelines and production services, including monitoring, alerting, and incident response.
  • Strong communication and collaboration skills, with experience influencing across teams and mentoring other engineers.

Nice To Haves

  • Experience building and deploying .NET Core and/or Java‑based solutions in cloud and Kubernetes environments.
  • Strong understanding of API‑first design and implementation, including secure, scalable APIs that integrate AI capabilities.
  • Experience implementing comprehensive testing strategies (unit, integration, performance, chaos, and/or evaluation loops for AI systems) in a continuous deployment environment.
  • Prior experience setting up monitoring tools (e.g., Prometheus, Grafana, CloudWatch, Azure Monitor, OpenTelemetry) and disaster recovery plans to ensure business continuity.
  • Exposure to data and ML tooling (e.g., feature stores, experiment tracking, model registries) and how they integrate with CI/CD and production platforms.
  • Experience working in regulated or compliance‑sensitive environments (e.g., legal, tax, financial services) with attention to data protection and governance.

Responsibilities

  • Architect and implement AI‑driven solutions using agentic AI patterns, including MCP server architectures, orchestration workflows, and agentic pipelines.
  • Design and operate scalable, secure, and cost‑efficient AI platforms on cloud infrastructure (Azure and/or AWS) with Kubernetes as the primary runtime.
  • Integrate LLMs, vector search, and retrieval‑augmented generation (RAG) patterns using services such as Azure AI Foundry and Azure AI Search.
  • Define and implement AI/ML Ops practices for model and pipeline lifecycle, including versioning, monitoring, evaluation, and governance.
  • Plan, deploy, and maintain critical business applications and AI services in production and non‑production cloud environments (Azure / AWS).
  • Design and implement appropriate environments for those applications and services; engineer robust release management procedures and provide production support.
  • Build and maintain CI/CD pipelines using MCPS tooling (e.g., Azure DevOps, Jenkins, GitHub Actions), including automation for building, testing, scanning, and deploying AI and non‑AI workloads.
  • Design and maintain infrastructure‑as‑code (e.g., Terraform, Bicep, Ansible) for cloud, Kubernetes, networking, and platform services.
  • Develop and maintain agentic workflows that orchestrate tools, services, and data sources to support complex business processes.
  • Use AI tools within the development lifecycle (e.g., AI‑assisted code generation, GitHub Actions AI features, AI‑driven test generation and triage) to increase velocity while maintaining quality and compliance.
  • Collaborate with product and engineering teams to identify opportunities for AI automation in build, test, deployment, and operations workflows.
  • Drive improvements to processes and design enhancements to automation to continuously improve production environments (reliability, observability, performance, cost).
  • Perform daily system monitoring, verifying integrity and availability of services, reviewing system and application logs, and verifying completion of scheduled and automated tasks.
  • Perform ongoing performance tuning, infrastructure upgrades, and resource optimization as required.
  • Provide Tier II/Tier III support for incidents and requests from various constituencies; lead technical recovery for high‑severity incidents impacting AI platforms and services.
  • Establish and maintain monitoring, alerting, SLOs, and dashboards for AI services; contribute to disaster recovery planning and testing to ensure business continuity.
  • Partner with security and compliance teams to ensure AI platforms and pipelines meet Thomson Reuters security, privacy, and governance standards, including access controls, data protection, and auditability.
  • Provide leadership, technical support, user support, technical orientation, and technical education activities to project teams and staff across multiple locations.
  • Influence broader technology groups in adopting cloud, Kubernetes, and AI technologies, processes, and best practices.
  • Mentor and coach engineers (Dev, QA, DevOps, Data/ML) in modern DevOps, AI/ML Ops, and platform practices.
  • Maintain and contribute to our knowledge base and documentation, including runbooks, design docs, and standards.
  • Participate in and often lead technical design reviews, architecture decisions, and cross‑team initiatives.

Benefits

  • Hybrid Work Model: We’ve adopted a flexible hybrid working environment (2-3 days a week in the office depending on the role) for our office-based roles while delivering a seamless experience that is digitally and physically connected.
  • Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work-life balance.
  • Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow’s challenges and deliver real-world solutions. Our Grow My Way programming and skills-first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI-enabled future.
  • Industry Competitive Benefits: We offer comprehensive benefit plans that include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing.
  • Culture: Globally recognized, award-winning reputation for inclusion and belonging, flexibility, work-life balance, and more. We live by our values: Obsess over our Customers, Compete to Win, Challenge (Y)our Thinking, Act Fast / Learn Fast, and Stronger Together.
  • Social Impact: Make an impact in your community with our Social Impact Institute. We offer employees two paid volunteer days off annually and opportunities to get involved with pro-bono consulting projects and Environmental, Social, and Governance (ESG) initiatives.
  • Making a Real-World Impact: We are one of the few companies globally that helps its customers pursue justice, truth, and transparency. Together with the professionals and institutions we serve, we help uphold the rule of law, turn the wheels of commerce, catch bad actors, report the facts, and provide trusted, unbiased information to people all over the world.


What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees
