Lead, AI Production Services

AECOM
Dallas, TX
Hybrid

About The Position

Work with Us. Change the World. At AECOM, we're delivering a better world. Whether improving your commute, keeping the lights on, providing access to clean water, or transforming skylines, our work helps people and communities thrive. We are the world's trusted infrastructure consulting firm, partnering with clients to solve the world’s most complex challenges and build legacies for future generations. There has never been a better time to be at AECOM. With accelerating infrastructure investment worldwide, our services are in great demand. We invite you to bring your bold ideas and big dreams and become part of a global team of over 50,000 planners, designers, engineers, scientists, digital innovators, program and construction managers and other professionals delivering projects that create a positive and tangible impact around the world. We're one global team driven by our common purpose to deliver a better world. Join us.

Own and build the enterprise AI Operations practice, ensuring all production AI, agentic, and automation solutions are reliable, observable, well-governed, and continuously improving. This is a hands-on leadership role responsible for building the AI Operations function from the ground up, including implementing frameworks, observability, and operational playbooks. This leader defines and operationalizes AI Ops standards, frameworks, and capabilities across the organization, establishing clear accountability, visibility, and control over AI systems at scale.

Serve as the functional leader for AI Operations, partnering closely with Product, Engineering, AI Transformation, and Delivery teams to ensure AI solutions are production-ready, resilient, and aligned to enterprise standards. Drive the implementation of governance, observability, and service management practices required to safely scale AI across the business.

This role establishes the operational foundation required to scale AI across the enterprise with confidence. By ensuring reliability, visibility, and control of AI and agentic systems in production, this leader enables widespread adoption while minimizing operational risk, controlling costs, and driving continuous improvement toward a more autonomous and efficient organization.

This position will offer flexibility for hybrid work schedules to include both in-office presence and telecommute/virtual work, to be based from either Houston or Dallas, TX.

Requirements

  • Bachelor's degree plus extensive experience in enterprise IT operations, service management, reliability engineering, or production support, including 6+ years of overall leadership experience that covers leading operations for AI/ML/agentic production systems in large-scale environments, or demonstrated equivalent experience and education.
  • Proven experience defining and governing operations frameworks, standards, and operating models across teams.
  • Deep knowledge of AI/agentic production challenges (LLM observability, agent behavior governance, RAG/prompt drift, orchestration risks, cost management).
  • Expertise in ITIL practices, observability (e.g., Prometheus, Grafana, OpenTelemetry), incident/change management, SLAs, and supplier governance.
  • Strong background in risk management, FinOps/Cloud cost optimization, and executive-level reporting.

Nice To Haves

  • Hands-on experience with AI platforms like Azure AI Foundry, AWS Bedrock, LangChain, or UiPath in production.
  • Knowledge of Responsible AI operations, agentic risk governance, and emerging AI standards.
  • Background in site reliability engineering (SRE), DevOps, or enterprise architecture.
  • Experience with hybrid/multi-cloud environments and supplier management.
  • Advanced degree in Computer Science, Engineering, or related field.

Responsibilities

  • Own the Enterprise AI Operations Practice End-to-End
      • Hold full accountability for the AI Operations strategy, operating model, standards, processes, tools, and governance frameworks: defining, implementing, and evolving them to ensure consistency, reliability, and scalability across all production AI, agentic, and automation solutions.
  • Drive Production Reliability, Support, and Governance
      • Establish and operationalize ITIL-aligned practices for incident, problem, and change management tailored to AI and agentic systems.
      • Define enterprise SLAs, escalation paths, and support models (internal, supplier, hybrid), ensuring strong governance of agent behavior, access, and production risk across large-scale AI deployments.
  • Lead Observability, Optimization, and Continuous Improvement
      • Define and implement enterprise observability and monitoring standards for AI behavior, performance, cost, and risk.
      • Establish proactive detection, alerting, and operational telemetry across AI systems.
      • Identify cross-product trends and systemic issues, driving improvements in reliability, performance, and cost efficiency.
  • Establish Enterprise Reporting and Operational Reviews
      • Define standardized AI Operations metrics, scorecards, and dashboards.
      • Publish consistent reporting on AI solution health, risks, and performance.
      • Lead operational review forums, supplier accountability discussions, and action tracking to ensure continuous improvement.
  • Partner with Product, Delivery, and Technical Teams
      • Serve as the primary operations counterpart to AI Product Management, Engineering, AI Platforms, Architecture, and DevOps, ensuring alignment on production readiness, observability integration, risk management, and operational excellence.
  • Mature AI Operations as Capabilities Evolve
      • Continuously evolve the AI Operations practice in step with advancements in AI, agentic systems, orchestration, and automation technologies.
      • Embed Responsible AI principles, governance, and cost optimization practices into operations.
  • Build AI Operations Capability Across the Organization
      • Develop and implement playbooks, standards, training, and communities of practice to elevate AI operations maturity.
      • Mentor teams and review production readiness and operational strategies to ensure alignment and scalability.

Benefits

  • AECOM benefits may include medical, dental, vision, life, AD&D, disability benefits, paid time off, leaves of absence, voluntary benefits, perks, flexible work options, well-being resources, employee assistance program, business travel insurance, service recognition awards, retirement savings plan, and employee stock purchase plan.