ML/AI Ops Engineer

Xcel Energy, Minneapolis, MN

About The Position

Are you looking for an exciting job where you can put your skills and talents to work at a company you can feel proud to be a part of? Do you want a workplace that will challenge you and offer you opportunities to learn and grow? A position at Xcel Energy could be just what you’re looking for.

ML/AI Ops Engineer Position Summary

The ML/AI Ops Engineer is responsible for operationalizing, deploying, monitoring, and sustaining machine learning and AI solutions across their full lifecycle. This role bridges data science, software engineering, cloud infrastructure, and governance to ensure models and AI systems are reliable, scalable, cost‑effective, and compliant, particularly in regulated or high‑risk environments.

The ML/AI Ops Engineer helps design and operates the processes and automation required for versioning models and data, managing CI/CD for ML, monitoring model drift and bias, enforcing governance controls, and ensuring integrations with enterprise platforms.

Requirements

  • Ten years of related functional experience
  • Bachelor's degree in Technology, Science, Business or related field, or 4 years of equivalent experience.
  • Excellent communication skills, effective across varying organizational levels and skill sets, and able to translate between technical and non-technical concepts.
  • Excellent Relationship Management and collaboration skills, with a track record of working as one team cross-organizationally to drive innovation and business results
  • Expertise managing the lifecycle of technical solutions
  • Deep subject matter expertise in the respective system domain's products, platforms, processes and architecture.
  • Broad and deep knowledge of technology architecture, infrastructure, network, security and software principles and models
  • Experience working in partnership with internal and external vendors.
  • Excellent analytical, problem-solving and troubleshooting skills
  • Extensive knowledge of future technology trends within area of expertise.
  • Demonstrated leadership on technical aspects of large-scale projects.
  • Experience coaching other developers in system deployment or operational troubleshooting.
  • Experience with delivery methodologies (Waterfall, Agile, Scrum) and operational models (ITIL)
  • Experience and understanding of core IT Service Management functions, such as Change Management and Incident Management

Nice To Haves

  • Familiarity with the Databricks platform.
  • Master's Degree

Responsibilities

  • Solution Delivery: Lead and support solution lifecycle technical activities. Ensure solutions are designed for great user experience and operational performance. Lead design, ensuring Enterprise Architecture, Security, Operations and Compliance aspects are continuously integrated into solutions. Provide input to cost and schedule estimation. Responsible for overall integrity of system design and operation. Oversee vendor activities.
  • Relationship Management: Conduct peer reviews and approve system changes and technical solution design. Coach and mentor less experienced team members. Partner cross-organizationally to deliver optimal solutions at minimal cost. Provide in-depth technical information to stakeholders as needed.
  • Strategy & Planning: Innovate by applying emerging industry capabilities in response to evolving customer needs. Provide input to strategic roadmap and technical dependencies.
  • Subject Matter Expertise: Continuously stay current on, and apply, technical industry knowledge pertaining to the respective domain.
  • Operations: Review solution performance and continually assess health of systems. Track and raise awareness of operational and technical debt risks. Provide escalated support to incident and problem management. Utilize analytics to improve availability, reliability, efficiency and capacity. Oversee vendor activities.
  • Model Deployment & Lifecycle Management: Productionize machine learning and AI models, including classical ML and GenAI, using standardized MLOps pipelines. Manage end‑to‑end model lifecycle activities: versioning, promotion, rollback, retraining, and retirement. Implement CI/CD practices for models, features, and inference services.
  • MLOps Platform & Pipeline Engineering: Design, build, and maintain reusable MLOps pipelines for training, validation, deployment, and monitoring. Develop common components (feature pipelines, quality checks, evaluation harnesses) to reduce friction across AI projects.
  • Monitoring, Observability & Reliability: Implement monitoring for model performance, data drift, bias, and system health. Own AI/ML operational SLAs, SLOs, and incident response, including root‑cause analysis and post‑mortems. Ensure high availability, resilience, and recoverability of AI services.
  • Governance, Risk & Compliance Support: Support regulated or high‑risk AI use cases by embedding governance, validation, and documentation into MLOps workflows. Produce and maintain required artifacts such as model cards, system cards, validation evidence, and audit support materials. Partner closely with AI Governance and Risk teams to ensure alignment with enterprise standards.

Benefits

  • Annual Incentive Program
  • Medical/Pharmacy Plan
  • Dental
  • Vision
  • Life Insurance
  • Dependent Care Reimbursement Account
  • Health Care Reimbursement Account
  • Health Savings Account (HSA) (if enrolled in eligible health plan)
  • Limited-Purpose FSA (if enrolled in eligible health plan and HSA)
  • Transportation Reimbursement Account
  • Short-term disability (STD)
  • Long-term disability (LTD)
  • Employee Assistance Program (EAP)
  • Fitness Center Reimbursement (if enrolled in eligible health plan)
  • Tuition reimbursement
  • Transit programs
  • Employee recognition program
  • Pension
  • 401(k) plan
  • Paid time off (PTO)
  • Holidays
  • Volunteer Paid Time Off (VPTO)
  • Parental Leave Benefit