Engineer, AI/ML Operations

Comcast
Philadelphia, PA
Hybrid

About The Position

The Xfinity Data Platform (XDP) is an intelligence warehousing platform that provides centralized information about, and control over, subscribers’ local area networks. It acquires, stores, and aggregates data about end-user devices connected to the Comcast infrastructure. In addition, modular intelligence logic provides granular visibility into and control over subscribers’ networks. This enables device-centric, value-added capabilities such as creating and managing content access control policies and performance policies, and issuing presence notifications to the Comcast data ecosystem.

The engineer in this role is a foundational member of the platform DevOps and AIOps team. The team maintains and evolves the XDP platform, supports new infrastructure and operational requirements for Archetype, MLOps, and AIOps initiatives, and keeps the platform reliable, secure, and scalable during rapid expansion.

Requirements

  • 5–7 Years of Relevant Work Experience
  • AIOps
  • Automation
  • AWS Elastic Kubernetes Service (EKS)
  • Data Optimization
  • Machine Learning Operations
  • Platform Operations
  • Troubleshooting
  • Bachelor's Degree (or equivalent combination of coursework and experience, or extensive related professional experience)

Responsibilities

  • Core DevOps & AIOps Engineer for XDP and Next‑Gen Platform Needs
      • Maintaining and evolving the XDP platform, which underpins key services across Connected Living.
      • Supporting new infrastructure and operational requirements for Archetype, MLOps, and AIOps initiatives.
      • Ensuring the platform remains reliable, secure, and scalable during a period of rapid expansion.
  • AWS EKS Expertise Supporting 90% of Our Services
      • Providing deep specialization in Kubernetes operations and performance tuning; cluster stability and security; and autoscaling, networking, and observability for mission‑critical components.
  • Specialized AIOps & MLOps Engineering
      • Managing the full ML lifecycle, including deployment pipelines, monitoring, retraining, and model governance.
      • Operationalizing AIOps patterns that reduce toil and raise automation maturity.
  • Ownership of Complex, High‑Scale Infrastructure
      • Operating and troubleshooting critical distributed systems, including EKS, Spark, Kafka, Redis, and ZooKeeper.
      • Managing multi‑layered data and compute pipelines that support real‑time and batch workloads.
      • Ensuring high‑availability and disaster‑recovery architecture.
  • Automation at Scale (Python/Golang + AI Tooling)
      • Building high‑efficiency automation frameworks.
      • Reducing manual operational work using Python or Golang.
      • Accelerating engineering velocity with AI‑assisted tooling such as GitHub Copilot.
  • Multi‑Region Operational Management (US, EU, APAC)
      • Managing Dev, Staging, and Production workloads across multiple regions globally.
      • Ensuring environment parity and compliance while managing regional configuration differences.
      • Supporting 24/7 global infrastructure.
  • Cost Optimization & Infrastructure Efficiency
      • Reducing compute and storage costs.
      • Eliminating waste in Kubernetes clusters.
      • Optimizing data pipelines and observability tooling.

Benefits

  • Benefits that connect you to the support you need when it matters most and help you care for those who matter most.
  • An array of options, expert guidance, and always-on tools personalized to your needs, to help support you physically, financially, and emotionally through life’s big milestones and in your everyday life.