Senior Machine Learning Operations Engineer

BetMGM
Hybrid, NJ
$135,000 - $170,000

About The Position

The Senior MLOps Engineer treats ML systems as software systems and owns the path from a trained model to a production endpoint that meets its latency, cost, and reliability budgets, across both batch scoring (SageMaker Batch Transform, Snowflake Cortex / Snowpark ML, dbt-orchestrated scoring) and real-time inference (SageMaker real-time endpoints, Lambda + Bedrock, sub-second feature serving). The Senior MLOps Engineer builds the platform that data scientists and ML engineers ship on: a feature store with guaranteed online/offline parity, a model registry, CI/CD for ML, drift and quality monitoring, and champion/challenger and shadow deployment scaffolding. This requires a software-engineering-first mindset: distributed systems, observability, and on-call instincts are the foundation, and ML literacy is what makes that foundation effective in this role. GenAI integration experience is a plus, not a requirement.

Requirements

  • BS or MS in Computer Science, Math, Statistics, Machine Learning, or other STEM field — or equivalent practical experience.
  • 5+ years shipping software in production — Python, Docker, Kubernetes or ECS, CI/CD, distributed systems debugging — including time on-call.
  • 3+ years operating ML in production — you have owned a model in prod that served real traffic, with stated latency and cost budgets and a runbook you wrote.
  • AWS depth across the SageMaker surface (Training, Endpoints, Batch Transform, Model Registry, Pipelines) plus the supporting cast (IAM, Lambda, ECS, S3, Secrets Manager, VPC).
  • Snowflake fluency — Snowpark ML, Cortex, dbt-orchestrated batch scoring, RBAC for ML workloads.
  • IaC for ML: Terraform + SageMaker Pipelines or equivalent. No manual console deployments to production.
  • Feature store experience — SageMaker Feature Store, Tecton, or Feast — with explicit ownership of online/offline parity.
  • Champion/challenger, shadow, and canary deployment patterns as production muscle, not blog-post familiarity.
  • Drift and model monitoring — Evidently, Arize, WhyLabs, or SageMaker Model Monitor — wired to a paging path.
  • Software-engineering-first mindset — you treat ML systems as systems, not notebooks.

Nice To Haves

  • GenAI in production: Bedrock, Anthropic, or OpenAI APIs integrated into live systems; RAG pipelines; vector DBs (Snowflake Cortex Search, pgvector, Pinecone); evaluation frameworks (Langfuse or in-house).
  • Snowflake-native ML — Snowpark Container Services, Cortex AISQL, Cortex Agents — for workloads that do not need to leave the warehouse.
  • Streaming feature engineering — Kafka, Flink, or Snowpipe Streaming — for sub-second features.
  • Fine-tuning experience (LoRA, QLoRA, instruction tuning, eval-driven iteration), with an honest read on when fine-tuning beats prompting.
  • A track record of shipping more with AI in the engineering loop than without.
  • Regulated-industry experience (gaming, fintech, healthcare) — comfort with model risk, audit, and lineage requirements.

Responsibilities

  • Stand up and operate BetMGM's ML platform on AWS (SageMaker Training, Model Registry, Pipelines, Endpoints, Batch Transform) and Snowflake (Snowpark ML, Cortex), with Terraform-managed infrastructure.
  • Build self-service scaffolds that let data scientists ship a model end-to-end without a ticket queue — cookie-cutter project templates with CI, drift monitoring, alerting, IaC, and Snowflake connectivity pre-baked.
  • Design and operate batch scoring pipelines — SageMaker Batch Transform, dbt-orchestrated scoring against Snowflake, Snowpark ML — with explicit freshness and cost SLAs.
  • Design and operate real-time inference paths — SageMaker real-time endpoints, Lambda + Bedrock for GenAI, API Gateway — with stated latency budgets (typically sub-100ms) and graceful degradation under load.
  • Own the feature store (SageMaker Feature Store, Tecton, or Feast) with guaranteed online/offline parity — training-serving skew is treated as an incident, not a tradeoff.
  • Build CI/CD for ML — model registry, automated retraining triggers, model versioning, lineage from feature → training run → deployed model → live prediction.
  • Implement champion/challenger, shadow deployments, and canary releases as platform primitives so individual model teams do not reinvent them per project.
  • Stand up drift detection, data quality, and model performance monitoring (Evidently, Arize, or SageMaker Model Monitor — pick one and standardize) with paging that routes to humans who can fix it.
  • Own MLOps incident response — production model failures are SEV events with postmortems.
  • Manage inference cost: right-size endpoints and tune batch caching, request batching, and autoscaling. State cost-per-prediction targets up front and meet them.
  • Integrate LLM APIs (Bedrock, Anthropic, OpenAI) into production paths — RAG pipelines, agent eval frameworks, prompt versioning, cost and latency observability.
  • Partner with the Helix team on AI personalization workloads as they ramp toward March Madness 2027.
  • Direct AI coding agents (Claude Code, Cursor, GitHub Copilot, dbt Copilot) as a force multiplier across infrastructure code, eval suites, and model-serving glue — designing work for agents to do, not just accepting their suggestions.
  • Partner with the data engineering team on shared standards (Terraform modules, CI/CD patterns, observability, lineage).
  • Work alongside data scientists and analytics partners to land the right interfaces between research and production — opinionated about the boundary.
  • Coordinate with Entain India and contractor ML partners as workloads consolidate onto the BetMGM-owned platform.

Benefits

  • Medical, Dental, Vision, Life, and Disability Insurance
  • 401(k) with company match
  • Pre-tax spending accounts including health care FSA and commuter savings
  • Flexible paid time off
  • Professional development reimbursement and ongoing skills training opportunities
  • Employee resource groups
  • Swag, ticket giveaways, and more!