Lead, Data Science Operations

Echo Global Logistics
Chicago, IL
Remote

About The Position

The Data Science Operations Lead is a senior individual-contributor role that bridges Data Science, Engineering, and IT Architecture. This position focuses on the operational aspects of the model lifecycle, including deployment, monitoring, scaling, and maintenance. The role is crucial for ensuring the reliability, observability, and governance of Echo's growing portfolio of production models, allowing Data Scientists to concentrate on developing new capabilities. The Lead will facilitate the transition of models from research and development to production services, act as the primary contact for production issues, and liaise with Architecture on deployment and reliability matters.

Requirements

  • Hands-on experience operating ML or software systems in production (MLOps, DevOps, SRE, platform, or data science background with demonstrated production ownership).
  • Strong working knowledge of CI/CD pipelines, deployment automation, and a major cloud platform (AWS, Azure, or GCP).
  • Demonstrated expertise in error handling, fault tolerance, and designing systems that fail gracefully (retries, fallbacks, alerting, monitoring/observability).
  • Proficiency in Python (R a plus), and a working understanding of how ML models are packaged, served, monitored, and retrained.
  • Comfort serving as first point of contact for production issues, including an on-call / off-hours expectation.
  • Ability to translate complex infrastructure into clear guidance for non-infrastructure specialists.

Nice To Haves

  • Experience standing up monitoring and observability for a portfolio of production models or services (e.g., drift detection, performance tracking, alerting).
  • Familiarity with containerization (Docker) and orchestration (Kubernetes).
  • Familiarity with infrastructure-as-code.
  • Familiarity with model-serving frameworks.
  • Familiarity with MLOps tooling such as MLflow, Airflow, or Kubeflow, or managed equivalents (e.g., SageMaker, Vertex AI).
  • Familiarity with data/model versioning.
  • Experience working across an engineering/architecture boundary as a liaison or embedded operations partner.
  • Pragmatic use of AI tooling to accelerate operations and code-quality work, paired with sound judgment about when human reasoning is required.

Responsibilities

  • Model deployment partnership: Serve as the primary liaison between Data Science and the Architecture/Platform Engineering team for model deployment, managing daily collaboration, hand-offs, and coordination to bridge the gap between trained models and production services (APIs, web tools).
  • Production reliability and incident response: Act as the first point of contact for production issues (outages, errors, degraded endpoints) across all deployed models, including on-call/off-hours availability, to shield the development team from production interruptions.
  • Resilient, error-aware systems: Implement rigorous error-handling and fault-tolerance practices that prevent errors where possible, ensure models and endpoints degrade or fail gracefully, and establish sensible fallbacks, retries, alerting, and recovery paths.
  • Monitoring and observability: Establish and maintain monitoring and observability capabilities for the production model portfolio, tracking model health, endpoint performance, latency, logging, and prediction quality as an enterprise function.
  • Deployment expertise and team enablement: Develop a deep understanding of the evolving deployment system, guide Data Scientists in moving from experiment to production safely and quickly, and drive templating, documentation, and automation to reduce the time Data Scientists spend on infrastructure.
  • Governance and quality: Own versioning, reproducibility, and operational governance for production models, collaborating with Architecture on standards and controls that keep those models trustworthy.

Benefits

  • Bonus eligibility based on personal and business performance.