About The Position

We’re looking for a Machine Learning Engineer to design, deploy, and operate production ML systems on Amazon Web Services. You’ll own the full lifecycle in a real-world, high-stakes environment — from training and packaging through deployment, monitoring, retraining, security, and cost control. This role sits at the intersection of ML engineering and MLOps and is core to CCT’s analytics strategy. You’ll partner closely with data scientists, engineers, and product stakeholders to turn complex time-series and transactional data into reliable, observable, and cost-effective ML services that our customers can trust. You’ll thrive here if you naturally dig into why models behave the way they do, enjoy tracing issues to their root cause, and like collaborating across disciplines to ship robust systems that are built to last.

Requirements

  • 3+ years of experience in machine learning engineering, MLOps, or a closely related discipline.
  • Hands-on experience with AWS ML and data services: SageMaker (training, endpoints, pipelines), S3, Lambda, Step Functions, CloudWatch, and MWAA (Amazon Managed Workflows for Apache Airflow).
  • Experience working with time-series data, including feature engineering, seasonality handling, and temporal train/test splits.
  • Strong Python skills and familiarity with common ML frameworks (scikit-learn, PyTorch, XGBoost, or equivalent).
  • Experience building and maintaining CI/CD pipelines for ML systems.
  • Demonstrated ability to monitor and debug production ML systems — latency, drift, errors, and data quality — and drive issues to root cause.
  • Comfort with SQL and working with structured data at scale.
  • Ability to work collaboratively across teams, assume positive intent, and communicate clearly with both technical and non-technical stakeholders.
  • Track record of self-directed learning and technical growth in areas like AWS, ML frameworks, or deployment patterns.

Nice To Haves

  • Experience in a regulated industry (gaming, finance, healthcare) where auditability, explainability, and compliance are first-class concerns.
  • Familiarity with feature stores, model registries, or ML metadata tools (e.g., MLflow, SageMaker Model Registry).
  • Experience with infrastructure-as-code (Terraform, CDK, or CloudFormation).
  • Exposure to data drift detection libraries or custom drift monitoring implementations.

Responsibilities

  • Build and maintain reproducible model training workflows on AWS (SageMaker, S3, Glue, etc.), making retraining, rollback, and experimentation routine rather than heroic.
  • Deploy and operate real-time and batch inference services with full CI/CD pipelines, versioning, and safe rollout strategies (canary, shadow, A/B) so changes are deliberate and observable.
  • Instrument production models for performance, data drift, latency, and errors — and automate retraining triggers when models drift out of tolerance.
  • Maintain model lineage, auditability, and traceability to meet the compliance, governance, and reporting needs of the regulated gaming industry.
  • Enforce least-privilege IAM, encryption, and secure data access patterns across the entire ML platform.
  • Treat cost as a first-class engineering metric — right-size infrastructure, balance batch vs. real-time workloads, and continually reduce platform spend without sacrificing reliability.
  • Collaborate with engineers, data scientists, and product teams to translate business problems into ML solutions, communicate tradeoffs clearly, and iterate based on feedback.
  • Continuously explore new AWS services, ML frameworks, and deployment patterns to improve reliability, observability, and developer velocity on the ML platform.