Lead DevOps Engineer

Paramount
New York, NY

About The Position

We are looking for a Lead DevOps Engineer - Online Inference to join our Applied Intelligence Personalization Team. This role will focus on building and maintaining scalable, low-latency infrastructure to support real-time machine learning inference for engagement and personalized messaging. The ideal candidate will have 2+ years of experience working with Kubernetes, CI/CD pipelines, and cloud-based infrastructure to optimize and deploy real-time ML models.

Requirements

  • 4+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure Engineering.
  • Solid experience with Kubernetes and container orchestration.
  • Hands-on experience with CI/CD tools such as GitHub Actions, Jenkins, or ArgoCD.
  • Experience working with real-time inference and ML model deployment.
  • Deep knowledge of Google Cloud Platform (GCP), AWS, or Azure.
  • Expertise in infrastructure as code (IaC) using Terraform or Helm.
  • Experience with message queues and event-driven architectures (Pub/Sub, Kafka, etc.).
  • Proficiency in monitoring and logging solutions (New Relic, Prometheus, OpenTelemetry, etc.).
  • Strong scripting skills in Python, Bash, or Go for automation.

Nice To Haves

  • Hands-on experience with ML model serving frameworks (TensorFlow Serving, Triton, TorchServe, etc.).
  • Familiarity with load balancing, API gateways, and caching strategies.
  • Understanding of A/B testing frameworks and experimentation analysis.
  • Experience optimizing low-latency microservices for ML-based personalization.
  • Passion for building and maintaining high-performance infrastructure for real-time applications.

Responsibilities

  • Design, implement, and manage scalable and reliable infrastructure for online inference services.
  • Optimize Kubernetes-based deployments for low-latency model serving and real-time personalization.
  • Automate CI/CD pipelines to streamline the deployment of ML models and services.
  • Develop observability and monitoring solutions using tools like Prometheus, New Relic, and OpenTelemetry.
  • Ensure high availability, security, and performance of real-time inference APIs.
  • Work with ML engineers and backend teams to integrate inference models efficiently into production.
  • Implement autoscaling strategies for inference workloads based on traffic patterns and model demand.
  • Manage Pub/Sub and event-driven architectures to enable real-time messaging and engagement analytics.
  • Optimize model-serving infrastructure using Redis, Memcached, and other caching strategies.
  • Debug and resolve production issues related to latency, scaling, and reliability.

Benefits

  • Attractive compensation and comprehensive benefits packages.
  • Generous paid time off.
  • An exciting and fulfilling opportunity to be part of one of Paramount’s most dynamic teams.
  • Opportunities for both on-site and virtual engagement events.
  • Unique opportunities to make meaningful connections and build a vibrant community, both inside and outside the workplace.