Lead DevOps Engineer

Paramount
New York, NY

About The Position

We are looking for a Lead DevOps Engineer - Online Inference to join our Applied Intelligence Personalization Team. This role will focus on building and maintaining scalable, low-latency infrastructure to support real-time machine learning inference for engagement and personalized messaging. The ideal candidate will have 2+ years of experience working with Kubernetes, CI/CD pipelines, and cloud-based infrastructure to optimize and deploy real-time ML models.

Requirements

  • 4+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure Engineering.
  • Solid experience with Kubernetes and container orchestration.
  • Hands-on experience with CI/CD tools such as GitHub Actions, Jenkins, or ArgoCD.
  • Experience working with real-time inference and ML model deployment.
  • Deep knowledge of Google Cloud Platform (GCP), AWS, or Azure.
  • Expertise in infrastructure as code (IaC) using Terraform or Helm.
  • Experience with message queues and event-driven architectures (Pub/Sub, Kafka, etc.).
  • Proficiency in monitoring and logging solutions (New Relic, Prometheus, OpenTelemetry, etc.).
  • Strong scripting skills in Python, Bash, or Go for automation.

Nice To Haves

  • Hands-on experience with ML model serving frameworks (TensorFlow Serving, Triton, TorchServe, etc.).
  • Familiarity with load balancing, API gateways, and caching strategies.
  • Understanding of A/B testing frameworks and experimentation analysis.
  • Experience optimizing low-latency microservices for ML-based personalization.
  • Passion for building and maintaining high-performance infrastructure for real-time applications.

Responsibilities

  • Design, implement, and manage scalable and reliable infrastructure for online inference services.
  • Optimize Kubernetes-based deployments for low-latency model serving and real-time personalization.
  • Automate CI/CD pipelines to streamline the deployment of ML models and services.
  • Develop observability and monitoring solutions using tools like Prometheus, New Relic, and OpenTelemetry.
  • Ensure high availability, security, and performance of real-time inference APIs.
  • Work with ML engineers and backend teams to integrate inference models efficiently into production.
  • Implement autoscaling strategies for inference workloads based on traffic patterns and model demand.
  • Manage Pub/Sub and event-driven architectures to enable real-time messaging and engagement analytics.
  • Optimize model-serving infrastructure using Redis, Memcached, and other caching strategies.
  • Debug and resolve production issues related to latency, scaling, and reliability.

Benefits

  • Attractive compensation and comprehensive benefits packages.
  • Generous paid time off.
  • An exciting and fulfilling opportunity to be part of one of Paramount’s most dynamic teams.
  • Opportunities for both on-site and virtual engagement events.
  • Unique opportunities to make meaningful connections and build a vibrant community, both inside and outside the workplace.