Senior DevOps Engineer – CloudOps & AI/ML – Evinova

AstraZeneca | Gaithersburg, MD
21d | Hybrid

About The Position

At AstraZeneca, we pride ourselves on crafting a collaborative culture that champions knowledge-sharing, ambitious thinking and innovation – ultimately providing employees with the opportunity to work across teams, functions and even the globe. Recognizing the importance of individualized flexibility, our ways of working allow employees to balance personal and work commitments while ensuring we continue to create a strong culture of collaboration and teamwork by engaging face-to-face in our offices 3 days a week. Our head office is purposely designed with collaboration in mind, providing space where teams can come together to strategize, brainstorm and connect on key projects.

Are you ready to be part of the future of healthcare? Can you think big, be bold, and harness the power of digital and AI to tackle longstanding life sciences challenges? Then Evinova, a global health tech business, might be for you! Transform patients’ lives through technology, data, and innovative ways of working. You’re disruptive, decisive, and transformative: someone excited to use technology to improve patients’ health.

We’re building a new health-tech business: Evinova, a fully owned subsidiary of AstraZeneca Group. Evinova delivers market-leading digital health solutions that are science-based, evidence-led, and human experience-driven. Thoughtful risks and quick decisions come together to accelerate innovation across the life sciences sector. Be part of a diverse team that pushes the boundaries of science by digitally empowering a deeper understanding of the patients we’re helping. Launch pioneering digital solutions that improve the patient experience and deliver better health outcomes. Together, we have the opportunity to combine deep scientific expertise with digital and artificial intelligence to serve the wider healthcare community and create new standards across the sector.

Introduction to Role

We are seeking a Senior DevOps Engineer with strong CloudOps and AI/ML operations expertise to help scale and operate our global SaaS platform. You’ll play a key role in ensuring the reliability, cost efficiency, and performance of our data and AI workloads, enabling scientists, data engineers, and developers to deliver faster and more securely.

Requirements

  • High School Diploma or GED
  • 7+ years in DevOps, CloudOps, or SRE roles supporting large-scale SaaS or data-driven platforms.
  • Deep operational experience with AWS (Fargate, EKS, EC2, S3, RDS, Lambda, IAM, CloudWatch, CloudTrail).
  • Proficiency with CI/CD tools (ArgoCD, GitHub Actions, Jenkins) and automation scripting (Python, Bash, TypeScript).
  • Strong hands-on experience with Kubernetes and containerized workloads.
  • Working experience with AI/ML platforms (AWS SageMaker, Kubeflow, MLflow, or equivalent).
  • Familiarity with GPU workloads and performance/cost tuning for AI pipelines.
  • Knowledge of MongoDB operations and performance optimization.
  • Solid understanding of FinOps principles, cost monitoring, and right-sizing in AWS.
  • Experience with observability and incident management (Splunk, Grafana, OpenTelemetry).

Nice To Haves

  • Bachelor's degree or equivalent experience in the technology space.
  • Advanced expertise in AWS CDK, including building complex, reusable constructs and pipelines.
  • Familiarity with Projen for automating CDK project configuration and management.
  • Hands-on experience with Helm charts and Kubernetes manifests.
  • Experience with monitoring and logging tools such as Splunk, Grafana, and AWS CloudWatch.
  • Exposure to multi-tenant SaaS platforms and best practices.
  • Experience working with AI tools and frameworks.
  • Awareness of regulatory compliance frameworks (SOC 2, ISO 27001, NIST, HIPAA).
  • Mentor & Leader: Enjoys mentoring team members and fostering a collaborative, innovation-driven team culture.
  • Organized & Adaptable: Able to manage multiple priorities and thrive in a fast-paced environment.
  • Innovative: Passionate about leveraging technology to solve complex problems and drive efficiency.
  • Customer-Focused: Dedicated to building infrastructure that delivers measurable business and customer value.

Responsibilities

  • Cloud Operations & Reliability
      • Lead operations for multi-tenant SaaS workloads running on AWS (Fargate, EKS, S3, RDS, Lambda, etc.).
      • Design and implement scalable, highly available, and cost-efficient infrastructure for production and ML workloads.
      • Drive incident response, postmortems, and operational runbooks to improve uptime and reduce MTTR.
  • CI/CD & Automation
      • Own and enhance CI/CD pipelines (ArgoCD, GitHub Actions, Jenkins) supporting both application and ML model deployment workflows.
      • Build automation for environment provisioning, configuration, and lifecycle management using Infrastructure as Code (AWS CDK or Terraform).
      • Enable self-service capabilities for engineering and data science teams.
  • FinOps & Cost Optimization
      • Monitor and optimize cloud usage, focusing on compute, GPU utilization, and storage tiers.
      • Implement cost controls and forecasting models to support AI/ML workloads.
      • Collaborate with Finance and Product teams to report and realize cloud savings targets.
  • AI/ML Infrastructure Operations
      • Support and automate ML pipelines for training, testing, and deployment using AWS SageMaker, Kubeflow, or MLflow.
      • Manage GPU and compute clusters (EKS, ECS, EC2) for model training and inference workloads.
      • Implement observability and scaling strategies for data ingestion, model serving, and batch workflows.
      • Partner with Data Science teams to operationalize ML models across environments with reproducibility and traceability.
  • Monitoring, Observability & Security
      • Develop and maintain dashboards, alerts, and telemetry (Splunk, Grafana, OpenTelemetry, AWS CloudWatch).
      • Define and track SLOs/SLIs to improve reliability and availability.
      • Apply security and compliance controls (IAM least privilege, KMS encryption, SOC 2, HIPAA, GDPR).
      • Contribute to audit preparation and evidence collection for compliance.
  • Collaboration & Leadership
      • Collaborate with Data Engineering, AI/ML, and Platform Ops teams to ensure smooth cross-team delivery.
      • Mentor junior engineers on operational best practices, IaC, CI/CD, and observability.
      • Participate in global change and incident management processes.

Benefits

  • Qualified retirement program [401(k) plan]
  • Paid vacation and holidays
  • Paid leaves
  • Health benefits including medical, prescription drug, dental, and vision coverage

What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Education Level: High school or GED
Number of Employees: 5,001-10,000 employees
