DevOps Engineer

Vumedi Inc., Oakland, CA
Hybrid

About The Position

We are looking for a DevOps Engineer to join our engineering team and take ownership of our infrastructure, deployment processes, and overall platform reliability. You will work closely with backend and data teams to support a growing video and data platform used by millions of healthcare professionals worldwide.

In this role, you will focus on improving our CI/CD pipelines, system reliability, and developer experience, while helping scale our cloud infrastructure in a secure and cost-efficient way. You will work extensively with AWS services (compute, storage, networking, IAM, monitoring) and help ensure our systems are reliable, observable, and well-architected.

You’ll also support and enable emerging AI/ML and LLM-powered systems used for large-scale medical content processing, helping build and operate the infrastructure required for these workloads. This includes improving data pipelines, optimizing resource usage, and ensuring production-grade reliability of AI-driven services.

This is a high-impact role with a broad scope, ranging from supporting production systems and data pipelines to driving long-term improvements in how we build, deploy, and operate our platform. You will have strong ownership and autonomy in shaping DevOps practices.

Requirements

  • 5+ years of experience in DevOps, Site Reliability Engineering, or infrastructure-focused roles
  • Proven experience designing and operating scalable, reliable, and secure cloud infrastructure (preferably AWS) in production environments
  • Strong understanding of cloud security best practices (IAM, network security, secrets management), preferably within AWS
  • Proficiency in Python for automation, scripting, and tooling
  • Hands-on experience building and maintaining CI/CD pipelines
  • Experience with monitoring, logging, and alerting tools (e.g., Datadog, CloudWatch, Prometheus)
  • Experience working in a Linux-based environment
  • Ability to drive infrastructure and DevOps strategy, balancing scalability, reliability, and cost
  • Experience working cross-functionally and influencing engineering teams on best practices and architectural decisions
  • Strong ownership mindset with the ability to operate autonomously in ambiguous environments

Nice To Haves

  • Experience supporting or scaling AI/ML or LLM-based systems in production
  • Experience with containerized applications (Docker) and familiarity with orchestration concepts (Kubernetes or ECS is a plus)
  • Familiarity with Infrastructure as Code principles (e.g., Terraform), including experience introducing it from scratch in existing environments
  • Experience working with or supporting backend systems and data platforms (e.g., Postgres or Airflow)
  • Background in backend engineering or software development
  • Experience working in a fast-paced startup or scale-up environment
  • Experience leading and mentoring engineers, while contributing to team-wide best practices

Responsibilities

  • Own and improve our infrastructure, CI/CD pipelines, and deployment processes across multiple environments
  • Work with AWS services (compute, storage, networking, IAM, monitoring) to ensure scalable, secure, and reliable systems
  • Collaborate closely with backend and data teams to support production systems, data pipelines, and overall platform reliability
  • Continuously improve developer experience by streamlining workflows, reducing friction, and enabling faster, safer deployments
  • Contribute to improving security practices, access control, and compliance of our infrastructure
  • Automate infrastructure and workflows using Python
  • Improve observability by implementing and maintaining monitoring, logging, and alerting systems
  • Troubleshoot production issues, participate in incident response, and implement long-term fixes to improve system stability
  • Identify and drive improvements in performance, scalability, and cost efficiency across the platform
  • Support and scale AI/ML and LLM-based systems, ensuring reliable infrastructure for data processing and content classification workloads