DevOps Engineer

Blitzy
Cambridge, MA
Onsite

About The Position

As a DevOps Engineer at Blitzy, you will be a critical force behind the infrastructure powering our cutting-edge AI agents and enterprise software development platform. Based out of our Cambridge, MA headquarters, you'll architect and maintain the scalable, resilient systems that enable Blitzy to autonomously deliver production-ready software at unprecedented speed. This is a high-impact, hands-on role where your work directly shapes the reliability and performance of a platform used by Fortune 500 companies.

Requirements

  • 5–8 years of DevOps or infrastructure engineering experience in production environments.
  • Deep expertise in Kubernetes, including deployment, scaling, networking, and troubleshooting.
  • Strong Python proficiency for automation, scripting, and tooling.
  • Hands-on experience with Helm for application package management.
  • Proven track record designing and maintaining CI/CD pipelines.
  • Experience with major cloud platforms (AWS, Azure, or GCP).
  • Proficiency with Terraform for Infrastructure as Code.
  • Strong Linux administration skills and containerization expertise (Docker).

Nice To Haves

  • CKA (Certified Kubernetes Administrator) certification.
  • Experience with MLOps tooling such as MLflow, Kubeflow, or similar platforms.
  • Background in microservices architecture and service mesh technologies.
  • Familiarity with API gateway management and advanced service mesh configurations.
  • A bias for automation: if you've done something manually twice, you've already started scripting it.
  • Passion for AI infrastructure and excitement about building systems at the frontier of what's technically possible.

Responsibilities

  • Build, manage, and scale Kubernetes clusters supporting AI agent workloads and production application deployments.
  • Design and implement robust CI/CD pipelines for both application services and AI-driven workflows.
  • Automate infrastructure provisioning, scaling, and operations using Python and Terraform.
  • Deploy and maintain applications via Helm charts, ensuring consistency across environments.
  • Own the observability stack: alerting, distributed tracing, and monitoring for all production services and APIs.
  • Build and maintain infrastructure for AI agent orchestration, enabling reliable and high-throughput agent execution.
  • Partner closely with engineering teams to improve developer experience, deployment strategies, and operational tooling.
  • Maintain and continuously improve the security, reliability, and cost-efficiency of our cloud environments.

Benefits

  • Company equity tied to performance.