Senior DevOps Engineer - Platform 1 (P1)

Zipline, South San Francisco, CA

About The Position

Zipline is on a mission to transform the way goods move. Our aim is to solve the world’s most urgent and complex access challenges by building, manufacturing, and operating the first instant delivery and logistics system that serves all humans equally, wherever they are. From powering Rwanda’s national blood delivery network and Ghana’s COVID-19 vaccine distribution, to providing on-demand home delivery for Walmart, to enabling healthcare providers to bring care directly to U.S. homes, we are transforming the way things move for businesses, governments, and consumers.

The technology is complex but the idea is simple: a teleportation service that delivers what you need, when you need it. Using robotics and autonomy, we are decarbonizing delivery, decreasing road congestion, and reducing fossil fuel consumption and air pollution, while providing equitable access to billions of people and building a more resilient global supply chain. Join Zipline and help us make good on our promise to build an equitable and more resilient global supply chain for billions of people.

Zipline’s Platform 1 system powers our long-range autonomous aircraft and delivery infrastructure: an integrated stack of on-prem hardware, robotics, and cloud-connected services that must perform flawlessly, around the clock, in the real world. As a DevOps Engineer, you’ll be part of the team that ensures these systems remain reliable, observable, and scalable as we expand globally. You’ll work across the boundary between software and hardware, building monitoring frameworks, automating deployments, and managing the infrastructure that keeps Zipline’s physical operations connected and performing.

You are someone who thrives in complex environments, loves solving systems challenges, and takes pride in building reliability into everything you touch. You bring technical depth, hands-on expertise, and a mindset that blends engineering precision with operational pragmatism.

Requirements

  • 6+ years of professional experience in DevOps, Site Reliability, and/or Infrastructure Engineering roles.
  • Deep expertise in Linux systems administration, performance tuning, and troubleshooting.
  • Experience managing and scaling on-prem and hybrid infrastructure environments.
  • Proficiency in monitoring and logging tools (Prometheus, Grafana, ELK, etc.) and a strong understanding of observability principles.
  • Familiarity with infrastructure-as-code tools (e.g., Terraform, CDK).
  • Scripting or programming skills in Python and Bash.
  • Strong communication and cross-functional collaboration skills; you work well across hardware, software, and operations domains.
  • A problem-solving mindset, with the grit and adaptability to thrive in dynamic, evolving systems.

Nice To Haves

  • Experience with container orchestration (Kubernetes, Docker/Docker Compose); huge plus if this experience is in hybrid or on-prem deployments.
  • Background in networking, bare-metal server management, or robotics infrastructure is a plus.
  • Familiarity with CI/CD and deployment pipelines for hardware-software systems is a plus.

Responsibilities

  • Ensure reliability and uptime of Platform 1’s hybrid infrastructure, spanning on-prem servers, edge devices, and infrastructure for cloud-based services.
  • Support application engineers deploying software by owning the deploy toolchain and managing the infrastructure their services run on.
  • Design, implement, and evolve observability systems (metrics, logging, tracing, and alerting) that provide deep visibility into system health and performance.
  • Automate and scale maintenance operations for our on-prem servers, reducing manual intervention and improving deployment repeatability using tools like Terraform and Ansible.
  • Administer and optimize Linux systems and network configurations that support mission-critical operations.
  • Lead and participate in incident response, driving both quick resolution and long-term prevention through post-incident analysis and automation.
  • Partner with software, flight systems, and operations teams to diagnose, resolve, and prevent system-level issues across environments.
  • Become THE in-house expert for DevOps on Platform 1: learn, understand, and work to improve our compute infrastructure and development practices.
  • Continuously improve standards and processes for system configuration, deployment, and monitoring, helping raise the technical bar for reliability at Zipline.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

1,001-5,000 employees
