Cloud Platform Team Lead

HavocAI

About The Position

We are seeking a Cloud Platform Team Lead to head the development and operation of the infrastructure and platform systems that power our engineering organization. You will lead a team of Cloud Platform, DevOps, and Site Reliability Engineers, driving execution while contributing hands-on to architecture and system design. You will be responsible for building and maintaining a reliable, scalable platform that enables engineering teams to deploy, operate, and scale mission-critical systems. This is a hands-on leadership role: you will guide the team, set direction, and contribute directly to building the platform.

Requirements

5–8+ years of experience in cloud infrastructure, DevOps, or platform engineering
Prior experience leading or mentoring engineers (formal or informal)
Strong experience with Docker, Kubernetes, and cloud infrastructure
Solid understanding of distributed systems and system reliability
Experience operating production systems with monitoring and incident response
Strong problem-solving skills and ability to operate in a fast-paced environment

Nice To Haves

Experience with CI/CD systems and automation tooling
Exposure to SRE practices (SLIs, SLOs, incident management)
Experience supporting data pipelines or data platforms
Familiarity with AWS or similar cloud providers
Background in robotics, autonomy, or distributed systems

Responsibilities

Lead day-to-day execution of the Cloud Platform team
Set priorities, unblock engineers, and ensure consistent delivery
Provide mentorship through code reviews, design discussions, and technical guidance
Partner with leadership to align platform priorities with company goals
Design, build, and maintain core cloud infrastructure and platform services
Support containerized environments using Docker and Kubernetes
Ensure infrastructure is reliable, scalable, and secure
Contribute hands-on to system design and implementation
Improve CI/CD pipelines and deployment workflows
Support implementation of SRE practices, including monitoring, alerting, and incident response
Help define and maintain observability across services and systems
Reduce operational toil through automation and tooling
Partner with engineering teams across autonomy, backend, hardware, and data
Support teams in deploying and operating services effectively
Help build self-service tools and workflows to improve developer productivity
Maintain strong security practices across infrastructure and systems
Troubleshoot and resolve issues across infrastructure, services, and deployments

Benefits

100% Employer-Paid Health, Dental, and Vision Insurance for you and your family
Life Insurance (Employer Paid)
Ability to participate in the company's 401(k) program (Matching)
Unlimited PTO policy with an enforced 2-week minimum
Equity Package
Work / Home Office Stipend
Global Entry
16-Week Paid Parental Leave
Monthly Health and Wellness Stipend
