About the position
Lambda is seeking a highly experienced Linux systems engineer to lead its efforts in delivering the best platform for AI training. The role involves optimizing performance, resource utilization, and security across Lambda's Linux hypervisor fleet. The successful candidate will be responsible for designing and implementing fleet management best practices, driving the technical direction and quality of Lambda's cloud infrastructure, and researching new technical directions related to GPU virtualization and containerization for machine learning workloads. Strong expertise in Linux systems engineering, virtualization, and security, as well as experience with containerization technologies, is required for this role.
Responsibilities
- Lead efforts to deliver the best platform for AI training, focusing on performance optimization, resource utilization, and security across the Linux hypervisor fleet
- Design and implement fleet management best practices for maintaining a rapidly growing cloud platform, ensuring scalability of host lifecycle management processes and systems
- Drive technical direction and quality of Lambda's cloud infrastructure, and research new technical directions related to GPU virtualization and containerization for machine learning workloads
- Have in-depth Linux systems engineering experience, with a focus on virtualization and security with QEMU/KVM
- Architect and implement highly resilient systems for managing the lifecycle and configurations of 10,000+ hosts
- Possess in-depth kernel-level understanding of Linux
- Strong engineering background, preferably in EECS, Mathematics, Software Engineering, or Physics
- Lead and take ownership of large, ambiguous, cross-team projects from conception to production
- Value working on a high-performing team that emphasizes accountability and collaboration
- Self-starter, curious, and not afraid to ask questions
- Nice to have: experience working in a startup, building and maintaining infrastructure for machine learning applications, experience with GPU virtualization
Requirements
- 8+ years of in-depth Linux systems engineering, with a focus on virtualization and security with QEMU/KVM
- Experience architecting and implementing highly resilient systems for managing the lifecycle and configurations of 10,000+ hosts
- Expert-level knowledge of SDDC (e.g. MAAS), configuration management (e.g. Ansible, SaltStack), host configuration lifecycle (e.g. Foreman), and drift detection
- In-depth kernel-level understanding of Linux
- Strong engineering background (EECS preferred; Mathematics, Software Engineering, or Physics)
- Strong experience with containerization technologies like Docker (Kubernetes is a plus)
- Ability to lead and take ownership of large, ambiguous, cross-team projects
- Enjoy working in a fast-paced environment and making a significant business impact
- Value working on a team of high performers who hold each other accountable
- Value building for the long term
- Self-starter, curious, and not afraid to ask when in doubt
- Quick learner who enjoys learning new technologies
- Strong communication and collaboration skills
- Care deeply about well-tested code
- Nice to have: experience working in a startup, experience building and maintaining infrastructure for machine learning applications, experience with GPU virtualization
Benefits
- Generous cash & equity compensation
- Health, dental, and vision coverage for you and your dependents
- Commuter/Work from home stipends
- 401(k) plan
- Flexible Paid Time Off Plan
- Salary Range Information: $190,000-$250,000
- Equal Opportunity Employer