DevOps Engineering Lead - ML Infrastructure

Symbolica AI•San Francisco, CA
15d•Onsite

About The Position

As a DevOps Engineering Lead working closely with our Head of ML Engineering, you will lead the design, build, and optimization of the infrastructure and tools that take our research and development efforts from the lab into a highly reliable, performant, and secure software stack in production. You'll help accelerate the path from research prototypes to production and enterprise-ready platforms, with security, availability, and reliability in mind. Your work will sit at the intersection of research and engineering, ensuring our R&D team has the robust platform it needs to push the boundaries of AI, and will involve working with our GPU vendors, cloud providers, and on-prem servers.

📍 This is an onsite role based in our SF office (345 California St.).

Requirements

  • 5+ years of experience in DevOps or infrastructure roles, with at least 2 years in machine learning infrastructure or MLOps. Experience building, maintaining, or managing ML infrastructure using DevOps practices is a plus.
  • Proficient in cloud-native architectures, with the ability to make the right tradeoffs where necessary
  • Experienced with Linux, containers, GPU management, Nix, and Kubernetes, with an interest in making sure the infrastructure behind our models is secure by design.
  • Exceptional problem-solving skills, with the ability to nimbly resolve edge cases with minimal disruption.
  • Solid software engineering skills in Rust, Golang or Python

Responsibilities

  • Focus on improving the reliability and performance of our Lambda cluster and model training pipeline.
  • Assist in managing multiple Kubernetes environments across cloud providers
  • Build and maintain the internal observability platform across all environments, covering everything from GPUs to AI applications and distributed backend systems.
  • Take ownership of our model training and deployment systems, bringing them to a more scalable, production-ready state.
  • Aid in building comprehensive CI tests for GitOps repositories and promotion systems
  • Build and maintain separate environments for research and client-facing products according to best practices

Benefits

  • Competitive salary and early-stage equity package.
  • A high-trust, execution-first culture with minimal bureaucracy.
  • Direct ownership of meaningful projects with real business impact.
  • A rare opportunity to sit at the interface between deep research and real-world productization.