MLOps Engineer

Bobyard, San Francisco, CA
Onsite (posted 2d ago)

About The Position

Bobyard builds AI systems that automate takeoffs for contractors, saving them dozens of hours per project. Delivering this reliably at scale requires production-grade ML infrastructure, deployment systems, and cloud architecture that do not break under real customer usage. You will have very high autonomy in designing, executing, and iterating on our infrastructure. We are a startup, and we move fast. You will be the person responsible for turning research models into reliable production systems and building the foundation that allows engineering to ship quickly and safely. We look for world-class engineers who think in systems, take ownership of reliability and cost, and can go heads down to build durable infrastructure.

Requirements

  • Strong PyTorch knowledge with understanding of speed and memory bottlenecks and inference optimization
  • Comfortable managing GPU services (AWS, GCP, etc.), model containers, versioning, and scaling
  • Experience owning infrastructure on a small team or at a startup
  • Cloud-native and pragmatic — chooses simple, reliable solutions
  • High ownership mindset — you don’t wait to be told what to fix
  • Cost-aware and disciplined about cloud spend
  • Full-stack capable — can ship features in React or Django when needed
  • Fast learner who can navigate unfamiliar systems and tools quickly
  • Passion for building foundational systems that enable product velocity

Responsibilities

  • Design and maintain ML deployment and model serving infrastructure
  • Build end-to-end pipelines for model packaging, inference, monitoring, and scaling
  • Implement infrastructure-as-code across all cloud resources (target state: fully managed in Terraform)
  • Own CI/CD pipelines, release processes, and deployment automation
  • Manage GPU provisioning, utilization, and cloud cost optimization
  • Build monitoring, alerting, and observability across services
  • Work closely with ML and full-stack engineering to ship production systems
  • Contribute to product development (React + Django) when infrastructure priorities allow