Sierra, posted 2 months ago
Full-time • Mid Level
San Francisco, CA
251-500 employees

As a Software Engineer, Infrastructure at Sierra, you will be responsible for designing, building, and maintaining the core systems that make our AI platform possible. You’ll focus on making Sierra’s infrastructure secure, reliable, and scalable, enabling product teams to deliver with speed and confidence.

Responsibilities

  • Ensure the reliability, scalability, and performance of our platform and LLM inference serving as we rapidly grow traffic.
  • Build and maintain cloud infrastructure using Terraform to ensure scalable, secure, and reproducible environments.
  • Create and maintain a self-serve infrastructure platform that enables the rest of engineering to deploy and operate services.
  • Own and evolve CI/CD pipelines and release management, enabling fast, reliable deployments for Sierra’s platform.
  • Architect and operate distributed systems that leverage distributed databases, retrieval systems, and ML models.
  • Develop and maintain core data serving abstractions along with authentication and security features (SSO, RBAC, authentication controls).
  • Integrate our stack with enterprise customer environments in scalable and maintainable ways.
  • Enhance observability tooling (metrics, logging, tracing) to provide deep visibility into platform health and performance.
  • Lead and participate in incident management, improving system resilience through proactive monitoring, root cause analysis, and postmortems.

Requirements

  • Strong software engineering background with 5–7+ years of hands-on development experience in highly technical products.
  • A strong inclination towards building automation, tooling, and platforms, along with designing maintainable systems.
  • Proven experience with cloud platforms (AWS, GCP, or Azure) and infrastructure as code (Terraform preferred).
  • Hands-on expertise in CI/CD systems, release management, and container orchestration (e.g., Docker, Kubernetes).
  • Experience with observability tools (Prometheus, Grafana, Datadog, OpenTelemetry, etc.).
  • Experience in incident response and operating distributed systems in production.
  • Degree in Computer Science or related field, or equivalent professional experience.

Preferred qualifications

  • Production experience working with LLMs and machine learning models.
  • Background in distributed systems, running SaaS services at scale, and agentic architecture.
  • Familiarity with security and authentication protocols (OAuth, SSO, mTLS).
  • Previous experience in a fast-paced startup environment or platform/infra-focused team.

Benefits

  • Flexible (Unlimited) Paid Time Off
  • Medical, Dental, and Vision benefits for you and your family
  • Life Insurance and Disability Benefits
  • Retirement Plan (e.g., 401(k), pension) with Sierra match
  • Parental Leave
  • Fertility and family building benefits through Carrot
  • Lunch, as well as delicious snacks and coffee to keep you energized
  • Discretionary Benefit Stipend giving people the ability to spend where it matters most
  • Free alphorn lessons