Principal Engineer - Platform Engineering

Health GPT Inc, Palo Alto, CA
Onsite

About The Position

We are seeking a Principal Engineer - Platform Engineering to design, scale, and evolve the infrastructure that powers our AI models and SaaS applications across multiple clouds. You'll define the technical direction for our platform, architect systems for large-scale model training and inference, and help hire and mentor engineers. This role blends deep hands-on engineering with technical leadership, partnering closely with AI research and product engineering teams to deliver secure, high-performance, and cost-efficient systems.

Requirements

  • 12+ years in software engineering with strong experience in platform or distributed systems.
  • Deep knowledge of AWS, GCP, or Azure, and multi-cloud architectures.
  • Expertise in IaC tools (Terraform, Pulumi) and CI/CD automation.
  • Experience with Kubernetes, GPU orchestration, or model serving (e.g., Ray, Triton, SGLang).
  • Proficiency with observability tools (Prometheus, Grafana, OpenTelemetry, Datadog).
  • Strong understanding of cloud security, networking, and cost optimization.
  • Excellent communicator and mentor; strong cross-team collaboration skills.

Nice To Haves

  • Experience with LLM or AI infrastructure, including large-scale inference or fine-tuning.
  • Background in data pipelines or developer platform tooling.
  • Exposure to hybrid-cloud or edge AI systems.
  • Track record of technical leadership and long-term architectural vision.

Responsibilities

  • Architect and scale the multi-cloud platform for AI inference and SaaS workloads.
  • Lead design for performance, reliability, and cost optimization.
  • Build automation and tooling using infrastructure-as-code and CI/CD best practices.
  • Partner with research to productionize models efficiently.
  • Improve observability, telemetry, and incident response.
  • Foster a culture of technical and operational excellence.
  • Hire and mentor strong engineers, and guide long-term technical strategy.
  • Evaluate emerging technologies in AI infra, GPU orchestration, and distributed systems.

What This Job Offers

Job Type

Full-time

Career Level

Principal

Education Level

No Education Listed

Number of Employees

101-250 employees