Senior Machine Learning Platform Engineer

Charlie Health•New York, NY

$170,000 - $220,000•Hybrid

About The Position

Charlie Health leads the nation in high-acuity virtual behavioral care, having delivered life-saving treatment to more than 100,000 clients nationwide. Our ML and AI capabilities are expanding rapidly, powering recommendation systems, clinical decision support, agentic AI products, and developer tooling, and the infrastructure underneath needs to scale with them.

As our first dedicated ML Platform Engineer, you'll define the technical direction and build the foundational systems that our data scientists, ML engineers, and product teams depend on to ship AI-powered features reliably and at scale. We have several models in production today and are investing in hosted GPU inference to support the next generation of our AI capabilities.

You'll inherit and evolve existing infrastructure while building new platform capabilities, from multi-tenant model serving and GPU inference pipelines to multimodal data management, evaluation frameworks, observability, and infrastructure as code. You'll own the platform layer that makes ML/AI development at Charlie Health fast, safe, and repeatable as the team grows. If you care about building the systems that let others build great things, this team is for you.

Requirements

4+ years of professional experience in software engineering, with at least 2 years focused on ML infrastructure, ML platform, or AI systems engineering
Strong software engineering fundamentals in Python and deep infrastructure expertise
Familiarity with cloud ML services (AWS SageMaker, GCP Vertex AI, or similar) and CI/CD for ML pipelines
Experience with infrastructure as code (Terraform, Pulumi, or similar) and container orchestration (Kubernetes, ECS)
Excellent at managing ambiguity—able to break down big, messy problems into smaller parts with tractable solutions and clear iterations
Growth mindset and sense of humor; you welcome feedback, adapt quickly in a fast-paced environment, and foster a culture of learning and fun

Nice To Haves

Experience building evaluation and observability systems for LLM-based or agentic AI applications
Experience with a systems language (Go, Rust, or C++)

Responsibilities

Define technical direction for ML/AI infrastructure and make build-vs-buy decisions as the founding platform engineer
Design and operate AI infrastructure supporting client-facing and clinician-facing LLM applications across multiple LLM providers
Design, build, and operate production model serving systems; maintain infrastructure as code for reproducible ML environments, training pipelines, and deployment workflows
Develop high-performance GPU inference pipelines with low latency and high availability
Own the multimodal data pipeline layer—manage ingestion, processing, and serving of text, audio, and structured clinical data for ML and AI systems
Create reliable infrastructure for agentic AI systems, including orchestration, monitoring, evaluation, and observability tooling
Build developer tooling that accelerates data science and ML engineering workflows across the organization
Own AI observability—build monitoring, alerting, and debugging capabilities for production ML systems
Partner with ML engineers, data scientists, and product teams to understand infrastructure needs and translate them into scalable platform capabilities
Foster a culture of collaboration and learning across engineering, product, and design through mentoring, documentation, presentations, and knowledge sharing
Participate in our on-call rotation to ensure model serving uptime, pipeline reliability, and infrastructure health