Senior ML Infrastructure Engineer

Rebar • New York, NY

Onsite

About The Position

Rebar is building the next-generation operating system for commercial HVAC, electrical, and plumbing suppliers and subcontractors. Over the past year, our V1 quoting product has scaled to thousands of quotes completed weekly, doubled revenue in 2026, and gained adoption across many of the top suppliers in North America. Fresh off a $14M Series A backed by leading construction tech investors, we're entering our next phase of growth, with AI at the center of everything we build next.

We're looking for a Senior ML Infrastructure Engineer to build the platform our ML engineers depend on to rapidly iterate, experiment, and ship models, spanning feature pipelines, training infrastructure, evaluation, deployment, and monitoring. You'll be joining a small, highly capable team focused on delivering practical, production-ready ML systems in a fast-moving startup context.

This role is ideal for someone who enjoys designing clean abstractions, integrating disparate systems into coherent platforms, and obsessing over the developer experience of the engineers they support. Our work spans the full ML lifecycle, and we're building the platform that makes it all hang together.

Requirements

Bachelor's degree or higher in Computer Science, Electrical Engineering, or other relevant field — or equivalent industry experience.
3+ years of experience building production backend systems, with significant time on internal developer platforms, ML platforms, or integration-heavy infrastructure work.
Expert-level Python; comfortable picking up other languages as the tooling demands.
2+ years of experience with cloud infrastructure (AWS preferred), including IAM, networking, and cost management.
Proven ability to design clean, composable APIs and SDKs that internal users adopt willingly.
Deep understanding of ML-related workflows and requirements.

Nice To Haves

Experience integrating common ML tooling — experiment trackers (W&B, MLflow), feature stores, model serving frameworks — into broader platforms.
Experience building a Backstage-style internal developer portal or comparable internal platform.
Familiarity with GPU compute providers (AWS, Lambda Labs, CoreWeave, RunPod).
Some ML practitioner background — you've trained or deployed models yourself and understand the workflow from the user's side.
Experience with deployment and monitoring pipelines for ML systems.

Responsibilities

Design and build the CLI, SDK, and services that serve as the single front door to our ML platform. Make launching a training job, tracking an experiment, or shipping a model feel like one coherent product.
Wire together our cloud and SaaS stack — compute providers, storage, experiment tracking, model serving — into a unified system. Own the abstractions for compute orchestration, feature store, and model deployment.
Build cost attribution, usage dashboards, and monitoring across the platform. Surface what's running where, catch problems early, and keep production model serving reliable.
Work closely with ML engineers to understand their workflows, turn one-off scripts into self-serve platform features, and participate in architecture and roadmap decisions.