Staff Software Engineer, ML Infrastructure

SimpliSafe
Boston, MA
$146,600 - $215,100
Hybrid

About The Position

We're looking for a Staff Software Engineer to join our Cloud ML team, the team that owns both the cloud-side ML infrastructure and the applied ML research that powers SimpliSafe's intelligent home security products. This is a senior individual contributor role for a distributed systems expert who wants to apply that craft to one of the most demanding problem domains in the company. You'll partner closely with other Staff and Principal engineers to drive architecture, mentor across the team, and set the technical direction for our ML platform.

The work spans two of our most demanding workloads: real-time computer vision inference that processes video from cameras and doorbells across our customer base, and LLM/GenAI infrastructure that will power our next generation of intelligent applications. Both are, fundamentally, distributed systems problems: high-throughput, low-latency, multi-tenant, GPU-aware, and unforgiving of regressions.

This role is for someone who has built and operated large-scale distributed services in production (high-QPS APIs, real-time platforms, low-latency serving systems) and is excited to bring that depth to ML infrastructure. Prior ML experience is a plus, not a prerequisite. If you've shipped systems that serve a lot of traffic, scale gracefully, and stay up at 3am, we want to talk to you.

Requirements

  • 8+ years of software engineering experience, with a clear track record of building and operating large-scale distributed systems in production.
  • Deep expertise in high-throughput, low-latency services — ad serving, recommendations, real-time APIs, online platforms, or similar — including the operational reality of running them at scale.
  • Strong production experience on Kubernetes and AWS (EKS, S3, IAM, networking) and with Kafka, containerized deployments, CI/CD, and infrastructure-as-code.
  • Demonstrated experience with the building blocks of high-scale systems: load balancing, autoscaling, batching, caching, multi-tenancy, queuing, and capacity planning.
  • Proficiency in Python is required; experience with a systems language (Go, C++, Rust) for performance-sensitive components is a plus.
  • Staff-level technical leadership: ability to drive ambiguous, cross-cutting initiatives, align senior stakeholders, and elevate the engineers around you without formal authority.
  • Strong written and verbal communication — you can make complex technical tradeoffs legible to ML scientists, product, and other infra teams.
  • ML exposure is preferred, such as having deployed or operated production ML systems, worked closely with ML teams, or built ML-adjacent infrastructure. Exceptional distributed systems engineers without direct ML experience are encouraged to apply; we'll help you ramp.

Nice To Haves

  • Hands-on experience with Ray, KServe, Triton, vLLM, or other ML serving stacks.
  • Hands-on experience with LLM serving in production (vLLM, TGI, TensorRT-LLM, SGLang) — KV cache management, continuous batching, speculative decoding, quantization for serving.
  • Experience building real-time video or streaming pipelines (Kafka, Kinesis, Flink, or similar) at scale.
  • Experience operating GPU-based inference systems — GPU-aware scheduling, multi-model serving, accelerator utilization optimization.
  • Familiarity with ML fundamentals — how models are trained, evaluated, versioned, deployed, monitored, and rolled back in production.
  • Experience with model lifecycle tooling (MLflow, Weights & Biases, model registries, drift detection, shadow deployments).
  • Open source contributions to distributed systems or ML infrastructure projects.
  • Experience operating in environments with strong security and compliance requirements.

Responsibilities

  • Set technical direction for ML infrastructure
      • Drive architecture decisions for our Kubernetes-based ML platform (anchored on Ray for inference, alongside KServe, Triton, and vLLM) across real-time and batch workloads.
      • Lead deep technical reviews on system design, capacity planning, and reliability for the highest-stakes ML systems at SimpliSafe.
      • Identify and remove the systemic bottlenecks in our ML deployment infrastructure, whether that's serving reliability, deployment friction, observability gaps, scaling, or cost.
  • Build and operate real-time CV inference at scale
      • Own the design and evolution of cloud-side inference systems that process live video and events from SimpliSafe devices in real time.
      • Drive throughput, latency, and cost improvements (batching strategies, GPU utilization, autoscaling, multi-model serving) for production CV models.
      • Build the feedback loops between cloud inference, edge devices, and the data flywheel that improves model quality over time.
  • Stand up LLM/GenAI serving infrastructure
      • Help shape how SimpliSafe serves LLMs in production: model serving patterns, KV-cache and batching strategies, evaluation pipelines, guardrails, and cost controls.
      • Partner with applied ML engineers to take new GenAI-powered product features from prototype to scaled deployment.
  • Raise the engineering bar across Cloud ML
      • Mentor engineers across the team through design reviews, code reviews, pairing, and written guidance, providing a meaningful uplift to everyone you work with.
      • Establish and evangelize best practices for model lifecycle management (registry, deployment, monitoring, rollback, drift) and on-call.
      • Write the documentation, runbooks, and architectural decision records that make the platform legible and durable.
  • Own reliability and operational excellence
      • Lead incident response and postmortems for critical ML systems; turn lessons learned into platform-level improvements.
      • Define SLOs, observability standards, and on-call practices for ML services in production.

Benefits

  • A comprehensive total rewards package that supports your wellness and provides security for SimpliSafers and their families.
  • Free SimpliSafe system and professional monitoring for your home.
  • Employee Resource Groups (ERGs) that bring people together, give opportunities to network, mentor and develop, and advocate for change.
  • Participation in our annual bonus program, equity, and other forms of compensation.
  • A full range of medical, retirement, and lifestyle benefits.