Staff Software Engineer: Microservice Infrastructure & Real-Time ML Inference

Sanas • Palo Alto, CA

About The Position

We're looking for a Staff Software Engineer (Backend) to design and build the next generation of our real-time translation infrastructure. You'll architect mission-critical microservices that power low-latency audio/video processing pipelines, working with cutting-edge speech recognition, translation, and voice synthesis technologies. You'll be instrumental in scaling our platform to handle millions of concurrent streaming sessions while meeting sub-100 ms latency requirements. This role combines deep systems programming, distributed systems architecture, and cloud infrastructure expertise.

Mission & Scope

Own Sanas' microservice and streaming architecture, which powers sub-100 ms real-time language translation in both B2B and B2C environments. Define technical strategy, align multiple teams, and raise the bar on reliability and performance across regions.

Requirements

7+ years of Software Engineering experience, with a focus on distributed architecture and technical leadership.
Strong proficiency in Python or Go, including async/concurrency (asyncio/futures), profiling, and GC/heap tuning.
Strong proficiency in Containerization and Orchestration: AWS/Azure, Terraform, Kubernetes, IaC patterns, and CPU/GPU node pools.
Experience in ML Inference: Triton/vLLM/TorchServe; GPU scheduling/packing, batching, A/B and shadow traffic.
Experience with gRPC/protobuf at scale (versioning, interceptors, performance tuning, and compatibility testing).

Nice To Haves

Experience with WebRTC/SRTP, RTP/RTCP, NAT traversal (STUN/TURN), SIP interop; FFmpeg/codec tradeoffs.
Experience in data streaming with Kafka, Redis, DynamoDB; exactly-once/at-least-once patterns; stream-batch bridges.

Responsibilities

Lead the design for high-throughput, low-latency microservices that enable bidirectional streaming in Sanas' audio/video pipelines.
Build event/telemetry/feature pipelines (Kafka/Redis/DynamoDB) that support near-real-time decisions and model features at scale.
Productionize model serving (Triton/vLLM/TorchServe), implement autoscaling/batching/shadow-deploys, and enforce p99/p999 budgets.
Establish SLOs/error budgets, graceful degradation (keep call quality first), idempotency, circuit breakers, retries with jitter, and chaos drills.
Lead Sanas-wide logging/metrics/tracing (OpenTelemetry), RED/USE dashboards, and symptom-based alerting.
Drive cross-team designs, mentor senior engineers, lead postmortems/design reviews, and lay the foundation for shared libraries and patterns (auth, interceptors, tracing, schema rollout).
