Director of Engineering, Inference Services

CoreWeave | Sunnyvale, CA
$206,000 - $303,000 | Hybrid

About The Position

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

About This Role

CoreWeave is looking for a Director of Engineering to own and scale our next-generation Inference Platform. In this highly technical, strategic role, you will lead a world-class engineering organization to design, build, and operate the fastest, most cost-efficient, and most reliable GPU inference services in the industry. Your charter spans everything from model-serving runtimes (e.g., Triton, vLLM, TensorRT-LLM) and autoscaling micro-batch schedulers to developer-friendly SDKs and airtight, multi-tenant security, all delivered on CoreWeave's unique accelerated-compute infrastructure.
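For a flavor of the micro-batching problem at the heart of this charter, here is a minimal, purely illustrative Python sketch (not CoreWeave code): an asyncio scheduler that coalesces incoming requests into a batch bounded by a maximum size and a maximum wait, trading a few milliseconds of queueing latency for much higher GPU utilization. All names and constants below are hypothetical.

    import asyncio
    import time

    # Hypothetical knobs: batch no more than MAX_BATCH requests, and never
    # hold the first request longer than MAX_WAIT_MS before dispatching.
    MAX_BATCH = 8
    MAX_WAIT_MS = 5.0

    async def fake_model(batch):
        # Stand-in for one GPU forward pass over the whole batch.
        await asyncio.sleep(0.002)
        return [f"echo:{prompt}" for prompt in batch]

    class MicroBatcher:
        def __init__(self):
            self.queue = asyncio.Queue()

        async def submit(self, prompt):
            # Callers await a per-request future; the batch loop resolves it.
            fut = asyncio.get_running_loop().create_future()
            await self.queue.put((prompt, fut))
            return await fut

        async def run(self):
            while True:
                # Block until the first request arrives, then start the clock.
                prompt, fut = await self.queue.get()
                prompts, futs = [prompt], [fut]
                deadline = time.monotonic() + MAX_WAIT_MS / 1000
                while len(prompts) < MAX_BATCH:
                    timeout = deadline - time.monotonic()
                    if timeout <= 0:
                        break
                    try:
                        prompt, fut = await asyncio.wait_for(self.queue.get(), timeout)
                    except asyncio.TimeoutError:
                        break
                    prompts.append(prompt)
                    futs.append(fut)
                # One batched model call amortizes kernel-launch and memory cost.
                for f, result in zip(futs, await fake_model(prompts)):
                    f.set_result(result)

    async def main():
        batcher = MicroBatcher()
        runner = asyncio.create_task(batcher.run())
        replies = await asyncio.gather(*(batcher.submit(f"p{i}") for i in range(20)))
        print(len(replies), replies[:3])
        runner.cancel()

    asyncio.run(main())

The tension visible in MAX_WAIT_MS is exactly the latency-versus-throughput trade-off that the P99 targets in the requirements below constrain.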

Requirements

  • 10+ years building large-scale distributed systems or cloud services, with 5+ years leading multiple engineering teams.
  • Proven success delivering mission-critical model-serving or real-time data-plane services (e.g., Triton, TorchServe, vLLM, Ray Serve, SageMaker Inference, GCP Vertex Prediction).
  • Deep understanding of GPU/CPU resource isolation, NUMA-aware scheduling, micro-batching, and low-latency networking (gRPC, QUIC, RDMA).
  • Track record of optimizing cost-per-token / cost-per-request and hitting sub-100 ms global P99 latencies (a back-of-the-envelope version of this metric is sketched after this list).
  • Expertise in Kubernetes, service meshes, and CI/CD for ML workloads; familiarity with Slurm, Kueue, or other schedulers a plus.
  • Hands-on experience with LLM optimization (quantization, compilation, tensor parallelism, speculative decoding) and hardware-aware model compression.
  • Excellent communicator who can translate deep technical concepts into clear business value for C-suite and engineering audiences.
  • Bachelor’s or Master’s in CS, EE, or related field (or equivalent practical experience).
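As a concrete anchor for the cost-per-token requirement above, here is a back-of-the-envelope calculation in Python. Every number is an assumption for illustration, not a CoreWeave price or benchmark; the point is the shape of the math.

    # Back-of-the-envelope cost-per-token math; every number is assumed.
    gpu_hourly_price_usd = 4.00   # assumed on-demand price for one GPU
    tokens_per_second = 2500      # assumed aggregate decode throughput
    gpu_utilization = 0.60        # fraction of time spent on useful work

    effective_tps = tokens_per_second * gpu_utilization
    cost_per_token = gpu_hourly_price_usd / 3600 / effective_tps
    print(f"${cost_per_token * 1e6:.2f} per million tokens")  # -> $0.74

Utilization sits in the denominator, which is why batching efficiency and autoscaling headroom move the unit economics as much as raw hardware price does.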

Nice To Haves

  • Experience operating multi-region inference fleets at a cloud provider or hyperscaler.
  • Contributions to open-source inference or MLOps projects.
  • Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) for AI workloads.
  • Background in edge inference, streaming inference, or real-time personalization systems.

Responsibilities

  • Vision & Roadmap - Define and continuously refine the end-to-end Inference Platform roadmap, prioritizing low-latency, high-throughput model serving and world-class developer UX. Set technical standards for runtime selection, GPU/CPU heterogeneity, quantization, and model-optimization techniques.
  • Platform Architecture - Design and implement a global, Kubernetes-native inference control plane that delivers <50 ms P99 latencies at scale. Build adaptive micro-batching, request-routing, and autoscaling mechanisms that maximize GPU utilization while meeting strict SLAs. Integrate model-optimization pipelines (TensorRT, ONNX Runtime, BetterTransformer, AWQ, etc.) for frictionless deployment. Implement state-of-the-art runtime optimizations, including speculative decoding, KV-cache reuse across batches (see the toy prefix-cache sketch after this list), early-exit heuristics, and tensor-parallel streaming, to squeeze every microsecond out of LLM inference while retaining accuracy.
  • Operational Excellence - Establish SLOs/SLA dashboards, real-time observability, and self-healing mechanisms for thousands of models across multiple regions. Drive cost-performance trade-off tooling that makes it trivial for customers to choose the best HW tier for each workload.
  • Leadership - Hire, mentor, and grow a diverse team of engineers and managers passionate about large-scale AI inference. Foster a customer-obsessed, metrics-driven engineering culture with crisp design reviews and blameless post-mortems.
  • Collaboration - Partner closely with Product, Orchestration, Networking, and Security teams to deliver a unified CoreWeave experience. Engage directly with flagship customers (internal and external) to gather feedback and shape the roadmap.
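To make "KV-cache reuse across batches" concrete, below is a toy Python sketch of prefix caching. It is purely illustrative: real runtimes such as vLLM key GPU cache blocks by token blocks and hold tensors, whereas this sketch hashes string-joined token prefixes and stores placeholder strings.

    import hashlib

    # Toy prefix cache. The "KV state" here is a placeholder string, not
    # actual attention key/value tensors.
    class PrefixKVCache:
        def __init__(self):
            self.store = {}

        def _key(self, tokens):
            return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

        def lookup(self, tokens):
            # Longest cached prefix wins; linear scan for clarity only.
            for end in range(len(tokens), 0, -1):
                state = self.store.get(self._key(tokens[:end]))
                if state is not None:
                    return end, state
            return 0, None

        def insert(self, tokens, state):
            self.store[self._key(tokens)] = state

    cache = PrefixKVCache()
    system_prompt = list(range(100))           # 100 shared system-prompt tokens
    cache.insert(system_prompt, "kv-for-system-prompt")

    request = system_prompt + [101, 102, 103]  # new request sharing the prefix
    hit_len, _ = cache.lookup(request)
    print(f"reused {hit_len} of {len(request)} tokens")  # reused 100 of 103

Only the three new tokens need prefill compute in this example; at fleet scale that reuse is a first-order lever on both latency and cost.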

Benefits

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption