Nebius · Posted 9 days ago
Full-time • Mid Level
Remote
1,001-5,000 employees

In this role, you will lead the definition, development, and delivery of Nebius Token Factory’s inference capabilities, focusing on highly scalable, production-grade machine learning systems. You will be responsible for shaping the direction of our inference platform, driving product decisions that balance performance, reliability, and real-world customer needs. This includes working closely with engineering and research teams to design and optimize real-time and batch inference workflows, supporting customer PoCs, and translating technical challenges into clear product requirements.

You will work directly with customers and internal stakeholders to understand ML workflows at scale, identify bottlenecks, and define features that improve latency, throughput, orchestration, and deployment efficiency. You will also guide product adoption by delivering intuitive tools and robust infrastructure that solve complex inference problems across diverse use cases.

This role requires a strong technical foundation in ML systems and a product mindset oriented toward execution, clarity, and long-term scalability. You are welcome to work remotely from the US.

Responsibilities:

  • Own the product roadmap for Nebius Token Factory inference capabilities, focusing on high-load, production-grade ML scenarios.
  • Support customer PoCs involving distributed ML model deployment, inference orchestration, and optimization.
  • Work closely with engineering and research teams to shape scalable infrastructure for real-time and batch inference.
  • Act as the technical voice in customer conversations, translating ML workflows into product requirements.
  • Drive product adoption by delivering tools and features that solve real-world inference problems at scale.
Qualifications:

  • 3–5 years of product management experience, ideally in cloud infrastructure, ML platforms, or developer tools.
  • Strong technical foundation (e.g., a Computer Science or Engineering degree) with the ability to dive deep into model architectures and serving systems.
  • Familiarity with modern ML inference tools and frameworks (e.g., Triton Inference Server, vLLM, SGLang, TensorRT-LLM, Dynamo, KServe, Ray Serve).
  • Proven track record of delivering technically complex products that support distributed and high-throughput ML pipelines.
  • Strong communicator with experience working across engineering, research, and customer-facing teams.
  • Deep understanding of modern ML architectures, including transformer-based models and their inference characteristics.
  • Experience delivering or supporting ML solutions in production as part of a customer-facing or solutions role.
  • Knowledge of MLOps or AIOps cycles, including observability, performance optimization, and continuous delivery of ML systems.
Benefits:

  • Health insurance: 100% company-paid medical, dental, and vision coverage for employees and their families.
  • 401(k) plan: Up to 4% company match with immediate vesting.
  • Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
  • Remote work reimbursement: Up to $85/month for mobile and internet.
  • Disability & life insurance: Company-paid short-term disability, long-term disability, and life insurance coverage.
  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within Nebius.
  • Flexible working arrangements.
  • A dynamic and collaborative work environment that values initiative and innovation.