Nebius • posted 2 months ago
Full-time • Senior

About the position

Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside some of the most experienced and innovative leaders and engineers in the field.

We are seeking a Senior Technical Product Manager, ML/AI Lifecycle Services, to join our team. In this role, you will oversee the planning and prioritization of services across the ML/AI lifecycle, including data preparation, training, fine-tuning, experiments, monitoring, and inference. You will deliver products for leading AI companies that use thousands of GPUs within a single cluster on cutting-edge hardware. We also provide room for creativity, empowering you to take the initiative and build what you think is best.

Responsibilities

  • Be a center of ML/AI expertise for both dev and business teams.
  • Own the backlog of 1–3 AI/ML products.
  • Define the technical requirements that IaaS and PaaS teams must meet for your products.
  • Introduce and promote products to the market in collaboration with cross-functional teams.
  • Create materials and onboarding guides for the Solution Architect and Sales teams.
  • Act as an internal customer for the Marketplace and Solution Architect teams to build end-to-end (E2E) scenarios using our products.

Requirements

  • Technical expertise is mandatory.
  • Solid experience as an ML Engineer, MLOps Engineer, or AI Engineer in one or more of the following domains:
      • Distributed training that utilizes at least dozens of hosts using Slurm, Ray Cluster, MosaicML.
      • Organizing ML infrastructure using best MLOps practices with tools like MLflow, W&B, MosaicML, Kubeflow, Apache Airflow, ClearML, AzureML, SageMaker, VertexAI.
      • Maintaining and optimizing a large inference cluster with KServe, vLLM, Triton, RunAI, Seldon.
      • Using data preparation tools like Databricks and Apache Spark.
      • Building a product on top of LLMs that leverages techniques such as RAG, fine-tuning, and function calling, with an understanding of continuous quality evaluation.
  • Product management experience is not required, but a willingness to learn is essential.

Nice-to-haves

  • Experience as an ML engineer, specializing in developing large generative AI models.
  • Experience as an MLOps engineer, Solution Architect, or DevOps engineer providing infrastructure for ML teams.
  • Experience building MLOps workflows on serverless GPU services such as Modal, Cerebrium, and Google Cloud Run.
  • Background as an ML engineer who transitioned to product management, with a proven track record of delivering complex products for tech customers.

Benefits

  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within Nebius.
  • Hybrid working arrangements.
  • A dynamic and collaborative work environment that values initiative and innovation.