This position sits within Nebius Token Factory, our serverless platform for running and customizing open-source LLMs in production. Token Factory provides serverless inference and fine-tuning (LoRA, full FT, RFT), backed by in-house optimizations such as custom speculative decoding, quantization, cache-aware routing, and dedicated endpoints. Customers come to us to move from prototype to scaled production without the cost and complexity of building and tuning their own inference stack.

We're looking for a Principal ML Solutions Architect to act as the most senior technical authority for customers using Token Factory's serverless inference and fine-tuning platforms. Beyond designing and implementing optimized inference and fine-tuning workflows, you will set technical direction across our largest and most strategic accounts, own the hardest performance and quality problems end to end, mentor other Solutions Architects, and serve as a primary technical voice shaping the platform roadmap with backend, product, and research teams.

You're welcome to work remotely from the United States.
Job Type: Full-time
Career Level: Principal
Education Level: No Education Listed