Senior Software Engineer, Managed AI

Crusoe · San Francisco, CA
7d · $166,000 - $201,000

About The Position

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About this role: The Senior Software Engineer for the Model LifeCycle team will contribute to building a managed platform for the entire application development lifecycle, with a specific focus on leveraging Machine Learning models, including Large Language Models (LLMs).

Requirements

  • 4-5+ years of industry experience, with a demonstrated history of successfully leading a varied portfolio of initiatives across your function.
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Experience delivering production-ready features.
  • Familiarity with essential cloud-based services (e.g., compute, storage, networking).
  • Familiarity with Generative AI (Large Language Models, Multimodal).
  • Experience with AI infrastructure components (training, inference).
  • Proactive and collaborative approach.
  • Strong communication and interpersonal skills.
  • Passion for building AI products and solving technical problems.

Nice To Haves

  • Proficiency in Golang or Python for production services.
  • Familiarity with PyTorch.
  • Some experience with training and fine-tuning LLMs.

Responsibilities

  • Implement and maintain systems for fine-tuning large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling.
  • Implement and maintain end-to-end training pipelines for Large Language Models.
  • Implement components for distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling).
  • Develop and maintain core agent execution infrastructure.
  • Implement features for dataset, model, and experiment management, focusing on versioning, lineage, evaluation, and reproducible fine-tuning.
  • Work closely with Senior Engineers and Principal Engineers, as well as product and platform teams, to implement system abstractions and APIs.
  • Contribute to technical discussions on training runtimes, scheduling, storage, and model lifecycle management.
  • Engage with the open-source LLM ecosystem.
  • This role involves significant implementation ownership for core system components.

Benefits

  • Industry competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit: $300/month