Anthropic PBC · Posted 8 months ago
$280,000 - $690,000 per year
Full-time • Mid-Level
San Francisco, CA

As a Research Engineer on Alignment Science at Anthropic, you will build and run elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems. You will contribute to exploratory experimental research on AI safety, focusing on risks from powerful future systems, often in collaboration with other teams including Interpretability, Fine-Tuning, and the Frontier Red Team. Your work will involve developing techniques to keep highly capable models helpful and honest, ensuring advanced AI systems remain safe in unfamiliar scenarios, and creating model organisms of misalignment to improve our understanding of alignment failures.

Responsibilities:
  • Build and run machine learning experiments to understand AI behavior.
  • Contribute to exploratory research on AI safety.
  • Collaborate with teams on projects related to AI safety.
  • Develop techniques for scalable oversight of AI models.
  • Create methods to ensure AI systems remain safe in adversarial scenarios.
  • Run multi-agent reinforcement learning experiments.
  • Build tooling to evaluate the effectiveness of safety techniques.
  • Contribute to research papers, blog posts, and talks.
  • Run experiments that inform and support Anthropic's broader AI safety efforts.
Requirements:
  • Significant software, ML, or research engineering experience.
  • Experience contributing to empirical AI research projects.
  • Familiarity with technical AI safety research.
  • Ability to work collaboratively on fast-moving projects.
  • Willingness to take on tasks outside of job description.
  • Experience authoring research papers in machine learning, NLP, or AI safety.
  • Experience with LLMs.
  • Experience with reinforcement learning.
  • Experience with Kubernetes clusters and complex shared codebases.
Logistics:
  • Visa sponsorship available.
  • Hybrid work policy with at least 25% office presence.
  • Candidates from diverse backgrounds are encouraged to apply.