As a Research Engineer on the Alignment Science team at Anthropic, you will build and run elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems. You will contribute to exploratory experimental research on AI safety, focusing on risks from powerful future systems, often in collaboration with other teams including Interpretability, Fine-Tuning, and the Frontier Red Team. Your work will involve developing techniques to keep highly capable models helpful and honest, ensuring advanced AI systems remain safe in unfamiliar scenarios, and creating model organisms of misalignment to improve our understanding of alignment failures.