AI Behavior Engineer

Transluce
San Francisco, CA
Onsite

About The Position

Transluce is a fast-moving nonprofit research lab building the public tech stack for scalable AI evaluation and oversight. We specialize in behavioral evaluations of frontier AI systems, assessing how models actually behave in deployment, not just how they perform on benchmarks. We are an independent nonprofit with a mission to steer the development of AI for the public good.

About the role: We're looking for an engineer to work on measuring and shaping AI model behaviors, someone who thrives on turning hard questions into evidence fast. Think of this as a forward-deployed engineering role: you'll work directly with policymakers, civil society partners, and frontier labs to rapidly answer key questions about why AI systems act the way they do, and when and why they fail. You'll build relationships with external domain experts, adapt our methods to new contexts, and help ensure our work is both technically credible and immediately useful to the people making consequential AI governance decisions. This is a high-autonomy role with direct exposure to senior stakeholders and a clear line of sight from your work to real-world impact.

Requirements

  • Hands-on experience designing and running AI evaluations, particularly behavioral or interactive evaluations (multi-turn, agentic, or red-teaming contexts)
  • Strong engineering instincts and good judgment about when "good enough to ship" is actually good enough
  • Experience in customer-facing, consulting, or forward-deployed roles, translating ambiguous stakeholder needs into concrete deliverables
  • Experience running evaluations at scale or in a production context
  • Ability to understand and balance the needs of AI researchers, domain experts, and senior decision makers
  • Strong communication skills, low ego, and openness to giving and receiving feedback

Responsibilities

  • Build and extend Transluce’s AI evaluation methods for measuring important, evolving AI model behaviors.
  • Scope, prototype, and run behavioral evaluations in response to emerging policy and oversight needs, including rapid-turnaround work for government and civil society partners.
  • Execute on Transluce's contracts with government evaluators, including building evaluations for harmful manipulation with the EU AI Office.
  • Design and run privileged-access evaluations and external oversight exercises with frontier labs.
  • Work with civil society organizations and domain experts to adapt our behavioral evaluation pipelines to their contexts (e.g., mental health, persuasion, evaluation awareness).

Benefits

  • Sponsorship for international visas