Senior Platform Engineer I, AI Evaluation (24 months fixed-term)

Khan Academy•Mountain View, CA

56d•Remote

About The Position

We’re looking for an AI Platform Engineer to evolve and extend our internal evaluation framework for assessing the quality of our AI-driven experiences at Khan Academy. This engineer will have worked with enough eval systems to quickly make sense of Khan's internal eval framework and recognize opportunities for improvement. This is largely a software development role, but domain experience with AI eval is essential for appreciating the hill-climbing and data science workflows we need to support. Soft skills will be important for gathering internal requirements, getting buy-in for changes, and then developing documentation and training materials. You’ll work closely with ML data engineers and platform developers to help internal teams adopt an eval-driven development process incorporating offline benchmark tests and online experiments.

As a Platform Engineer focused on evaluation, you’ll be expected to:

Be fluent in the range of offline and online evaluation strategies, and when to apply the techniques over the lifecycle of development
Have intuitions about how to specify eval pipelines succinctly using declarative syntax
Understand the role of stratified datasets and ground truth labeling
Appreciate the range of eval scoring schemes from human raters to automated LLMs-as-judge

We are a remote-first organization and we strive to build using technology that is best suited to solving problems for our learners. Currently, we build with Go, GraphQL, JavaScript, React & React Native, and Redux, and we adopt new technologies like LLMs when they’ll help us better achieve our goals.

At Khan, one of our values is “Cultivate Learning Mindsets”, so for us, it’s important that we’re working with all of our engineers to help match the right opportunity to the right individual, in order to ensure every engineer is operating at their “learning edge”.

Currently, we are focused on providing equitable solutions to historically under-resourced communities of learners and teachers, and guided by our Engineering Principles. You can read about our latest work on our Engineering Blog. A few highlights:

Incremental Rewrites with GraphQL
Our Transition to React Native
Go + Services = One Goliath Project
How Engineering Principles Can Help You Scale
How to upgrade hundreds of React components without breaking production

Requirements

Bachelor’s or Master’s degree in Computer Science, Data Engineering, related field, or equivalent professional experience.
5 years of software engineering experience, including significant time working on the evaluation of generative AI systems or other evaluations of ML model quality
Strong programming skills in Go, Python, SQL, and at least one data pipeline framework (e.g., Airflow, Dagster, Prefect)
Familiarity with the architecture of large language models and their industry-standard APIs

Nice To Haves

Experience with labeling platforms (e.g., Label Studio, Scale AI, Toloka) and human-in-the-loop concerns such as rubric development and inter-rater agreement
Exposure to MLOps practices such as model registry, feature store, or continuous evaluation
Background in education technology or other human-centered AI applications

Responsibilities

Be fluent in the range of offline and online evaluation strategies, and when to apply the techniques over the lifecycle of development
Have intuitions about how to specify eval pipelines succinctly using declarative syntax
Understand the role of stratified datasets and ground truth labeling
Appreciate the range of eval scoring schemes from human raters to automated LLMs-as-judge

Benefits

Competitive salaries
Ample paid time off as needed – Your well-being is a priority
8 pre-scheduled Wellness Days in 2026 occurring on a Monday or a Friday for a 3-day weekend boost
Remote-first culture that caters to your time zone, with open flexibility as needed
Generous parental leave
An exceptional team that trusts you and gives you the freedom to do your best
The chance to put your talents towards a deeply meaningful mission and the opportunity to work on high-impact products that are already defining the future of education
Opportunities to connect through affinity, ally, and social groups
401(k) + 4% matching & comprehensive insurance, including medical, dental, vision, and life
