AI Research Engineer, Handshake AI

Handshake, San Francisco, CA

About The Position

Handshake is the career network for the AI economy. 20 million knowledge workers, 1,600 educational institutions, 1 million employers (including 100% of the Fortune 50), and every foundational AI lab trust Handshake to power career discovery, hiring, and upskilling, from freelance AI training gigs to first internships to full-time careers and beyond. This unique value is leading to unparalleled growth; in 2025, we tripled our ARR at scale.

Why Join Handshake Now

  • Shape how every career evolves in the AI economy, at global scale, with impact your friends, family, and peers can see and feel
  • Work hand-in-hand with world-class AI labs, Fortune 500 partners, and the world's top educational institutions
  • Join a team with leadership from Scale AI, Meta, xAI, Notion, Coinbase, and Palantir, among others
  • Build a massive, fast-growing business with billions in revenue

Requirements

  • Strong Python programming skills with attention to clean, efficient, and scalable code
  • Experience building and operating large-scale systems for model post-training, specialized data processing, or benchmark evaluation
  • Deep familiarity with PyTorch and modern post-training techniques (RLHF, constitutional AI, etc.)
  • A background in applied machine learning, model evaluation, or large-scale data quality assessment
  • Experience with benchmark design, evaluation methodologies, and performance measurement frameworks
  • Clear communication skills and a collaborative mindset for cross-functional research teams

Nice To Haves

  • Experience optimizing deep learning models for performance (e.g., memory usage, training speed)
  • Interest in the societal and ethical impacts of AI technologies
  • Contributions to open-source ML infrastructure or tools

Responsibilities

  • Design and implement post-training systems and methodologies in close partnership with research scientists and domain experts
  • Build and maintain infrastructure that supports large-scale model training, specialized data processing, and benchmark evaluation
  • Develop robust frameworks for verifying the quality and integrity of highly specialized domain datasets
  • Create next-generation LLM benchmarks that push the boundaries of model evaluation and capabilities assessment
  • Optimize performance across software and hardware layers to accelerate post-training experimentation and deployment
  • Collaborate across disciplines to ensure rigorous validation of model improvements and benchmark reliability

Benefits

  • Equity in a fast-growing company
  • 401(k) match, competitive compensation, financial coaching
  • Paid parental leave, fertility benefits, parental coaching
  • Medical, dental, and vision, mental health support, $500 wellness stipend
  • $2,000 learning stipend, ongoing development
  • Internet, commuting, and free lunch/gym in our SF office
  • Flexible PTO, 15 holidays + 2 flex days
  • Team outings & referral bonuses


What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No education listed
  • Number of Employees: 501-1,000
