Principal Research Engineer

Microsoft · Redmond, WA

About The Position

Join Microsoft’s CoreAI group as a Principal Research Engineer on the AI Data Platform team—the foundation for secure, scalable, reusable datasets that power AI model development across the company. This central platform manages the full lifecycle of Microsoft’s AI training data, accelerating model development with high-quality, compliant, and reusable datasets and services.

Requirements

  • Bachelor's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 6+ years of related experience (e.g., statistics, predictive analytics, research)
  • OR Master's Degree in Computer Science, Electrical or Computer Engineering, or related field AND 4+ years of related experience (e.g., statistics, predictive analytics, research)
  • OR Doctorate in Computer Science, Electrical or Computer Engineering, or related field AND 3+ years of related experience (e.g., statistics, predictive analytics, research)
  • OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

  • Doctorate in Computer Science, Electrical or Computer Engineering, or related field AND 3+ years of related experience (e.g., statistics, predictive analytics, research)
  • 5+ years of coding experience in Python and experience with ML frameworks such as PyTorch and Triton
  • 3+ years of experience in large-scale model training for LLMs, SLMs, and agentic models
  • 3+ years of demonstrated experience designing and scaling training infrastructure and pipelines in production environments
  • Experience with agent training frameworks
  • Demonstrated experience developing synthetic data generation pipelines to enable SFT and RL training of agentic models
  • Hands-on experience with large-scale distributed training and/or serving, with a demonstrated ability to dive deep into complex systems, troubleshoot unconventional issues, and craft innovative solutions under real-world constraints
  • Extensive experience with large-scale training, model inference, reinforcement learning, and reasoning models
  • Demonstrated ability to work in cross-functional teams and collaborate effectively with researchers, product managers, and other engineers to deliver complex ML solutions
  • Startup-style mindset: agile, solution-oriented, and self-driven

Responsibilities

  • Design and build a data quality evaluation framework for AI training datasets, including scalable metrics, testing methodologies, and automated reporting.
  • Define and operationalize quality signals aligned to model outcomes (e.g., coverage, diversity, noise/duplication, labeling consistency, safety/toxicity, privacy/compliance risk indicators).
  • Collaborate with cross-functional stakeholders to run experiments, establish best practices, and deliver reusable tools that scale across multiple model and product teams.
  • Develop task- and model-aware evaluation approaches that connect dataset properties to training performance, reliability, and safety.
  • Create automated dataset validation gates and monitoring to support continuous dataset iteration (e.g., regression detection across dataset versions).
  • Design and implement synthetic data generation pipelines (LLM-driven and programmatic approaches) to improve long-tail representation, fill coverage gaps, and accelerate iteration cycles.
  • Build guardrails for synthetic data: filtering, scoring, calibration, provenance tracking, and bias/safety checks to ensure quality and compliance.
  • Partner with engineering to integrate evaluation and generation into the platform’s end-to-end data lifecycle.