About The Position

This position bridges AI Research and Production Engineering: researchers define metrics and policies, while you ensure the scalability, architecture, and reliability of the systems behind them. You will write code and mentor engineers to build advanced safety infrastructure, lead improvements in quality measurement for AI models and systems across products, and design evaluation frameworks that emphasize rigor, scalability, and reliability. You will innovate in statistical analysis, machine learning, and experimental methods to deliver actionable insights, and your work will shape product strategy and enhance AI's impact at scale.

About The Team

Trust is crucial in the era of generative AI. Our goal is to make AI models and products beneficial, safe, and truthful. We build the essential systems for AI safety and reliability, addressing risks such as inaccuracy, bias, harmful content, and security vulnerabilities at scale.

Requirements

  • An academic and professional background in computer science or machine learning, with extensive experience designing large-scale AI or distributed systems.
  • Engineering Excellence: Expert-level software development skills, with a focus on system architecture, code maintainability, and building robust production services.
  • AI/ML Domain Depth: Deep understanding of the LLM lifecycle and the unique challenges of evaluating non-deterministic systems (quality, safety, and alignment).
  • Scientific Rigor: A solid foundation in statistical analysis and experimental design (A/B testing) to validate model interventions and product quality.
  • AI Infrastructure: Deep experience with PyTorch and the Hugging Face ecosystem (Transformers, Datasets).
  • Inference & Serving: Proficiency with high-throughput inference engines (e.g., vLLM, SGLang, or TensorRT-LLM).
  • Core Languages: Expert-level Python; proficiency in high-performance languages such as Go, C++, or Java.
  • Expertise in CI/CD patterns, automated testing frameworks, and large-scale data processing pipelines for engineering systems.
  • Knowledge of cloud-native infrastructure and distributed system monitoring to support production AI services effectively.

Nice To Haves

  • Technical Leadership: Proven ability to define long-term technical roadmaps, drive organizational alignment on engineering standards, and mentor senior talent.
  • Advanced Applied ML: Experience in high-performance model optimization and the deployment of sophisticated evaluation methodologies in production.
  • AI Quality & Excellence: Deep domain expertise in AI alignment and reliability, with a focus on benchmarking and elevating model performance across diverse applications.

Responsibilities

  • Developing an Assessment Methodology and Strategy: Establishing a comprehensive approach to evaluate AI quality, safety, and alignment across various product modalities.
  • Designing an Evaluation Framework: Creating a strategy to define key performance metrics and safety parameters for AI models in diverse applications.
  • Identifying Systemic Risks: Proactively detecting potential failure modes like hallucinations, bias, and vulnerabilities in applications of language models.
  • Designing Risk Evaluation Protocols: Establishing systematic methods to measure and address identified risks in AI applications.
  • Providing Strategic Technical Guidance: Conducting build-versus-buy analyses to determine whether to develop or procure evaluation tools.
  • Integrating Advanced Evaluation Techniques: Staying updated with cutting-edge methods to ensure effective and accurate AI model assessments.
  • Building Scalable Evaluation Infrastructure: Designing and managing a modular system that supports offline benchmarking and online monitoring, and ensuring compatibility across various AI product lines.
  • Integrating Automated Quality Gates: Embedding quality gates in the CI/CD pipeline so that every model deployment meets predefined quality and safety standards before reaching production.
  • Enabling Online Experimentation: Building a framework for extensive A/B testing and online experimentation to validate the practical effects of model adjustments and safety mechanisms.
  • Providing Engineering Excellence and Technical Leadership: Serving as the primary authority for AI evaluation and technical decision-making.
  • Establishing Technical Roadmaps: Translating product requirements into actionable plans and leading the team through intricate system designs.
  • Setting Organizational Standards: Overseeing AI quality by promoting RFCs, design reviews, and best practices for data labeling and model versioning.
  • Developing Talent: Mentoring team members while encouraging continuous learning and thorough peer review.