About The Position

As a Principal Applied Scientist focused on RL post‑training, you will lead the design and deployment of learning systems that shape how our models behave in real products. You will own the technical direction and strategy for post‑training and adaptation of large models to align behavior with user value, safety, and business objectives. This is a high‑impact principal IC role with broad influence across Zillow, working closely with senior leadership to ensure our investments translate into safer, more capable, and more trusted AI‑powered experiences.

Requirements

  • You are an applied scientist who is excited to use reinforcement learning and post‑training methods to shape how AI systems behave in complex, high‑judgment settings. You are comfortable owning ambiguous problems end to end, from framing the objective and data strategy to shipping models into production and measuring their impact.
  • You have a PhD or equivalent experience in Computer Science, Electrical Engineering, Statistics, or a related field, with emphasis in areas such as reinforcement learning, bandits, large language models, or applied machine learning.
  • You have strong, current expertise in post‑training techniques (such as supervised fine‑tuning, DPO, RLHF/RLAIF, preference modeling, and multi‑objective optimization), in evaluation and monitoring of aligned models (including win‑rate experiments, human and AI feedback loops, long‑horizon evaluation, and safety or guardrail metrics), and in modern transformer-based models and tooling such as LLMs, multimodal models, vector search, and orchestration frameworks.
  • You have experience working with cross‑functional partners (for example, engineering, product, design, operations, legal, and compliance) in domains where safety, trust, or regulation matter, such as marketplaces, finance, healthcare, or other high‑stakes verticals.
  • You demonstrate technical leadership and mentorship, helping senior engineers and scientists grow, creating clarity amid ambiguity, and driving alignment across teams, and you communicate complex technical ideas clearly to both expert and non‑expert audiences in writing and verbally.

Nice To Haves

  • Here at Zillow, we value the experience and perspective of candidates with non‑traditional backgrounds. We encourage you to apply if you have transferable skills or related experiences.

Responsibilities

  • Lead the technical direction and strategy for RL post‑training of production models, partnering with other scientists, engineers, and product leaders to align models with customer and business needs.
  • Design and implement post‑training pipelines that combine techniques such as supervised fine‑tuning on curated demonstrations, preference modeling and pairwise ranking, and alignment approaches such as RLHF, RLAIF, or DPO for multi-objective optimization.
  • Develop reward models and objective formulations that balance constraints such as helpfulness, safety, fairness, compliance, and customer satisfaction, and iterate on them using human and AI feedback at scale through online and batch adaptation loops with strong guardrails.
  • Translate conversational logs, behavioral signals, and structured attributes into training, reward, and evaluation signals for post‑training and reinforcement learning, turning heterogeneous data into actionable supervision.
  • Partner with model and platform teams to improve the efficiency and robustness of training and evaluation, including off-policy evaluation, replay strategies, controlled rollouts, and metrics and evaluation frameworks such as win-rates versus baselines, safety and quality metrics, and expert-review programs.
  • Mentor applied scientists and engineers, raising the technical bar in RL, post-training, and evaluation, and contributing to the broader AI roadmap at Zillow through thought leadership and guidance.
  • When appropriate, represent Zillow’s work externally through talks, publications, or open‑source contributions.

Benefits

  • In addition to a competitive base salary, this position is eligible for equity awards based on factors such as experience, performance, and location.

What This Job Offers

Job Type

Full-time

Career Level

Principal

Education Level

Ph.D. or professional degree

Number of Employees

101-250 employees
