Senior Principal Machine Learning Engineer

Atlassian · Seattle, WA
$222,300 - $348,975 · Remote

About The Position

Atlassian is seeking a Senior Principal Machine Learning Engineer to join our GenAI Platform organization, focusing on the quality and reliability of Rovo Chat. Rovo is Atlassian’s AI teammate, embedded across our products to help teams search, understand, and act on their work. In this role, you will be the technical driver behind making Rovo Chat exceptionally accurate, trustworthy, observable, and reliable at scale. You will define what “great” looks like for GenAI chat quality, build the platforms and evaluation systems to measure it, and lead cross‑org efforts that materially improve customer outcomes and reduce incidents. This work sits at the intersection of LLMs, retrieval-augmented generation (RAG), evaluation and quality frameworks, observability, and large‑scale production systems.

You will join the GenAI Platform pillar within Central AI / Engineering‑AI, working closely with the Rovo Chat product and engineering teams. Our mission is to:

  • Provide a central GenAI platform (models, infra, evaluation, safety, and tooling) that powers AI experiences across Atlassian.
  • Ensure Rovo Chat is a highly reliable, high‑quality assistant across Jira, Confluence, and the rest of our product suite.
  • Drive quality, observability, and debuggability for GenAI experiences, so we can quickly detect, root‑cause, and fix issues that impact customers, incidents, Disturbed tickets, and DoS escalations.

You’ll collaborate with:

  • Rovo Chat and Search & Conversation teams on chat UX and retrieval quality.
  • AI Fundamentals / AI Modeling / ML Platform on modeling, evaluation, training, and serving.
  • SRE / TechOps / Support (Disturbed / DoS) on reliability, incident response, and root‑cause tooling.

Requirements

  • 10+ years of industry experience in machine learning / applied AI, including shipping production systems at scale.
  • Deep hands‑on expertise with LLMs and/or large‑scale NLP systems, including at least one of: Retrieval‑augmented generation (RAG), Search & ranking / relevance, Conversational AI / assistants / agents, Evaluation and quality frameworks for LLM applications.
  • Strong coding skills in Python (and/or Java) with the ability to write performant, production‑quality code, plus: Solid experience with Java/Kotlin and large‑scale data processing (e.g., Spark), Familiarity with cloud environments (e.g., AWS, Databricks) and modern ML tooling.
  • Demonstrated experience designing and operating ML systems end‑to‑end, including: Data pipelines and feature generation, Training, evaluation, and deployment, Monitoring, incident response, and iterative improvement.
  • A track record of technical leadership beyond a single team, such as: Driving cross‑team/platform initiatives, Making high‑impact architecture decisions, Influencing roadmaps and org‑level priorities.
  • Ability to communicate complex ML concepts clearly to engineers, PMs, designers, and leadership, and to tell a compelling story with data.
  • A strong product sense and bias for pragmatism and iteration (80/20 mindset: knowing when “good and measurable now” beats “perfect later”).

Nice To Haves

  • Master’s degree or PhD in Computer Science, Machine Learning, Statistics, or a related technical field.
  • Experience with: LLM fine‑tuning, post‑training, and optimization (instruction tuning, preference optimization, safety tuning), Model evaluation and guardrails (LLM‑as‑a‑judge, red‑teaming, safety frameworks), High‑reliability systems in SaaS (SLOs, error budgets, incident command, post‑incident analysis).
  • Prior work on AI assistants or conversational experiences in a B2B SaaS or productivity setting.
  • Experience partnering with SRE / incident management / support to reduce MTTR, improve root‑cause coverage, and lower ticket volume through better tooling and automation.
  • Experience building observability and debuggability tools for ML or GenAI systems (e.g., tracing, experiment management, evaluation platforms).

Responsibilities

  • Set the bar for Rovo Chat quality & reliability
      • Define and evolve a north‑star quality and reliability framework for Rovo Chat, spanning: Answer correctness, faithfulness, and grounding, Safety and policy adherence, Latency, robustness, and uptime, Incident, Disturbed, and DoS impact.
      • Translate these into measurable metrics, SLAs/SLOs, and dashboards that are adopted across product and platform teams.
  • Build the evaluation & observability stack for GenAI chat
      • Design and lead implementation of end‑to‑end evaluation pipelines for Rovo Chat, including: Offline evals (benchmarks, synthetic data, golden sets, human‑in‑the‑loop labeling), Online evals (A/B tests, interleaving, guardrail metrics), LLM‑as‑a‑judge and other automated evaluation techniques.
      • Drive observability and debuggability improvements (e.g., tracing, attribution, feature logging, and model behavior introspection) so engineers can quickly root‑cause regressions and incidents.
      • Partner with SRE/TechOps to connect evaluation and observability signals into incident management, improving: % of incidents successfully root‑caused, Disturbed ticket and DoS resolution efficiency.
  • Lead technical strategy for GenAI platform quality
      • Define and own technical roadmaps for GenAI platform features that directly impact Rovo Chat quality and reliability (e.g., retrieval quality, RAG orchestration, guardrails, safety filters, fallback strategies, model selection/routing).
      • Make high‑impact architecture decisions across: LLM and RAG architectures, Knowledge ingestion and retrieval, Evaluation & monitoring infra, Trust & Safety layers.
      • Identify and prioritize cross‑pillar investments (e.g., shared eval frameworks, reusable prompt libraries, safety and policy enforcement) that raise the bar across Atlassian AI.
  • Deliver high‑impact improvements to customer outcomes
      • Use data from incidents, Disturbed tickets, DoS escalations, and product telemetry to identify systemic quality and reliability gaps.
      • Lead multi‑team initiatives to: Reduce production incidents and regressions, Improve “first‑try success” rate of answers, Decrease hallucinations and unsafe outputs, Improve CSAT/NPS and key adoption/retention metrics for Rovo Chat.
      • Work closely with PMs and designers to ensure quality and reliability are visible, explainable, and trustworthy to customers.
  • Mentor, influence, and grow the AI community
      • Mentor senior/principal ML engineers and ML systems engineers across GenAI Platform and Rovo Chat.
      • Act as a technical thought partner to engineering and product leadership on GenAI quality and reliability strategy.
      • Contribute to AI best practices across Atlassian via design reviews, internal talks, and cross‑org forums.

Benefits

  • Atlassian offers a wide range of perks and benefits designed to support you and your family and to help you engage with your local community. Our offerings include health and wellbeing resources, paid volunteer days, and so much more.