Staff AI/ML Engineer, MyHealthTeam

Swoop Airlines and Aviation•San Francisco, CA

1d•Hybrid

About The Position

What you get to do every day

Build end-to-end ML/LLM features from problem definition → data → modeling → evaluation → deployment → monitoring.
Develop LLM applications with retrieval and tool use (e.g., RAG, orchestration/workflows, structured extraction) to deliver trustworthy consumer health experiences.
Convert unstructured text (posts, comments, messages, search queries) into structured signals (topics, entities, intent, sentiment, safety flags) using a mix of classical NLP and modern LLMs.
Create and maintain data pipelines for training, inference, evaluation, and analytics (batch and/or streaming as needed).
Design evaluation systems that measure quality and safety: offline metrics, golden datasets, human review workflows, and online A/B testing alignment.
Implement production guardrails to reduce harm and misinformation risk (policy constraints, refusal behavior, citations/attribution when appropriate, red-teaming, monitoring, and incident response).
Set up monitoring for model + system health (latency, cost, drift, regressions, quality metrics).
Partner closely with the Product, Engineering, and Data teams and clinical/subject-matter experts to validate outputs and define what “correct” means for sensitive, health-adjacent use cases.
(Staff scope) Lead architecture and technical direction for applied AI across the organization; mentor engineers; establish best practices and reusable platforms.

Examples of problems you might work on

Personalized recommendations for communities, posts, resources, or next-best actions
Safer content understanding: detection of misleading/high-risk health claims, escalation workflows
Search and discovery improvements using embeddings, hybrid retrieval, and ranking
Summarization and structuring of long threads into navigable insights (with safety constraints)
Member intent understanding from behavioral + text signals

Requirements

8+ years building and shipping production ML systems (or equivalent experience with demonstrable impact)
Strong Python skills and experience with ML/LLM libraries and tooling (e.g., Hugging Face ecosystem, LangChain/LangGraph, or equivalent)
Proven ability to design production-grade pipelines (training/inference/eval) and operate models in real systems (monitoring, rollbacks, incident handling)
Solid grounding in ML fundamentals (NLP, deep learning, statistical reasoning, evaluation)
Experience with MLOps best practices: versioning, reproducibility, CI/CD, model registry patterns, feature/data management, and infrastructure collaboration
Experience working with large-scale data using Databricks/Spark or equivalent distributed processing
Strong product and stakeholder instincts: you can translate ambiguous business needs into measurable ML outcomes

Nice To Haves

Experience building RAG and retrieval systems: vector databases, hybrid search, ranking, recommendation, query understanding
Experience in healthcare or regulated environments, including privacy-by-design, auditability, and safety reviews (HIPAA/PHI familiarity a plus)
Experience with streaming/clickstream data, experimentation platforms, and causal/measurement thinking
Ability to prototype end-to-end experiences (e.g., Streamlit, Gradio, lightweight frontends)
Experience designing LLM safety systems: red-teaming, adversarial testing, prompt injection mitigation, output filtering, human-in-the-loop review

