Director, Data & AI/ML Platform Engineering

Stitch Fix
$213,000 - $284,000

About The Position

Stitch Fix is redefining retail by combining human creativity with advanced data science and Generative AI. This role leads the engineering organization responsible for three interconnected platform areas: the enterprise data platform, the machine learning platform, and the generative AI platform. This is a product leadership role focused on understanding user needs, setting a compelling product vision, and driving execution for these platforms. The role involves making consequential architectural decisions, owning the modernization agenda, and communicating strategy to stakeholders. The platforms are live production systems at a public company with meaningful scale and a clear strategic mandate for modernization. The company's top strategic initiative is building the next generation of AI-powered personalization, and this team builds the platform it runs on.

Requirements

  • 10+ years in software, data, or ML/AI platform engineering.
  • 5+ years leading engineering managers or multi-team platform organizations.
  • Track record of owning and evolving production-grade platform systems at scale, driving adoption, rationalizing legacy, and measurably improving developer and data science productivity.
  • History of making and landing consequential architectural decisions in complex, high-availability environments.
  • Hands-on experience with distributed compute and storage (Spark, Trino/Presto, Apache Iceberg or Delta Lake).
  • Hands-on experience with event streaming (Kafka, Flink).
  • Hands-on experience with workflow orchestration (Airflow).
  • Hands-on experience with data governance and quality systems.
  • Experience with feature engineering and feature stores.
  • Experience with model training pipelines.
  • Experience with model deployment and serving (Ray Serve, Triton, or equivalent).
  • Experience with monitoring and validation of ML models.
  • Experience with MLOps practices for running ML in production.
  • Experience with LLM orchestration frameworks.
  • Experience with retrieval-augmented generation (RAG).
  • Experience with agent architectures.
  • Experience with evaluation frameworks for AI.
  • Experience with cost and latency governance for AI.
  • Experience building internal developer platforms (IDPs), self-service tooling, and platform abstractions.
  • Familiarity with developer experience metrics and platform adoption patterns.
  • Experience with distributed systems design.
  • Experience with container orchestration (Kubernetes).
  • Experience with cloud infrastructure at scale (AWS preferred).
  • Product-led mindset for internal platforms, with segmented user personas, defined success metrics, and a prioritized roadmap.
  • Ability to own the full loop from discovery and planning to iterative delivery, production quality, user enablement, and feedback loops.
  • Ability to make a compelling case for multi-year platform investment to a CxO.
  • Ability to write technical design docs.
  • Ability to provide useful answers to data scientists about performance issues.
  • Ability to represent users' needs inside the platform team.
  • Ability to hold the bar on developer experience, self-service reliability, and documentation quality.
  • Ability to treat user complaints as signal, not noise.

Nice To Haves

  • Familiarity with the emerging standards around agentic AI (Model Context Protocol or equivalent).

Responsibilities

  • Own data infrastructure at scale, including systems that ingest, store, and make data accessible across the company (petabyte-scale lakehouse, event streaming, workflow orchestration, data governance, and self-service tools).
  • Own the machine learning platform, including infrastructure for building, experimenting, and serving models in production (feature stores, training pipelines, distributed model serving, and MLOps practices).
  • Own the generative AI platform, including runtime and routing infrastructure, self-service agent-building tools, context and retrieval management, observability and evaluation frameworks, and cost and safety controls.
  • Own the next generation of personalization and decisioning, including foundational platform work for the company's highest-priority strategic initiatives, partnering with Data Science, Algorithms, and Product.
  • Set and own the product vision for each platform area, treating internal platforms as products with defined success metrics and roadmaps.
  • Own platform modernization decisions, leading strategic architectural shifts.
  • Compress time from idea to production by building developer experience, self-service tooling, and golden paths.
  • Lead and grow the organization, managing engineering managers and senior ICs, creating clarity, removing blockers, and developing people.
  • Drive cross-functional alignment by partnering with Data Science, ML Engineering, Data Engineering, Product, and Business leaders.
  • Communicate with authority at every level, writing strategy documents, presenting trade-offs, and whiteboarding system designs.
  • Run the business, owning budget, headcount planning, vendor relationships, contractor management, and long-horizon platform strategy.

Benefits

  • Competitive salary
  • Equity
  • Annual bonus
  • New hire and ongoing grants of restricted stock units
  • Medical, dental, vision, and other benefits