Senior Engineering Manager, ML Platform

Whatnot | Los Angeles, CA
3d | $255,000 - $345,000 | Hybrid

About The Position

We’re looking for hands-on builders: intellectually curious, deeply technical leaders eager to shape the future of AI and ML at Whatnot. You’ll lead the development and scaling of the core infrastructure that powers machine learning and self-hosted large language model applications across the company. You’ll work side by side with machine learning scientists to bring cutting-edge models powered by near-realtime features into production and unlock entirely new product experiences. This means building systems that make advanced ML dependable and fast at scale, from low-latency deep learning model serving and streaming feature ingestion to distributed training and high-throughput GPU inference.

This is a management role that requires strong technical depth. Candidates should be excited about getting and staying in the weeds. You will be expected to up-level architectural discussions, provide technical feedback, and code at least one day a week.

US Based: We offer flexibility to work from home or from one of our global office hubs, and we value in-person time for planning, problem-solving, and connection. Team members in this role must live within commuting distance of one of our New York, Seattle, Los Angeles, or San Francisco hubs.

Requirements

  • 4+ years of engineering management experience developing production machine learning systems at consumer-scale loads.
  • Bachelor’s degree in Computer Science, Statistics, Applied Mathematics or a related technical field, or equivalent work experience.
  • 5+ years of hands-on software engineering experience building and maintaining production systems for consumer-scale loads.
  • 1+ years of professional experience developing software in Python.
  • Ability to work autonomously, drive initiatives across multiple product areas, and communicate findings to leadership and product teams.
  • Experience with operational, search, and key-value databases such as PostgreSQL, DynamoDB, Elasticsearch, Redis.
  • Experience working with ML-specific tools and frameworks such as MLflow, LitServe, TorchServe, Triton.
  • Firm grasp of visualization tools for monitoring and logging, such as Datadog and Grafana.
  • Familiarity with cloud computing platforms and managed services such as AWS SageMaker, Lambda, Kinesis, S3, EC2, EKS/ECS, Apache Kafka, Flink.
  • Professionalism in collaborating in a remote working environment and a commitment to well-tested, reproducible work.
  • Exceptional documentation and communication skills.

Responsibilities

  • Own the infrastructure powering AI and ML models across critical business surfaces, supporting growth, recommendations, trust and safety, fraud, seller tooling, and more.
  • Guide the prototyping, deployment, and productionization of novel ML architectures that directly shape user experience and marketplace dynamics.
  • Help design and scale inference infrastructure capable of serving large models with low latency and high throughput.
  • Oversee and evolve real-time feature pipelines that feed both our online and offline stores, ensuring single-second feedback from behavioral signals, high reliability, and model training fidelity.
  • Drive feature platform improvements and expand scope to cover non-ML use cases such as fraud rules where point-in-time backtesting is also critical.
  • Lead the development of distributed training and inference pipelines leveraging GPUs and both model and data parallelism.
  • Optimize system performance by managing resource utilization and developing intelligent feature caching strategies.
  • Empower scientists to iterate faster by building abstractions, APIs, and developer tools that simplify the development of near-realtime features and model iteration.
  • Roll out ever-better ergonomics around model training and deployment.
  • Stretch beyond your comfort zone to take on new technical challenges as we scale AI across Whatnot’s ecosystem.

Benefits

  • Generous Holiday and Time off Policy
  • Health Insurance options including Medical, Dental, Vision
  • Work From Home Support
  • Home office setup allowance
  • Monthly allowance for cell phone and internet
  • Care benefits
  • Monthly allowance for wellness
  • Annual allowance towards Childcare
  • Lifetime benefit for family planning, such as adoption or fertility expenses
  • Retirement: 401(k) offering for Traditional and Roth accounts in the US (employer match up to 4% of base salary) and pension plans internationally
  • Monthly allowance to dogfood the app
  • Parental Leave
  • 16 weeks of paid parental leave plus a one-month gradual return to work. Company leave allowances run concurrently with country leave requirements, which take precedence.