Stack AV - posted 6 months ago
Pittsburgh, PA
101-250 employees

In the ML Data team, our mission is to provide trusted and useful data to efficiently power all of Stack's ML applications end-to-end, from labeling to training to safety evaluation. We work hand in hand with AV autonomy teams to provide cutting-edge solutions to all their data needs, working across data engineering, ML modeling, and ML infrastructure. In particular, we provide services to find (data mining), curate (datasets), annotate (data labeling), and serve (high-throughput data access) data for all ML needs.

Training: We are building state-of-the-art infrastructure to support machine learning training and inference workloads using OSS components such as Ray, Spark, and Iceberg.

Data Mining: We are building a framework and infrastructure to find interesting events quickly and flexibly. As part of this mission, you would set the direction for, and help us build, an inference service using LLMs and a vector database.

Labeling: You would set the direction for auto-labeling and build towards it. You would own the labeling needs of the entire company.

  • Push the GPU to its limit from Python to CUDA kernel level.
  • Build the inference or training loop for large models, ideally LLM-style models.
  • Ship ML products (NLP, computer vision, recommender systems, etc.) at scale to make a business impact.
  • Develop data platform infrastructure for real-time querying/vector databases and batch/stream processing using technologies like Ray, Spark, or similar.
  • Create Parquet-based object storage solutions (data lake/data warehouse).
  • Build low-latency, high-throughput batch or stream processing pipelines.
  • Write (readable) high-performance C++ code.
  • Experience with both ML platforms and building ML-based applications (modeling experience is a bonus).
  • Proven track record of building scalable, reliable infrastructure in a fast-paced environment.
  • Ability to collaborate effectively across teams.
  • Experience building or using ML infrastructure for a large number of customer teams.
  • Deep understanding of design trade-offs with the ability to articulate those trade-offs and achieve alignment with others.
  • Experience in building ML models or infrastructure in domains such as autonomous vehicles, perception, and decision-making (desirable but not required).
  • Experience with model training, model optimization, or large data processing pipelines.
  • Prior experience in autonomous vehicles (AV) is a plus.