Staff Software Engineer, Infrastructure - Machine Learning

Ryder Supply Chain Solutions
San Francisco, CA
Hybrid

Requirements

  • Production Python & Distributed Systems Expertise: Advanced, Staff-level proficiency in Python, gained in a production environment where the code directly impacts operations.
  • Experience in distributed computing, scalable ML infrastructure, and high-performance engineering.
  • Machine Learning (MLOps): Scales ML infrastructure for multiple teams and use cases.
  • Experience implementing and serving ML algorithms.
  • Ensures reproducibility, lineage, and experiment rigor.
  • Owns end-to-end ML systems: training, deployment, features, monitoring, rollback.
  • Hands-on experience with data engineering, distributed training, model monitoring, and experiment tracking.
  • Breadth of knowledge and applied experience across multiple ML applications, with proven ability to leverage a wide range of tools, frameworks, and systems.
  • Technical Leadership & Cross-Functional Influence: Leads design and delivery of large-scale ML or distributed systems.
  • Defines reusable patterns, standards, and architectures.
  • Drives decisions that improve reliability, latency, and developer velocity.
  • Sets technical direction and elevates ML engineering standards.
  • Communicates vision and trade-offs across disciplines.
  • Can mentor other ML engineers on the team.

Nice To Haves

  • 5 to 8 years of backend or ML infrastructure experience.
  • Proven track record building production ML workflows at scale.
  • Experience in the logistics, transportation, or freight industry is a bonus.

Responsibilities

  • Own Core ML Infrastructure: Build and scale distributed systems for ML training, serving, and inference.
  • Design and implement real-time ML workflows that power core product features.
  • Implementation of Distributed Systems: Build robust distributed systems tailored for efficient ML training and seamless operational deployment.
  • Feature Engineering Enhancement: Streamline and manage both online and offline feature stores, optimizing feature engineering processes for greater efficiency.
  • Real-Time ML Workflow Enhancement: Improve real-time machine learning workflows to support dynamic decision-making and automate core operational processes.
  • Platform-Level Ownership: Lead the development of MLOps systems, including model deployment, monitoring, and experiment tracking.
  • Architect and manage scalable feature stores for online and offline usage.
  • AI-Driven Optimization: Contribute to agentic AI systems for freight matching, ETA prediction, and load scheduling.
  • Support systems that improve Stop Estimation Accuracy and Cross-Mode Optimization.
  • Production-Ready Engineering: Write production-grade Python that operates at scale, with reliability and performance top of mind.
  • Collaborate across engineering and data science to turn models into resilient software systems.

Benefits

  • Competitive Base Salary
  • Long Term Cash Incentive Plans
  • Annual Company Bonus
  • 401k with Matching
  • Hybrid Work Schedule
  • Comprehensive Health Coverage
  • Hyper-Stable, Publicly Traded Enterprise
  • Employee Stock Purchase Program (15% discount to market value)
  • Collaborative, Tech-Forward, Cozy office environment in Hayes Valley