AI Infrastructure Engineer - Fury Team

Scout AI, Sunnyvale, CA

About The Position

The future of defense will be decided by those who field intelligent machines at scale. At Scout, we’re developing Fury, the first robotic foundation model for defense, to give U.S. forces overwhelming, adaptable, and autonomous power across every domain. Fury enables human operators to command fleets of robots through natural language, and empowers those machines to sense, decide, and act together as one. It’s not just a leap in autonomy; it’s a force multiplier built for real-world conflict. This mission will ask everything of us: urgency, precision, and relentless work.

We’re looking for an AI Infrastructure Engineer to build and scale the backbone of Fury’s model training and deployment ecosystem. You’ll design the data, compute, and orchestration infrastructure that enables our vision-language-action models to learn from massive real-world datasets and operate across edge and cloud environments. This role bridges systems engineering, distributed computing, and machine learning infrastructure. Your work will ensure our teams can iterate rapidly, train large models efficiently, and deploy them reliably on robotic platforms in the field.

We’re a startup. You’ll be moving fast, context-switching daily, and helping define the culture and process as we go. This is a rare opportunity to come in early and architect the future of defense.

Requirements

  • 3+ years of experience in ML infrastructure, MLOps, or large-scale data systems
  • Proven experience with distributed training (PyTorch DDP, DeepSpeed, Ray, or similar) and workflow orchestration (Kubernetes, Airflow, or equivalent)
  • Strong proficiency in Python and cloud-native infrastructure (AWS, GCP, or Azure)
  • Deep understanding of data engineering (ETL pipelines, object storage, data versioning, metadata management)
  • Familiarity with containerization and deployment (Docker, Kubernetes) and monitoring systems (Prometheus, Grafana)
  • Experience optimizing GPU cluster utilization, scaling training jobs, and profiling model performance
  • Bachelor’s degree or higher in Computer Science, Electrical Engineering, or related technical field

Nice To Haves

  • Experience with edge-deployed ML systems, federated training, or robotic data collection pipelines

Responsibilities

  • Design and implement data pipelines for ingesting, transforming, and storing petabytes of multimodal data from Fury’s robotic and operator systems
  • Develop internal tooling for dataset exploration, curation, versioning, and quality monitoring over time
  • Build and maintain distributed training infrastructure (cloud and on-prem) for large-scale multimodal and foundation model training
  • Implement job orchestration workflows for launching, tracking, and debugging large-scale model runs
  • Identify and remediate bottlenecks in compute, memory, storage, and network performance to optimize throughput and cost efficiency
  • Collaborate with AI, autonomy, and systems teams to ensure data and training infrastructure supports real-time and mission-critical use cases
  • Maintain observability and reliability tooling for training and inference pipelines
  • Stay current on best practices in MLOps, distributed training frameworks, and AI infrastructure at scale

Benefits

  • Competitive base salary and meaningful equity
  • Premium medical, dental, and vision plans with $0 paycheck contribution
  • Competitive PTO and company holiday calendar
  • Catered lunch daily and fully stocked kitchen
  • EV charging
  • Relocation assistance (depending on role eligibility)

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Bachelor’s degree
  • Number of Employees: 1-10
