Member of Technical Staff - Infrastructure Engineer, Frontier AI & Robotics (FAR)

Amazon, San Francisco, CA

$150,000 - $300,000

About The Position

Amazon’s Frontier AI & Robotics (FAR) team is seeking a Member of Technical Staff, Infrastructure to build and scale the foundational systems that power our robotics research and development platform. In this role, you will design and operate the distributed infrastructure that enables our researchers and engineers to train foundation models, run large-scale experiments, and deploy intelligent robotic systems at Amazon scale.

Join the next revolution in robotics, where you’ll work alongside world-renowned AI pioneers to push the boundaries of what’s possible in robotic intelligence. As a Member of Technical Staff focused on Infrastructure, you’ll build the critical platform layer that accelerates every aspect of FAR’s research, from high-throughput data pipelines and experiment management systems to low-latency model serving and configuration delivery for robotic deployments.

This role is deeply technical and focuses on performance, scalability, and reliability. You will design systems that handle large volumes of training data, meet strict latency requirements, and provide the compute and data foundation that enables breakthrough research across FAR’s robotics ecosystem.

Requirements

5+ years of distributed systems experience
Bachelor's degree in Computer Science or a related field
Proficiency in Python and at least one systems or backend programming language (e.g., Go, Java, C++)
Experience with cloud infrastructure platforms (AWS, GCP, or Azure), including compute, storage, and networking services
Experience building or maintaining data pipelines, ETL systems, or ML training/serving infrastructure
Understanding of system reliability principles, including monitoring, observability, fault tolerance, and on-call operational practices

Nice To Haves

Experience supporting AI/ML research workflows, including building and optimizing training stacks, experiment tracking, dataset management, or model deployment infrastructure
Familiarity with robotics platforms, simulation environments, or real-time systems with strict latency requirements
Experience with large-scale data processing frameworks (e.g., Apache Spark, Flink, or Ray) and query optimization for analytics workloads
Demonstrated ability to lead large technical initiatives and influence architectural decisions across cross-functional teams
Experience building developer tooling, internal platforms, or self-service infrastructure systems that improve research or engineering productivity

Responsibilities

Design and build scalable compute and data infrastructure to support model training, inference, and evaluation for frontier AI and robotics development
Lead large technical initiatives and shape the architecture of FAR’s research platform infrastructure
Develop tooling and frameworks that accelerate research workflows, including dataset management, visualization, and quality assessment systems
Optimize query performance and data availability for experimentation and analytics workflows used by research teams
Improve the performance, efficiency, and reliability of FAR’s core compute and storage infrastructure, ensuring systems remain fast and stable at scale
Build highly scalable experimentation and analytics infrastructure to support model evaluation, A/B testing, and feature performance analysis
Collaborate directly with science and robotics teams to support research projects through both infrastructure development and hands-on technical contribution