Platform Engineer II/III

Zone 5 Technologies

About The Position

At Zone 5 Technologies, we're redefining what's possible in unmanned aircraft systems (UAS). Our team of engineers and innovators is developing cutting-edge autonomous solutions that push the boundaries of UAS technology and solve complex challenges that matter. We're building the future of UAS capabilities, and we're looking for exceptional talent to join us. If you're driven by hard problems, energized by rapid innovation, and ready to make an impact on next-generation flight systems, you belong here.

We are seeking a Platform Engineer to architect and operate the scalable compute infrastructure that powers our autonomous vehicle simulation and testing framework. You will build elastic compute systems across AWS and on-premises clusters, enabling engineering teams to iterate rapidly on autonomy algorithms through massively parallel simulation workloads.

Requirements

Bachelor's degree in Computer Science, Software Engineering, or a related technical field (equivalent industry experience also welcome)
2-5+ years of experience in platform engineering, DevOps, SRE, or cloud infrastructure roles
Strong hands-on experience with Kubernetes for container orchestration and workload management
Experience with cloud computing platforms and services (compute, storage, networking)
Deep understanding of Linux system administration and troubleshooting
Strong networking fundamentals including TCP/IP, routing, DNS, VPNs, and security
Understanding of infrastructure as code principles and configuration management
Proficiency in scripting and automation (Python, Bash, or similar)
Experience building and maintaining CI/CD pipelines
Solid grasp of distributed systems concepts, job scheduling, and resource management
Ability to design infrastructure from first principles and make architectural decisions

Nice To Haves

Experience building infrastructure for simulation, robotics, or autonomous systems workloads
Understanding of GPU computing and accelerated workload management
Knowledge of job scheduling systems for batch and parallel workloads
Experience managing on-premises clusters and hybrid cloud architectures
Familiarity with robotics middleware (ROS/ROS2) or simulation platforms
Understanding of cost optimization for compute-intensive workloads
Experience with monitoring, logging, and observability systems
Knowledge of containerization technologies and image management
Background in data engineering, MLOps, or machine learning infrastructure
Experience with network performance analysis and troubleshooting
Understanding of software-defined networking and network automation
Familiarity with security compliance requirements in aerospace/defense environments

Responsibilities

Design and implement auto-scaling compute infrastructure for simulation workloads using cloud platforms
Build and maintain on-premises GPU and CPU clusters for simulation and machine learning training
Architect hybrid cloud solutions that optimize cost and performance across cloud and local compute resources
Implement job scheduling and orchestration systems using Kubernetes for thousands of concurrent simulations
Design storage solutions for large-scale simulation data, logs, and artifacts using cloud and local storage systems
Deploy and maintain robotics simulation environments at scale
Build CI/CD pipelines for automated simulation testing of autonomy software
Create infrastructure for distributed parameter sweeps, Monte Carlo testing, and regression suites
Develop monitoring and observability systems for simulation fleet health and resource utilization
Implement data pipelines for simulation results ingestion, analysis, and visualization
Write and maintain infrastructure as code for reproducible infrastructure deployment
Build automation tools and CLI utilities to simplify developer access to compute resources
Implement GitOps workflows for infrastructure changes and configuration management
Create self-service interfaces for engineers to launch and manage simulation jobs
Develop cost monitoring and optimization strategies for cloud and on-prem resources
Monitor and optimize infrastructure performance, reliability, and cost efficiency
Troubleshoot complex distributed systems issues across networking, storage, and compute layers
Implement backup, disaster recovery, and business continuity strategies
Maintain security best practices including IAM, secrets management, and network isolation
Collaborate with autonomy, ML, and robotics teams to understand compute requirements and optimize workflows
Design and implement network architectures for distributed simulation workloads across AWS and on-premises environments
Configure VPCs, subnets, security groups, and routing for secure, high-performance compute clusters
Establish hybrid cloud connectivity (VPN, Direct Connect, site-to-site tunnels) between on-premises and cloud resources
Optimize network performance for large data transfers, multi-node communication, and distributed workloads
Support internal infrastructure network design and provide technical guidance to engineering programs
Troubleshoot network issues including latency, packet loss, and connectivity problems across distributed systems

Benefits

Competitive total compensation package
Comprehensive benefits package with options including medical, dental, vision, life, and more
401(k) with company match
4 weeks of paid time off each year
12 annual company holidays
