Sr. DevOps Engineer, Cloud Platform

Contoro Inc.•Austin, TX

Onsite

About The Position

Contoro Robotics is an Austin-based startup revolutionizing warehouse automation with AI-powered robotic solutions tackling real industrial challenges. Our mission is to deploy scalable, human-in-the-loop autonomous systems that reliably perform in the field. As our fleet of unloading robots expands, we're looking for a talented DevOps Engineer to help scale and harden our Cloud Platform infrastructure. This role is critical to enabling real-time robot operation, system monitoring, and on-demand AI training infrastructure. You’ll own key components of our web services, networking, and CI/CD systems.

Requirements

Direct experience designing and building AWS-driven infrastructure from scratch (maintaining existing systems is not sufficient).
5+ years of hands-on experience with AWS, Linux, Terraform, and Python.
Prior ownership or leadership of production infrastructure projects.
Solid knowledge of AWS services (IAM, S3, EC2, ECR, VPC, etc.).
Proficient with Docker, Docker Compose, and Terraform.
Experience with messaging and communication protocols (e.g., Kafka, MQTT, WebSockets).
Deep knowledge of scalable data stores (SQL, Redis, time-series databases, etc.) and data retention policies.
Passion for CI/CD workflows and automated testing pipelines.
Strong sense of ownership, urgency, and curiosity.
Excellent verbal and written communication skills.
Ability to work collaboratively with cross-functional teams.
Minimum B.S. in Computer Science, Engineering, or related field (or equivalent industry experience).

Nice To Haves

Experience managing GPU clusters or distributed compute environments.
Exposure to robot fleet orchestration or IoT deployment strategies.
Familiarity with 5G hardware, VPN technologies, and network configuration.
Familiarity with cloud-native observability stacks (e.g., Prometheus, Grafana, ELK).

Responsibilities

Lead and maintain three foundational pillars of our infrastructure: Web Services (AWS-hosted and on-premises deployments), Fleet Network (secure, scalable data communication with the robots), and CI/CD Pipelines (fast and reliable build/test/deploy automation).
Lead the migration of key services to cloud-based infrastructure (AWS).
Design and maintain secure user access, containerized services, and cloud-native integrations.
Optimize uptime and performance of GPU clusters for real-time and batch AI model training.
Build and scale secure remote-access networking for robot fleet management.
Track and improve KPIs around build speed, service uptime, and deployment cadence.
Ensure infrastructure performance stays within defined budget constraints.
Promote software development best practices, including automation, versioning, and test coverage.
Collaborate closely with software, hardware, and AI teams to integrate infrastructure into the product lifecycle.
