Lightning AI is seeking an experienced Infrastructure Operations Engineer to help scale and operate our next-generation AI infrastructure platform. Our InfraOps team sits at the center of reliability, automation, and operational scale for GPU infrastructure. The team owns break/fix operations, incident response, customer provisioning, observability, and the automation systems that keep complex infrastructure running efficiently.

In this role, you’ll work hands-on with large-scale GPU environments, Linux systems, bare metal infrastructure, provisioning workflows, and platform reliability. You’ll partner closely with the Infrastructure Engineering, Network Operations, and Software Platform teams to troubleshoot issues, improve operational efficiency, and build automation that reduces manual toil over time.

We’re flexible on location for this team. The role can be hybrid, based out of one of our U.S. hubs (Seattle, NYC, or SF), or fully remote within the U.S., with occasional company and team offsites. We are not able to provide visa sponsorship for this position at this time.
Job Type: Full-time
Career Level: Senior
Education Level: No Education Listed