We have an opening for a Development Operations (DevOps) Engineer. You will develop and support a robust, scalable, and operational infrastructure at the intersection of High-Performance Computing (HPC), on-premises cloud-native technologies, and AI/ML software stacks to support, develop, and deploy collaboration tools and services for users of LLNL’s high-performance computers. You will work independently, applying software engineering and DevOps skills on a variety of hardware platforms to enable state-of-the-art collaboration and productivity tools for developers and scientists located worldwide. This position is in the Livermore Computing Division within the Computing Directorate.

This position will be filled at either the SES.1 or SES.2 level, depending on your qualifications. Additional job responsibilities (outlined below) will be assigned if you are selected at the higher level.

You will
Build, deploy, support, and enhance LLNL containerized applications and software stacks deployed in our LC OpenShift/Kubernetes clusters.
Identify issues and propose solutions to technical problems across a wide range of projects and efforts to improve the design and implementation of DevOps best practices.
Perform software engineering using established development practices, tools, and processes for achieving robust software quality, including testing, configuration management, change management, and documentation.
Collaborate closely with other technical teams and developers to ensure solutions are secure and integrated with other services as appropriate.
Engage directly with HPC customers who use our tools and systems, delivering timely, customer-focused support and guidance.
Assist with managing OpenShift/Kubernetes container orchestration infrastructure on Linux to support complex operational, development, and security requirements.
Investigate and deploy infrastructure monitoring, alerting, and logging tools for the DevOps infrastructure.
Support and improve automated deployments of infrastructure services and applications, following the design principles of high availability and zero-downtime updates.
Work with users and LC/LLNL security on the use of on-premises and third-party, cloud-based AI offerings, understanding what is available, what LC users want to use, and whether it satisfies LC security requirements.
Perform other duties as assigned.

Additional job responsibilities at the SES.2 level
Implement automation tools to help with deploying, troubleshooting, and maintaining cluster environments within container orchestration environments.
Design, implement, and manage build and release pipelines.
Extend Kubernetes to help simplify researchers’ usage and operations.
Provide solutions to moderately complex problems involving largely identifiable factors.
Job Type
Full-time
Career Level
Mid Level