Site Reliability Engineer

Leidos • Minneapolis, MN

About The Position

Come put your Site Reliability Engineer (SRE) skills into action! Leidos has openings for talented SREs to join our team and develop reusable solutions that support our customers in any environment. You will contribute to the design and implementation of Continuous Integration and Continuous Delivery (CI/CD) pipelines that accelerate the secure delivery of software to production. You will automate the buildout of infrastructure in cloud and on-premises environments to operate Kubernetes clusters and microservices deployments.

In this role, you will join dynamic Agile software teams that are singularly focused on providing world-class solutions to our customers in an exciting, collaborative, and inclusive atmosphere. You will be intellectually challenged and given a tremendous opportunity for growth in a fast-paced and fun environment.

You’ll learn, master, and improve the CI/CD processes and tools we use to develop, test, integrate, and deploy our cloud-based and on-premises solutions into multiple hosting environments, such as AWS, Azure, VMware, and others. You’ll learn new technologies and tools and apply what you’ve learned to overcome technological challenges with innovative solutions. You’ll collaborate with other software engineers and SREs, sharing your knowledge with the team and the organization to make us all better at what we do. You’ll perform technical spikes and develop prototypes to help test product concepts and achieve customer validation.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, or a related field, with 4+ years of relevant experience
Demonstrated ability to deliver projects or processes spanning multiple technical domains, including experience in a technical lead capacity
Solid understanding of Agile development practices, along with CI/CD methodologies and supporting tools
Strong proficiency with Linux and Windows operating systems, as well as networking fundamentals (e.g., HTTP, HTTPS, SSL/TLS, SMTP, DNS)
Hands-on experience provisioning and managing resources within cloud and IaaS environments (AWS, Azure, Google Cloud Platform, etc.)
Practical experience with infrastructure-as-code and automation tools such as Terraform, Ansible, CloudFormation, Chef, or Puppet
Experience working with container technologies (Docker) and orchestration platforms like Kubernetes, including use of kubectl
Proficiency with version control systems, such as Git
Demonstrated curiosity and initiative in learning new tools, frameworks, and technologies
Ability to work independently with minimal supervision while also collaborating effectively within cross-functional engineering teams
Travel: 50%, both within the US and overseas

Nice To Haves

Experience with enterprise event streaming technologies such as Kafka or NATS
Familiarity with monitoring and observability tools like Grafana and Prometheus
Exposure to service mesh and API gateway technologies (e.g., Istio)
Experience with GitOps tools such as Argo CD, Flux CD, or similar platforms
Professional cybersecurity certification (e.g., Security+ or equivalent)
Understanding of Agile development methodologies and practices
Working knowledge of relational database systems such as Oracle, MySQL, PostgreSQL, or SQL Server

Responsibilities

Design, develop, troubleshoot, and maintain mission-critical infrastructure across cloud and on-premises environments using infrastructure-as-code (IaC)
Build and support scalable, highly available, and secure cloud-native architectures, including Kubernetes clusters and microservices deployments
Enable and optimize CI/CD pipelines by applying best practices for automated provisioning, configuration, testing, and deployment
Gather and analyze system and application metrics to support performance tuning, capacity planning, and proactive issue resolution
Partner with development teams to improve system reliability through rigorous testing, release processes, and continuous improvement initiatives
Participate in system design, platform engineering, and technical decision-making to ensure solutions meet functional, performance, and SLA requirements
Collaborate across engineering teams and stakeholders to deliver solutions, resolve technical challenges, and coordinate key deliverables
Develop prototypes, perform technical spikes, and evaluate new tools or approaches to solve complex technical problems
Continuously assess deployed systems and implement improvements to enhance reliability, scalability, and operational efficiency
Mentor team members and contribute to knowledge sharing across the organization