Senior Site Reliability Engineer (SRE)

CACI International Inc, posted 7 months ago

$113,200 - $237,800/Yr

Full-time • Senior

Washington, DC

5,001-10,000 employees

The Senior Site Reliability Engineer (SRE) will drive modernization initiatives, focusing on containerizing applications and leading cloud adoption strategies. This role bridges software engineering, platform, and operations, ensuring systems are scalable, reliable, and efficient. The SRE will collaborate with development, infrastructure, and DevOps teams to design, build, and maintain containerized environments, implement cloud-native solutions, and champion reliability best practices across all layers of our technology stack.

Lead tenant technical feasibility engagements, assist with developing application onboarding roadmaps, and evangelize platform adoption
Collaborate with cross-functional teams to identify performance bottlenecks, troubleshoot complex issues, and optimize system performance
Design and implement monitoring, alerting, and incident response strategies to proactively identify and mitigate potential issues, ensuring uninterrupted service availability
Drive automation initiatives to streamline deployment, configuration management, and infrastructure provisioning processes
Develop and maintain comprehensive documentation for system configurations, processes, and procedures

Active Top Secret U.S. Government security clearance with polygraph, and willingness to work onsite at the customer facility
Bachelor’s degree in Computer Science, Information Technology, or a related field
Minimum of 8 years of professional experience in a Site Reliability Engineering role or similar capacity
Deep understanding of containerization and orchestration technologies (e.g., Kubernetes, Docker)
Excellent communication skills, with the ability to collaborate effectively across diverse teams
Strong experience with cloud technologies (e.g., AWS, Azure, GCP) and infrastructure as code (e.g., Terraform, Ansible)
Proficiency in programming and scripting languages (e.g., Python, Go, Bash, CloudFormation) to automate tasks and develop tools
Experience with Representational State Transfer (REST) and microservices
Expertise in implementing and managing monitoring and logging solutions (e.g., Splunk, Prometheus, Grafana, ELK stack)
Familiarity with CI/CD pipeline development and management (e.g., GitLab CI, Azure DevOps, AWS Lambda, Jenkins)
Experience applying industry best practices to ensure system performance, reliability, scalability and security
Expert proficiency in developing automated functional, regression and performance tests and developing automated testing standards for development teams
Experience facilitating change and configuration management processes to drive reliability
Strong problem-solving skills, with the ability to diagnose complex issues and implement effective solutions
Proficiency in managing, leading, and engineering incident and outage response

Experience with identity management, access, and authorization solutions (PKI, LDAP, SSL)
Strong understanding of networking, security protocols, and system architecture
Red Hat Certified Specialist in Containers
Docker Certified Associate
Certified Kubernetes Application Developer (CKAD)
Certified Kubernetes Administrator (CKA)
Certified Kubernetes Security Specialist (CKS)

healthcare
wellness
financial
retirement
family support
continuing education
time off benefits
