CACI International Inc • Posted 7 months ago
$113,200 - $237,800/Yr
Full-time • Senior
Washington, DC
5,001-10,000 employees

The Senior Site Reliability Engineer (SRE) will drive modernization initiatives, focusing on containerizing applications and leading cloud adoption strategies. This role bridges software engineering, platform, and operations, ensuring systems are scalable, reliable, and efficient. The SRE will collaborate with development, infrastructure, and DevOps teams to design, build, and maintain containerized environments, implement cloud-native solutions, and champion reliability best practices across all layers of our technology stack.

  • Lead tenant technical feasibility engagements, assist with developing application onboarding roadmaps, and evangelize platform adoption
  • Collaborate with cross-functional teams to identify performance bottlenecks, troubleshoot complex issues, and optimize system performance
  • Design and implement monitoring, alerting, and incident response strategies to proactively identify and mitigate potential issues, ensuring uninterrupted service availability
  • Drive automation initiatives to streamline deployment, configuration management, and infrastructure provisioning processes
  • Develop and maintain comprehensive documentation for system configurations, processes, and procedures
  • Active Top Secret U.S. Government security clearance with polygraph, and willingness to work onsite at the customer facility
  • Bachelor’s degree in Computer Science, Information Technology, or a related field
  • Minimum of 8 years of professional experience in a Site Reliability Engineering role or similar capacity
  • Deep understanding of containerization and orchestration technologies (e.g., Kubernetes, Docker)
  • Excellent communication skills, with the ability to collaborate effectively across diverse teams
  • Strong experience with cloud technologies (e.g., AWS, Azure, GCP) and infrastructure as code (e.g., Terraform, Ansible)
  • Proficiency in programming and scripting languages (e.g., Python, Go, Bash, CloudFormation) to automate tasks and develop tools
  • Experience with Representational State Transfer (REST) and microservices
  • Expertise in implementing and managing monitoring and logging solutions (e.g., Splunk, Prometheus, Grafana, ELK stack)
  • Familiarity with CI/CD pipeline development and management (e.g., GitLab CI, Azure DevOps, AWS Lambda, Jenkins)
  • Experience applying industry best practices to ensure system performance, reliability, scalability, and security
  • Expert proficiency in developing automated functional, regression, and performance tests, and in defining automated testing standards for development teams
  • Experience facilitating change and configuration management processes to drive reliability
  • Strong problem-solving skills, with the ability to diagnose complex issues and implement effective solutions
  • Proficiency in managing, leading, and engineering incident and outage response
  • Experience with identity management, access, and authorization solutions (PKI, LDAP, SSL)
  • Strong understanding of networking, security protocols, and system architecture
  • Red Hat Certified Specialist in Containers
  • Docker Certified Associate
  • Certified Kubernetes Application Developer (CKAD)
  • Certified Kubernetes Administrator (CKA)
  • Certified Kubernetes Security Specialist (CKS)
  • healthcare
  • wellness
  • financial
  • retirement
  • family support
  • continuing education
  • time off benefits