About The Position

This role offers the opportunity to lead a high-performing Site Reliability Engineering (SRE) team focused on government cloud environments, ensuring operational excellence, security, and regulatory compliance. You will oversee the reliability, performance, and availability of cloud services while scaling operations to meet stringent FedRAMP and DoD requirements. The position requires close collaboration with engineering, security, and compliance teams to drive automation, incident response, and continuous improvement. You will guide the SRE team in adopting modern cloud-native technologies, establish best practices for monitoring and compliance, and ensure the operational health of systems running under an Authorization to Operate (ATO). This role combines hands-on technical expertise with strategic leadership, enabling safe, efficient, and reliable government cloud operations.

Requirements

  • 5+ years of experience in SRE, DevOps, or Cloud Engineering, including at least 2 years in a management role.
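  • Hands-on experience with regulated audit cycles such as FedRAMP or the DoD Cloud Computing Security Requirements Guide (CC SRG), including managing Plans of Action and Milestones (POA&Ms) and contributing to System Security Plans (SSPs).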
  • Deep understanding of security and compliance controls, ideally aligned with NIST SP 800-53.
  • Expertise in at least one major cloud provider (AWS, Azure, or GCP) and Infrastructure as Code tools such as Terraform.
  • Experience with CI/CD pipelines (e.g., Jenkins, GitLab), version control (Git), and scripting/programming languages like Python, Go, Java, or Bash.
  • Proven ability to manage and scale global on-call operations and lead incident management processes.
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

Nice To Haves

  • Experience with Kubernetes, Docker, and compliance-as-code solutions.
  • Familiarity with monitoring tools (PagerDuty, Splunk, Prometheus, Datadog).
  • Cloud or security certifications.
  • Prior work with public sector clients.

Responsibilities

  • Lead, mentor, and grow a team of Site Reliability Engineers, fostering a culture of accountability, collaboration, and continuous improvement.
  • Oversee service availability, latency, performance, and capacity within FedRAMP- and DoD-compliant cloud environments.
  • Manage global on-call rotations, incident response, and blameless post-mortems while ensuring regulatory reporting requirements are met.
  • Drive automation across infrastructure, build/release processes, and operational tasks to reduce manual overhead and improve reliability.
  • Establish and maintain Continuous Monitoring (ConMon) programs, ensuring timely collection and reporting of compliance evidence.
  • Collaborate with Product, Engineering, Security, and Compliance teams to deliver operationally sound, secure, and scalable services.
  • Define and execute the SRE roadmap, including process improvement, tooling, and cloud-native best practices.

Benefits

  • Competitive base salary: $132,000–$175,000 USD (commensurate with experience).
  • Generous paid time off (PTO) and holiday schedule.
  • Parental leave and flexible work arrangements.
  • Progressive healthcare options including medical, dental, and vision coverage.
  • Retirement programs and potential education reimbursement.
  • Opportunity to work on high-impact government cloud projects in a collaborative, inclusive environment.