Site Reliability Engineer (SRE) - FedRAMP

Claroty
New York, NY
$190,000 - $215,000

About The Position

We are seeking a skilled Site Reliability Engineer (SRE) to support and maintain Claroty's FedRAMP-compliant deployment in AWS GovCloud for public sector customers. The SRE will be responsible for ensuring high availability, security, and compliance of cloud-based environments while driving automation, monitoring, and incident response best practices.

Requirements

  • 6-8+ years of experience in SRE, DevOps, or Cloud Engineering roles.
  • Hands-on experience with AWS GovCloud, including EC2, EKS, MSK, S3, RDS, IAM, CloudTrail, and CloudWatch.
  • Strong expertise in Infrastructure as Code (Terraform, Ansible).
  • Experience with FedRAMP, NIST 800-53, and cloud security best practices.
  • Proficiency in Kubernetes, Docker, and container orchestration.
  • Knowledge of Linux system administration and scripting (Python, Bash).
  • Experience with logging, monitoring, and observability tools in a cloud-native environment.
  • Strong troubleshooting, problem-solving, and automation mindset.
  • U.S. Citizenship (required for working in GovCloud environments).

Responsibilities

  • AWS GovCloud Operations: Manage and optimize Claroty’s cloud-based infrastructure in AWS GovCloud, ensuring FedRAMP compliance and high availability.
  • Reliability & Performance: Monitor and enhance system performance, scalability, and reliability through observability tools, automation, and best practices.
  • Security & Compliance: Implement and maintain security controls aligned with FedRAMP, NIST 800-53, and other federal cybersecurity standards.
  • Infrastructure as Code (IaC): Develop and manage infrastructure automation using Terraform and Ansible.
  • CI/CD & Automation: Enhance DevSecOps pipelines, automate deployments, and improve system resilience through tools like GitLab CI/CD, Jenkins, and Kubernetes.
  • Incident Response & Monitoring: Implement and manage monitoring solutions (Prometheus, Grafana, ELK Stack), respond to incidents, and conduct post-mortems.
  • Networking & Security: Configure and maintain VPCs, VPNs, security groups, and firewalls in AWS GovCloud, ensuring compliance with FedRAMP requirements.
  • GOV Production Gatekeeper: Manage rollout strategy for new technologies and oversee their execution to ensure minimal disruption to existing systems.
  • GOV Production On-Call: Act as the first line of response for critical incidents, assessing issues, triaging, and coordinating with the team to prevent further problems and swiftly restore services.
  • Monitor Production Performance and Degradation: Monitor system performance metrics closely and detect any degradation early to prevent outages and disruptions.
  • Production Maintenance: Conduct regular infrastructure upgrades to accommodate changes, developments, and advancements in the technological landscape.
  • Manage Release Flow: Oversee the release of updates and new functionalities, ensuring a seamless transition while handling any potential negative impacts on production.
  • Collaboration: Work closely with DevOps, security teams, developers, and federal stakeholders to maintain a compliant and secure cloud environment.