Cloud Reliability & Support Engineer

Peraton
Chantilly, VA
15d
$135,000 - $216,000

About The Position

Peraton is seeking a Cloud Reliability & Support Engineer in our Chantilly, VA office to support our Department of Defense (DoD) customer as part of a highly talented, highly motivated, and high-performing team. As the program’s expert in Level 3 anomaly resolution and operational excellence, you will apply deep expertise in RHEL and RHOSP internals to conduct in-project troubleshooting and ensure that tenant applications fully utilize the cloud’s resiliency features. Your focus is stability: identifying the root causes of system anomalies within the tenant's provisioned environment. Join us and be part of the next generation of innovators as we blaze a trail forward for our profession and company.

Requirements

  • This position requires a minimum of a Top Secret clearance with the ability to obtain TS/SCI. The candidate must maintain the clearance.
  • Associate's degree and 10+ years of experience in a systems engineering related field; or a bachelor’s degree in computer science, computer engineering, or a related field and 8+ years of experience in a systems engineering related field; or a master's degree in computer science, cloud computing, or a related field and 6+ years of experience in a systems engineering related field. An additional four (4) years of relevant experience will be considered in lieu of a degree.
  • Meet DoD 8140 foundational requirements for a System Developer at the Advanced proficiency level.
  • 4+ years of hands-on experience in a cloud operations, site reliability engineering (SRE), or highly technical Level 3 support role within a Linux/private cloud environment.
  • Deep-level expertise with RHEL/CentOS administration, networking, and system diagnostics.
  • Strong understanding of Red Hat OpenStack service interaction (Nova, Neutron).
  • Proficiency with key observability tools and log analysis on Linux systems (e.g., systemd-journald, specialized OpenStack logs).
  • Expert skill in diagnosing resource contention and failure patterns in distributed systems on a Linux operating system.
  • Proficiency in Linux systems administration, cloud computing, and virtualization, with a strong understanding of both public and private cloud environments.
  • Strong communication and organizational skills for coordinating with customers and tenants.

Nice To Haves

  • Certifications: Red Hat Certified Engineer (RHCE) or equivalent is highly preferred.
  • Strong skills in scripting languages such as Python (specifically for OpenStack SDK interaction).
  • Hands-on experience with container technologies (Docker, Kubernetes) and demonstrable experience with OpenShift Container Platform.
  • A solid grasp of enterprise networking, firewalls, and security best practices.
  • Strong analytical and conceptual thinking skills to troubleshoot complex issues and optimize performance.
  • Ability to learn independently, adapt to an evolving environment, and stay current with industry trends.

Responsibilities

  • Anomaly Resolution & Deep Troubleshooting: Serve as the primary technical resource for complex, escalated incidents that are contained within the tenant's RHOSP project/resources.
  • RHEL/OS Deep Dive: Expertly troubleshoot issues on tenant RHEL instances, including kernel panics, package conflicts, file system errors, and performance degradation (CPU, memory, I/O).
  • RHOSP Resource Triage: Diagnose issues related to the tenant's consumption of OpenStack services (e.g., Nova instance failures, Neutron port issues, Cinder volume attachment problems). Utilize monitoring tools to perform deep-dive analysis and isolate the root cause of service disruptions within the OpenStack data plane.
  • Root Cause Analysis (RCA): Own the technical execution and documentation of RCAs, focusing on issues rooted in RHEL/RHOSP misconfiguration or resource limitations.
  • Vendor Partnership: Maintain a partnership with Red Hat, the vendor, to stay up to date with the latest advancements in Red Hat products and industry best practices and to keep the infrastructure effective and innovative.

Benefits

  • Peraton offers enhanced benefits to employees working on this critical National Security program, including heavily subsidized benefits coverage for you and your dependents, 25 days of PTO accrued annually up to a generous PTO cap, and eligibility to participate in an attractive bonus plan.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: Associate degree
  • Number of Employees: 5,001-10,000 employees
