Site Reliability Engineer - Journeyman

Obsidian Solutions Group
Quantico, VA
231d
Hybrid

About The Position

We are seeking a Site Reliability Engineer - Journeyman to support the performance, reliability, and security of enterprise infrastructure in a mission-critical environment. This role focuses on automation, monitoring, and incident response across production and support tiers. The ideal candidate will bring strong scripting skills, cloud experience, and a proactive approach to system optimization and resilience.

Requirements

  • Bachelor's degree in Computer Science, Information Systems, or a related technical field
  • 4-6 years of experience in systems or network engineering, DevOps, or site reliability engineering
  • Experience managing and maintaining enterprise infrastructure performance, reliability, and security
  • Proficiency with CI/CD tools and automation frameworks (e.g., Jenkins, GitLab CI, Ansible, Terraform)
  • Strong scripting skills in languages such as Python, Bash, or PowerShell
  • Experience with monitoring, alerting, and log analysis tools (e.g., Splunk, Prometheus, Grafana)
  • Familiarity with cloud and virtualization technologies (e.g., AWS, OCI, VMware)
  • Understanding of disaster recovery, backup/restore, and incident response procedures
  • Active DoD Secret clearance, or the ability to obtain one

Nice To Haves

  • Experience supporting secure federal or defense-related IT environments
  • Familiarity with Oracle Exadata, ZFS, and InfiniBand networking
  • Experience with performance tuning and root cause analysis (RCA) in mission-critical systems
  • Knowledge of IAVA compliance, STIGs, and DoD cybersecurity standards
  • Exposure to Agile/SAFe environments and participation in sprint planning and retrospectives
  • Experience supporting interface monitoring and data integrity across integrated systems

Responsibilities

  • Monitor and maintain infrastructure performance and availability across production and support environments
  • Implement and manage CI/CD pipelines and automation for system monitoring, patching, and recovery
  • Troubleshoot service disruptions, perform root cause analysis, and implement corrective actions
  • Collaborate with operations, engineering, and cybersecurity teams to ensure secure and scalable system operations
  • Support capacity planning, performance tuning, and system optimization initiatives
  • Document system events, changes, and performance metrics in accordance with operational standards
  • Support interface monitoring and data integrity across integrated systems
  • Maintain audit readiness and compliance with operational standards

Benefits

  • Competitive compensation package
  • Exceptional benefits that protect the well-being of employees and their families
  • Family atmosphere with a commitment to operational excellence

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Administrative and Support Services
  • Education Level: Bachelor's degree
