Systems Reliability Engineer

Leidos, Chantilly, VA
Onsite

About The Position

GEOAxIS is looking for a Systems Reliability Engineer to work with the rest of the operations team to help drive program technical execution, innovation and modernization. The GEOAxIS system provides Identity, Credential and Access Management (ICAM) for all web applications. GEOAxIS enables online, on-demand access to NGA GEOINT content based on users' authoritative attributes and roles. Our mission is to maintain highly available ICAM services that protect these critical mission applications across all security domains. The GxNext contract was awarded to Leidos in 2021 and runs until 2031.

Requirements

  • BS degree and 4+ years of prior relevant experience, or a Master's degree and 2+ years of prior relevant experience
  • Requires a TS/SCI clearance and the ability to obtain and maintain a polygraph after hire
  • Strong communication skills, both verbal and written
  • Ability to quickly learn new software and IT concepts
  • Strong problem-solving and decision-making skills
  • Self-starter able to work both independently and in a team environment
  • Intimately familiar with the COTS products that the program leverages: Oracle Identity and Access Management (IdAM) suite, Apache webgates, and Computer Associates (CA) API Gateway
  • Experience scripting in a Linux environment using Shell and Bash
  • Deep understanding of and background in COTS integration and custom code development
  • Experience in at least one of the following languages: Bash, Python, Java or Node.js
  • Local to the DMV area (DC/Maryland/Virginia), with the ability to be physically present at the team's work location in Chantilly
  • Strong interpersonal skills and a proven track record of leading technical teams and conveying technical solutions to technical and non-technical audiences
  • Must be able to be physically present in Chantilly, VA, a minimum of 5 days a week to work with the team, with occasional meetings in Reston and/or Springfield, VA
  • All candidates must be U.S. citizens to be considered for the position
  • Must obtain Security+ certification within 60 days of hire

Nice To Haves

  • Kubernetes experience using Rancher RKE2 or OpenShift
  • Strong understanding of containers
  • Experience containerizing existing custom software
  • Knowledge of common DevOps tools such as Ansible, Argo CD, GitLab and Nexus 3
  • Certifications in any of the following: RHCSA/RHCE, AWS Solutions Architect/DevOps Engineer, CKA/CKAD
  • Familiarity with modern authentication flows such as SAML, OAuth2 and OIDC

Responsibilities

  • Troubleshoot and resolve system/operational incidents
  • Perform root cause analysis for operational incidents
  • Analyze system performance and take corrective actions as needed
  • Coordinate with mission partners, consumer applications, and other external entities in troubleshooting enterprise incidents and integration problems
  • Design, develop and implement automated solutions that proactively monitor system health, identify performance bottlenecks and resolve system issues through automated remediation, reducing manual intervention and improving system reliability
  • Collect data, identify and analyze trends in operational incidents, and recommend ways to mitigate common issues
  • Work closely with Ops Tech Lead and Development Lead to identify baseline enhancements to improve operational stability
  • Work with deployment and ISP teams to support baseline deployments to operations
  • Support off-hours calls to help troubleshoot high-priority operational incidents