About The Position

Peraton is seeking a highly skilled Senior Cloud Engineer to support the daily operations and long-term reliability of our cloud-based infrastructure. This role is critical for ensuring uptime, performing proactive maintenance, troubleshooting issues and implementing fixes across our cloud environments. You will work closely with development, operations and security teams to ensure the scalability, performance and security of cloud applications. The ideal candidate will maintain cloud-based applications and infrastructure on AWS.

Requirements

  • BA/BS and 8 years of experience, or a HS diploma and 12 years of experience.
  • 5+ years of experience in cloud support, infrastructure maintenance or IT operations.
  • Experience with Infrastructure as Code (Terraform, CloudFormation)
  • Strong proficiency in AWS Lambda (writing, deploying and optimizing)
  • Hands-on experience with CI/CD tools (GitHub, GitLab, EKS, Kubernetes, DevOps)
  • Scripting skills for automation and maintenance tasks (Bash, Python)
  • Cloud certifications (AWS DevOps Engineer, Solutions Architect Associate)
  • Strong written and verbal communication skills for technical and non-technical stakeholders
  • Excellent analytical and problem-solving skills
  • Must be a US Citizen.
  • Must be able to obtain and maintain the required agency clearance

Nice To Haves

  • Ability to diagnose performance issues in cloud environments
  • Experience writing pre-check and post-check scripts to validate system health
  • Familiarity with container orchestration (Docker, ECS, Kubernetes)
  • Knowledge of ITIL practices or incident management frameworks

Responsibilities

  • Deploy applications across multiple environments (dev, staging, prod) and ensure consistency and stability
  • Build reusable pipeline templates, jobs and stages for CI/CD consistency across teams
  • Collaborate with developers to containerize and deploy applications using ECS and Lambda
  • Configure GitLab Runners and manage environment-specific variables and secrets
  • Define and deploy readiness and liveness probes for containers running in EKS/ECS
  • Write custom scripts for CloudWatch custom metrics and alarms based on application-specific probes
  • Monitor deployments and system health using CloudWatch and other tools
  • Implement rollback strategies and manage version control during deployments
  • Troubleshoot and resolve deployment issues and improve pipeline performance and reliability
  • Proficient with Python, Bash, YAML/JSON, Node.js, Lambda functions
  • Perform daily health checks using the AWS CLI or scheduled Lambda functions, and log and report the results
  • Set up monitoring thresholds, dashboards, and metrics for application and infrastructure
  • Perform root cause analysis and incident correlation using monitoring and performance analysis tools
  • Maintain a central inventory of all licensed software deployed in AWS environments
  • Maintain accurate documentation on infrastructure and procedures
  • Assess and apply patches to infrastructure software, including third-party software
  • Develop a patch testing schedule and rollout plan, including rollback and recovery procedures
  • Create and manage change records, and participate in PI planning and other Agile ceremonies
  • Keep cloud environments compliant with security standards and best practices
  • Orchestrate failover and restoration of ECS/EKS services, Lambda functions, databases and other infrastructure components
  • Test and document regional failover playbooks and recovery runbooks
  • Ensure compliance with RTO (Recovery Time Objective) and RPO (Recovery Point Objective) requirements
  • Participate in on-call rotations to support 24/7 production systems and respond to incidents as they arise