Senior DevOps Engineer

Iron Mountain

About The Position

The Senior DevOps Engineer, a key member of the EIT DevOps Team, is responsible for the staging and production infrastructure of Iron Mountain’s Digital Services within the federal sector. This role is pivotal in managing and optimizing staging and production deployment environments across Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure. Core responsibilities include provisioning and maintaining secure, scalable, and robust cloud infrastructure for the InSight DXP Platform. The Senior DevOps Engineer will apply extensive knowledge of cloud services and DevOps best practices to ensure application efficiency, high availability, and performance. Additionally, this role involves creating and maintaining FedRAMP controls and the associated compliance documentation. The Senior DevOps Engineer will execute automation pipelines, upgrade infrastructure, troubleshoot complex issues, and contribute to the ongoing enhancement of deployment processes. Close collaboration with development, operations, and other EIT teams is crucial for delivering seamless and reliable solutions.

Requirements

  • U.S. Citizenship and residency on U.S. soil are required.
  • Must be eligible and willing to submit for U.S. Government security clearances; active clearance is a plus.
  • Minimum 5 years of experience leading and supporting enterprise-level applications in production environments.
  • Proven experience in cloud infrastructure provisioning and management on Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure.
  • Proficiency in scripting languages such as Python, Bash, or PowerShell for automation and systems management.
  • Strong understanding of containerization and orchestration technologies, including Docker, Kubernetes, and Helm.
  • Hands-on experience with cloud object storage services such as AWS S3, Google Cloud Storage, or Azure Blob Storage.
  • Working knowledge of database and persistence technologies, particularly MongoDB and PostgreSQL.
  • Experience supporting and integrating microservices architectures and RESTful APIs.
  • Familiarity with incident and service management systems, such as ServiceNow and Jira.
  • Experience with security and compliance tooling, including SAST/DAST, such as Prisma Cloud, CrowdStrike, XSOAR, and Burp Suite.
  • Basic understanding of identity and access management (IAM) and SSO technologies, particularly Okta, and application integration practices.
  • Excellent troubleshooting skills, especially in complex, distributed, cloud-based environments.
  • Strong written and verbal communication skills, with the ability to clearly document procedures, incidents, and solutions.
  • Effective at producing support documentation and conducting knowledge transfer or training sessions.
  • Demonstrated ability to work independently with minimal supervision in a fast-paced, collaborative, and globally distributed team.
  • A motivated, proactive mindset with a commitment to delivering high-quality, secure, and reliable systems.

Nice To Haves

  • Experience supporting FedRAMP Authorized platforms is highly desirable.

Responsibilities

  • Deploy, manage, and maintain cloud infrastructure across AWS, Azure, and/or GCP, ensuring compliance for government workloads.
  • Automate infrastructure provisioning using Infrastructure as Code (IaC) tools like Terraform, OpenTofu, or AWS CloudFormation.
  • Collaborate with development teams to streamline CI/CD pipelines using tools such as GitLab and OpenTofu for efficient infrastructure and application delivery.
  • Monitor system performance, participate in capacity planning, and optimize application and infrastructure performance by tuning configurations and identifying bottlenecks.
  • Develop scripts and tools to automate routine operations, including patching, scaling, and monitoring.
  • Design and implement self-healing systems that proactively detect and resolve faults.
  • Manage backup and disaster recovery strategies to ensure data integrity and availability across environments.
  • Perform regular security audits and vulnerability patching, adhering to government compliance requirements (e.g., FedRAMP, NIST).
  • Respond to and resolve infrastructure incidents and outages in real time, minimizing disruption.
  • Conduct Root Cause Analysis (RCA) for production issues and implement long-term corrective actions.
  • Participate in an on-call rotation, escalating and coordinating responses to high-severity issues.
  • Document incidents, responses, and postmortems to capture lessons learned.
  • Diagnose complex infrastructure and application problems, including database performance issues, latency, and service connectivity challenges.
  • Ensure comprehensive logging and telemetry to support incident response, performance tuning, and auditing.
  • Drive observability improvements by collaborating with Engineering and Platform teams to enhance system reliability and traceability.
  • Lead resolution efforts for application-level incidents, ensuring coordinated response across teams.
  • Oversee application lifecycle management, including version upgrades, security patches, and regional rollouts.
  • Contribute to a shared knowledge base, documenting recurring issues and resolution steps.
  • Support scaling strategies to meet regional demand, ensuring infrastructure resilience and compliance with service-level objectives (SLOs).

Benefits

  • FULL BENEFITS – 1st Day of Employment
  • Competitive Pay with Annual Merit Increases
  • 2 Weeks Paid Vacation + 7 Paid Holidays + Sick Pay
  • 401(k) with company match & Employee Stock Purchase Program
  • Tuition Reimbursement

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
