Staff Cloud Operations Engineer

Simplesense
Hybrid

About The Position

Simplesense builds, deploys, and sustains the Installation Resilience Platform that enables mission operators to rapidly adapt and respond. The Platform protects critical infrastructure from cyber attack while unlocking previously siloed information to monitor, diagnose, and improve response times to incidents. Our adversaries rapidly adopt the latest technology: we help defense users respond in kind. Simplesense is a non-traditional defense contractor and prime on the Air Force's Installation Resilience Operations Command and Control (IROC) program, which is now expanding from its single prototype installation, Tyndall Air Force Base, to five additional Air Force, Space Force, and Army installations.

Our team combines over 100 years of direct mission experience solving hard problems with 50 years of technical expertise deploying DevSecOps, cybersecurity, and cloud infrastructure, giving us a deep appreciation for our customers’ mission and end users’ priorities. We build for scale, architecting and prioritizing technical work for long-term sustainability.

Simplesense is looking for a Staff Cloud Ops Engineer to join our hybrid, US-based team, focused on production reliability of cloud-hosted applications and services. The Staff Cloud Ops Engineer is the technical anchor and operational leader for critical cloud services, ensuring reliability, performance, and operational success. This position will drive the technical direction, process improvement, and mentorship for Cloud Operations, orchestrating activities across operational domains to deliver cohesive, high-quality results. The ideal candidate is an excellent communicator who is attentive and efficient, completes work skillfully and independently, and is good at giving and receiving constructive feedback. This position reports to the Director of Operations.

Requirements

  • Experience: 7+ years in managing cloud environments, systems administration, or related fields, with a focus on cloud-native applications and services.
  • Technical Expertise:
      • Proficient in Infrastructure as Code (IaC) tools such as CloudFormation or Terraform.
      • Proficient in configuration tooling such as Ansible.
      • Strong understanding of CI/CD pipelines and tooling development.
      • Experience with implementing and tuning observability stacks, including monitoring, logging, and tracing systems.
      • Experience with IP networking fundamentals.
      • Familiarity with the Git command line and IDE tooling.
  • Proven track record in leading complex troubleshooting efforts and root cause analyses related to critical incidents.
  • Experience in mentoring junior engineers and enhancing the team's operational readiness.
  • Excellent interpersonal and communication skills for cross-team collaboration.
  • Must be able to obtain DoD 8570/8140 IAT Level II certification (e.g., CompTIA Security+ CE) within 6 months of hire.
  • Travel requirements: 10% travel for quarterly team planning.
  • Must be a U.S. Citizen and able to obtain a DoD NIPR network account and Common Access Card (CAC).
  • Must have, or be able to obtain, a Secret Clearance.

Nice To Haves

  • Experience in the operational intelligence or industrial technology sectors.
  • AWS networking experience with Transit Gateways, Managed VPNs, and Direct Connect.
  • AWS Certifications: Networking Specialty, Security Specialty, or Solutions Architect.

Responsibilities

  • End-to-End Operational Ownership: Act as the single technical owner responsible for the operational success of critical cloud systems, defining Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Work with other Staff and Principal Engineers to establish operational Infrastructure as Code (IaC) standards and best practices.
  • Cross-Functional Collaboration: Coordinate with development to ensure repeatable and reliable feature deployments into cloud environments using CI/CD pipelines and maintain infrastructure through IaC practices.
  • Ambiguity Navigation: Tackle vague and complex operational challenges by defining technical strategies and leading the team toward holistic, sustainable solutions.
  • Mentorship and Improvement: Elevate the operational maturity of the team through insightful reviews of operational runbooks, CI/CD pipelines, and automation scripts. Mentor operations engineers on troubleshooting, problem solving, and incident response.
  • Operational Execution: Focus on the health of critical systems, conducting root cause analysis (RCA) for major incidents and resolving complex, intermittent issues that span on-prem/cloud boundaries.
  • Active Operational Support: Participate in periodic help desk rotations and Tier 3 / Tier 4 on-call support: troubleshooting and resolving issues, fixing bugs, and implementing solutions to enhance system reliability and performance.
  • CI/CD Development: Build automated delivery pipelines and develop internal self-service tools to enhance operational efficiency.
  • Stakeholder Collaboration: Work with product and development teams to define operational requirements and communicate system trade-offs effectively.
  • Demonstrated experience providing technical leadership, mentorship, and guidance to engineers, with the ability to influence team direction, operational practices, and outcomes. Prior experience with, or aptitude for, supporting people leadership responsibilities (such as onboarding, coaching, or performance feedback) is a plus.

Benefits

  • Equity
  • Medical, Life, Short-Term Disability, and AD&D insurance
  • Medical travel coverage
  • Dental coverage
  • Vision coverage
  • 401(k) matching