Principal DevOps Engineer

Iridium Satellite • Tempe, AZ

Hybrid

About The Position

Iridium is seeking a highly skilled Principal DevOps Engineer to lead the strategy, design, and evolution of DevOps practices supporting its cloud-native Open RAN and 4G/5G Core network. This role involves setting the technical direction for CI/CD, infrastructure-as-code, automation, and observability frameworks to ensure reliable, scalable operations across Core, RAN, Transport, and Cloud domains. The engineer will define and implement greenfield CI/CD pipelines, establish standardized automation and monitoring approaches, and create advanced telemetry, alerting, and automated remediation capabilities. Collaboration with NOC Operations, Engineering, Cloud, Development, and Test teams is key to driving operational excellence, reducing Mean Time to Repair (MTTR), and minimizing alert fatigue. As a technical leader, this individual will provide governance, best practices, and hands-on expertise to global teams. The ideal candidate possesses deep experience in cloud-native architectures, Kubernetes, CI/CD, telemetry pipelines, and infrastructure-as-code, along with familiarity with telecom network environments and Agile practices.

Requirements

Bachelor’s degree in Engineering, Computer Science, Telecommunications, or related field.
10+ years of experience in DevOps, Site Reliability Engineering, or network automation roles supporting cloud-native environments.
Strong proficiency with CI/CD pipeline management, Infrastructure-as-Code frameworks, and containerized deployments.
Hands-on experience with Kubernetes (EKS and on-prem K8s) and Docker-based cloud-native network functions (CNFs).
Proficiency with AWS cloud services.
Advanced Python scripting skills, with additional experience in Bash or Go.
Experience building Grafana dashboards, alerting logic, and observability workflows.
Familiarity with Kafka-based event streaming architectures.
Strong Linux system administration skills.
Strong understanding of telecom architecture, including 4G EPC, 5G Core, IMS, and Open RAN.
Experience integrating and operationalizing probe-based observability solutions (e.g., Viavi).
Deep understanding of monitoring concepts, including metrics, logs, traces, and APM.
Excellent communication skills, with the ability to present products, deliverables, analyses, and issues clearly and confidently, and to adapt communication style to the audience.
Ability to analyze a situation or problem, generate effective solutions, and see those solutions through to completion.
Must possess the creativity and resourcefulness needed to make reliable decisions and determine methods for new assignments.
Ability to thrive in a dynamic environment by handling multiple tasks and managing shifting priorities.
Proactive in sharing knowledge with others.

Nice To Haves

Experience supporting Mavenir 4G/5G Core in production.
Knowledge of SIP, Diameter, GTP, HTTP/2, and PFCP protocols.
Experience with Prometheus, ELK stack, or OpenTelemetry.
CI/CD experience (GitLab, Jenkins, ArgoCD).
Kubernetes certification (CKA/CKAD).
AWS certifications.
Experience building closed-loop automation for telecom NOCs.

Responsibilities

Lead the design and implementation of CI/CD pipelines supporting cloud-native and G-RAN deployments.
Manage Kubernetes environments (EKS and on-prem) by monitoring CNF health, automating scaling policies, and optimizing resource allocation.
Implement Infrastructure-as-Code solutions using Terraform and Ansible to deploy and maintain monitoring and observability stacks.
Integrate observability platforms and tools into operational workflows to strengthen visibility and diagnostic capabilities.
Design and enhance observability frameworks using Grafana dashboards, alert correlation, health checks, Core CDR dashboards, Viavi probe integrations, and SolarWinds telemetry feeds.
Build unified dashboards for national-level visibility and real-time health insights.
Optimize alarm thresholds and event correlation to reduce false positives and alert storms.
Implement structured logging, metrics, and distributed tracing for cloud-native network functions.
Develop automation using Python, Bash, or Go to auto-triage common alarms, perform health validations, and trigger corrective actions.
Build event-driven automation using Kafka feeds from Mavenir and Gatehouse OSS systems.
Implement closed-loop automated remediation for common failure scenarios to reduce manual NOC intervention.
Integrate observability tools into DevSecOps workflows.
Support Major Incident Management by providing telemetry insights and automated diagnostics, and perform post-incident analyses using logs, traces, and performance metrics.
Drive improvements that reduce MTTD and MTTR.
Partner with Core, RAN, Transport, and Cloud engineering teams to prevent recurring issues through root-cause analysis.
Mentor junior DevOps and NOC engineers in automation, observability, and DevOps best practices.
Develop reusable automation frameworks and operational standards.
Document playbooks, reference architectures, and best-practice patterns to mature operations from reactive to predictive.
Participate in on-call rotations for automation platform support.
Support major incidents requiring automation troubleshooting.

Benefits

Award-winning and innovative workplace.
Opportunity to make a difference in the world.
Empowering and inclusive culture.
Challenging work with opportunities to collaborate on new, bold ideas and solutions.
