Principal SRE Engineer

Movius Corp.•Herndon, VA

About The Position

At Movius, we solve a critical gap companies face in employee-to-client communication over voice and messaging. We are the leading global provider of Secure Communication as a Service (SCaaS). Our flagship solution, MultiLine, enhances workflows, resolves compliance gaps, and unifies cross-channel messaging. Movius's AI-powered solutions enable businesses to build strong, lasting relationships with their customers in a company-owned, controllable system. Welcome to Phone 3.0. Headquartered in Alpharetta, GA, with offices in Silicon Valley; Bangalore, India; New York; and London, Movius partners with leading global wireless carriers such as T-Mobile, Vodafone, TELUS, BT, and Singtel. To learn more about Movius, visit www.movius.ai.

Requirements

Bachelor's or Master's degree in Computer Science, IT, or equivalent experience.
15+ years in DevOps, Infrastructure, or SRE roles.
4+ years in a senior/principal-level capacity driving SRE strategy and automation.
Proven success designing and scaling large distributed, cloud-native platforms.
Deep knowledge of AWS (EKS, EC2, RDS, IAM, VPC, Kafka, CloudWatch, API Gateway, Lambda, WAF, KMS).
Mastery of Helm charts and container orchestration (EKS).
Hands-on experience with Elastic APM and observability tools.
Expert in Terraform, Jenkins, Bitbucket, and Python/Bash/Go scripting.
Strong grasp of SLO/SLI frameworks, error budgets, and AIOps.
Experience in chaos engineering, performance optimization, and resilience testing.
Excellent documentation and system design communication skills.

Nice To Haves

Telecom domain experience.
AWS Certified Solutions Architect - Professional or DevOps Engineer - Professional.
Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).
SRE Foundation, Google SRE, Dynatrace Performance Professional, or Elastic Certified Engineer.

Responsibilities

Maintain architecture blueprints, playbooks, and templates for SLOs, postmortems, and change management.
Lead the end-to-end SRE architecture and define technical, reliability, and automation standards.
Drive the SRE roadmap aligned with business SLAs, platform goals, and cloud strategy (AWS preferred).
Serve as reliability authority in design reviews and architecture boards.
Architect and manage full-stack observability (Elastic Stack, OpenTelemetry, Prometheus) with integrated traces, metrics, and logs.
Define and automate SLO/SLI tracking, error budgets, and incident management lifecycles.
Build event-driven, self-healing systems and automate infrastructure, deployments, and monitoring.
Optimize distributed systems, Kubernetes workloads, and microservices for scale, performance, and cost.
Lead chaos engineering, root cause analysis, and continuous improvement to reduce MTTR.
Mentor engineers in reliability, automation, and architecture best practices; champion an automation-first culture.