Principal SRE Engineer

Movius Corp., Herndon, VA
Posted 55 days ago

About The Position

At Movius, we solve a critical gap companies face in employee-to-client communication over voice and messaging. We are the leading global provider of Secure Communication as a Service (SCaaS). Our flagship solution, MultiLine, enhances workflows, resolves compliance gaps, and unifies cross-channel messaging. Movius AI-powered solutions enable businesses to build strong and lasting relationships with their customers in a company-owned, controllable system. Welcome to Phone 3.0. Headquartered in Alpharetta, GA, with offices in Silicon Valley, Bangalore (India), New York, and London, Movius partners with leading global wireless carriers such as T-Mobile, Vodafone, TELUS, BT, and Singtel. To learn more about Movius, visit www.movius.ai.

Requirements

  • Bachelor's or Master's degree in Computer Science, IT, or equivalent experience.
  • 15+ years in DevOps, Infrastructure, or SRE roles.
  • 4+ years in a senior/principal-level capacity driving SRE strategy and automation.
  • Proven success designing and scaling large distributed, cloud-native platforms.
  • Deep knowledge of AWS (EKS, EC2, RDS, IAM, VPC, Kafka, CloudWatch, API Gateway, Lambda, WAF, KMS).
  • Mastery of Helm charts and container orchestration (EKS).
  • Hands-on experience with Elastic APM and observability tools.
  • Expertise in Terraform, Jenkins, Bitbucket, and Python/Bash/Go scripting.
  • Strong grasp of SLO/SLI frameworks, error budgets, and AIOps.
  • Experience in chaos engineering, performance optimization, and resilience testing.
  • Excellent documentation and system design communication skills.

Nice To Haves

  • Telecom domain experience.
  • AWS Certified Solutions Architect - Professional or DevOps Engineer - Professional.
  • Certified Kubernetes Administrator (CKA) or Application Developer (CKAD).
  • SRE Foundation, Google SRE, Dynatrace Performance Professional, or Elastic Certified Engineer.

Responsibilities

  • Maintain architecture blueprints, playbooks, and templates for SLOs, postmortems, and change management.
  • Lead the end-to-end SRE architecture and define technical, reliability, and automation standards.
  • Drive the SRE roadmap aligned with business SLAs, platform goals, and cloud strategy (AWS preferred).
  • Serve as reliability authority in design reviews and architecture boards.
  • Architect and manage full-stack observability (Elastic Stack, OpenTelemetry, Prometheus) with integrated traces, metrics, and logs.
  • Define and automate SLO/SLI tracking, error budgets, and incident management lifecycles.
  • Build event-driven, self-healing systems and automate infrastructure, deployments, and monitoring.
  • Optimize distributed systems, Kubernetes workloads, and microservices for scale, performance, and cost.
  • Lead chaos engineering, root cause analysis, and continuous improvement to reduce MTTR.
  • Mentor engineers in reliability, automation, and architecture best practices; champion an automation-first culture.