Apple · Posted 4 months ago
Senior
Austin, TX
5,001-10,000 employees

We are looking for a Senior Site Reliability Engineer (SRE) with strong architectural experience to join the JMET SRE team. This individual will play a key role in designing and scaling reliable, secure, and high-performance infrastructure across our cloud and hybrid environments. You will be responsible for establishing reliability patterns, driving large-scale systems design, and building automation frameworks to support production systems at scale. This is a hands-on leadership role with architectural ownership, strategic influence, and deep technical impact across multiple domains, including application and infrastructure security, incident response engineering, and resilience automation.

  • Architect Scalable Infrastructure: Design, evolve, and review highly reliable, performant, and cost-efficient cloud-native and hybrid infrastructure using IaC, containers, and microservices principles.
  • Support Cryptographic Systems at Scale: Design and operationalize scalable, secure integrations with Hardware Security Modules (HSMs) for sensitive workloads, key management, and cryptographic operations.
  • Drive SRE Best Practices: Define and implement service-level indicators (SLIs), objectives (SLOs), and agreements (SLAs) to guide engineering teams towards reliability and observability goals.
  • Incident Architecture & Prevention: Serve as a technical lead during major incidents. Partner with security and platform teams to conduct deep post-incident reviews, drive systemic improvements, and establish preventive architectural controls.
  • System Design & Tooling: Build and maintain reusable tooling, automation frameworks, and reliability platforms (observability, alerting, chaos testing, auto-scaling, failover).
  • Reliability as Code: Champion resilience engineering via automation pipelines, CI/CD integrations, canary releases, and chaos engineering principles.
  • Multi-Cloud and Hybrid Systems: Design, assess, and guide architecture decisions across AWS, GCP, AliCloud, and on-premises infrastructure. Ensure consistency, interoperability, and regulatory compliance.
  • Security & Compliance: Ensure architectural patterns are aligned with security standards, compliance requirements, and audit readiness.

  • 7+ years of experience in SRE, DevOps, or Infrastructure Engineering roles, with 2+ years in an architectural or principal engineering capacity.
  • Deep expertise in cloud infrastructure (AWS, GCP, or AliCloud) and container orchestration (Kubernetes, EKS).
  • Proven experience with Infrastructure as Code (Terraform, Pulumi, CloudFormation).
  • Strong understanding of distributed systems, networking, and systems design at scale.
  • Proficiency in at least one programming or scripting language (Python, Go, Bash, or similar).
  • Experience designing observability stacks (Prometheus, Grafana, Datadog, OpenTelemetry, ELK, etc.).
  • Solid background in CI/CD tools and modern deployment strategies (ArgoCD, Spinnaker, GitOps).
  • Familiarity with security best practices in cloud and containerized environments.
  • Familiarity with HSMs and cryptographic operations at scale is a plus.