Apple • Posted 4 months ago
$147,400 - $272,100/Yr
Full-time • Senior
Computer and Electronic Product Manufacturing

As a Site Reliability Engineer, you will be a pivotal contributor to the reliability and scalability of the backend services that underpin machine learning models, helping safeguard against abuse and fraud by providing and maintaining state-of-the-art cloud-based infrastructure services and automation tools. You will collaborate with engineering and machine learning teams to translate requirements into resilient infrastructure designs, and then deploy those systems using modern system reliability engineering practices. Whether it involves building automation to manage extensive service deployments, implementing observability frameworks to proactively identify and resolve issues, or driving performance improvements across distributed systems, you will be at the forefront of operational excellence. You will own end-to-end service health, participating in on-call rotations and leading technical incident response when needed.

We are seeking a versatile and adaptable professional who thrives in a multifaceted, fast-paced environment. The ideal candidate will demonstrate an exceptional ability to manage multiple responsibilities, navigate shifting priorities, and meet stringent deadlines. Success in this position requires strong interpersonal skills, openness to constructive dialogue, and a keen analytical mindset.

  • Contribute to the reliability and scalability of backend services supporting machine learning models.
  • Provide and maintain cloud-based infrastructure services and automation tools.
  • Collaborate with engineering and machine learning teams to translate requirements into resilient infrastructure designs.
  • Deploy systems employing modern system reliability engineering practices.
  • Construct automation to manage extensive service deployments.
  • Implement observability frameworks to proactively identify and resolve issues.
  • Drive performance improvements across distributed systems.
  • Assume ownership of end-to-end service health.
  • Participate in on-call rotations and lead technical incident response.
  • 7+ years of experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
  • Hands-on experience with one or more programming languages such as Python, Golang, or Java.
  • Solid grasp of networking concepts.
  • Experience with CI/CD pipelines and best practices.
  • Deep understanding of scaling and deploying applications in AWS, including monitoring, resilience, maintainability, and performance.
  • Strong problem-solving skills and a proactive approach to identifying and addressing potential issues in large-scale distributed systems.
  • BS in Computer Science or a related field, or equivalent work experience.
  • Knowledge of AWS services.
  • Experience building and operating container orchestration systems such as Kubernetes or EKS.
  • Knowledge of Kafka, Ray, Spark, and ML model pipelines.
  • Comprehensive medical and dental coverage.
  • Retirement benefits.
  • Discounted products and free services.
  • Reimbursement for certain educational expenses including tuition.
  • Opportunity to participate in Apple's discretionary employee stock programs.
  • Eligibility for discretionary bonuses or commission payments.
  • Relocation assistance.