Senior Site Reliability Engineer

Duolingo, Pittsburgh, PA
2d

About The Position

Our mission at Duolingo is to develop the best education in the world and make it universally available. It’s a big mission, and that’s where you come in! At Duolingo, you’ll join a team that cares about finding innovative solutions to complex technical problems, running countless experiments (300+ at a time!) with our massive user base to make data-driven decisions, and educating our users and employees alike. You’ll have limitless learning opportunities, mentorship and collaboration with world-class minds, and a variety of projects with large scopes, all while doing work that’s both fun and meaningful. Join our life-changing mission to develop education for our half a billion (and growing!) learners around the world.

About The Role

As a Senior Site Reliability Engineer, you will work closely with both product and platform engineering teams to ensure Duolingo’s sophisticated distributed systems and products are built and maintained with extraordinary quality, and operated in measurable and scalable ways.

Requirements

  • 3+ years of experience within site reliability engineering/DevOps of a product with millions of users
  • Experience identifying and solving issues in large-scale distributed systems
  • Experience with Java, Kotlin, Python or Go
  • Proficiency in networking protocols such as TCP/IP, HTTP, SSL, and DNS
  • An understanding of containerization toolsets and container orchestration technologies (Docker, Mesos, Kubernetes, Nomad, etc.)

Nice To Haves

  • Experience in improving automation and tooling to reduce service maintenance toil
  • Proven experience driving improvements to incident response processes
  • Experience assessing reliability and troubleshooting issues in MySQL and/or PostgreSQL databases

Responsibilities

  • Collaborate with internal teams to identify sources of instability in distributed systems and drive operational excellence
  • Own core infrastructure (i.e., understand, diagnose, and debug these systems in production)
  • Provide system design consulting, develop software platforms/frameworks, and conduct launch reviews and root cause analysis
  • Maintain and document sustainable postmortem/incident response practices
  • Advocate for and implement changes that improve reliability, scalability, and velocity
  • Reduce the burden of toil with iterative development of tooling and automation
  • Collaborate with engineering teams to release new features and become an authority on our services