Software Engineer II, SRE

Etsy, Inc., Brooklyn, NY
Hybrid

About The Position

Etsy's Services Infrastructure group is looking for a Site Reliability Engineer II to join our mission of building and supporting reliable, large-scale Kubernetes infrastructure. The SRE team owns several aspects of business-critical services (search retrieval and ranking) and of the Machine Learning model infrastructure (Kubernetes hosted on Google Cloud). This infrastructure lets engineers build and release efficiently and keeps the critical systems behind etsy.com up and running. You will play an instrumental role in shaping the future architecture of how we run our systems in the cloud while being part of a dynamic international team. You'll get exposure to a variety of technologies, including Kubernetes, Golang, LLMs, model serving, and search retrieval and ranking. You'll use them to build systems that support the services behind our 86M active buyers and 5.5M sellers!

As the Software Engineer II, SRE, you will drive the adoption of containers and Kubernetes. You will also improve reliability, automate operations, and provide a self-service runtime platform that accelerates Etsy's product and ML engineering. Finally, you will contribute to the design and implementation of observability and CI/CD on top of Kubernetes. Do you find joy in improving developer velocity, and do you have the itch to work on complex, large-scale distributed systems? If so, this could be the perfect match.

This is a full-time position reporting to the Senior Engineering Manager. In addition to salary, you will be eligible for an equity package, an annual performance bonus, and our competitive benefits that support you and your family as part of your total rewards package at Etsy.

For this role, we are considering candidates based in the United States. Candidates living within commutable distance of Etsy's Brooklyn Office Hub or in the San Francisco Bay Area may be considered first.
For candidates within commutable distance, Etsy requires in-office attendance once or twice per week, depending on your proximity to the office. Etsy offers different work modes to meet the variety of needs and preferences of our team. Learn more about our work modes and workplace safety policies here.

What's this team like at Etsy?

This team improves the developer experience around building, deploying, releasing, and observing services and ML models transparently on Google Kubernetes Engine (GKE). They work on 20+ Kubernetes clusters with hundreds of nodes running services with low-latency requirements. The team also standardizes cluster and application security with common admission policies and container vulnerability scanning. In addition, it establishes standard SLIs/SLOs for all services running on Kubernetes. This team works closely with many product and enablement teams across Etsy.

  • Build and support the CI/CD platform (Buildkite) used by more than a few hundred engineers to deploy their workloads to GKE.
  • Maintain and upgrade GKE add-ons (cert-manager, Gatekeeper), ingress controllers (Contour, Envoy), various telemetry components (kube-prometheus, Alertmanager, Karma), and container security tooling.

Here's a sneak peek into our roadmap for the next year:

  • Help multiple Search, ML, and Gen AI teams use GPUs efficiently across different zones and regions.
  • Evaluate build-vs-buy decisions in the LLM space.
  • Enable a service mesh across GKE and provide a native way to access services across the stack.
  • Standardize cluster and application security and container vulnerability scanning (at both build time and run time).

Requirements

  • You have strong software engineering and coding skills and the ability to write high-performance, production-quality code. You have 2+ years of experience in systems/infrastructure engineering, SRE, or DevOps roles, preferably in a cloud environment.
  • Exposure to container orchestration systems such as Kubernetes (ingress traffic, cluster networking and administration, pod security policies).
  • Experience iterating on multiple projects on a collaborative team, each of which may have taken months or longer to complete.
  • Proficiency in at least one programming language, such as PHP, Python, or Go.
  • Hands-on experience with Infrastructure as Code tooling such as Terraform and configuration management tooling such as Chef or Ansible.
  • Hands-on debugging experience with Linux-based operating systems.
  • Willingness to work with and improve code you did not originally write.
  • You understand that being an effective software engineer is as much about communicating with people as it is about writing code.

Nice To Haves

  • Working knowledge of Machine Learning Operations (MLOps).

Responsibilities

  • Administer GKE clusters and automate operations such as provisioning and service observability. Support the partner teams running their workloads on the Kubernetes platform.
  • Provide guidance to and collaborate with cross-functional engineering teams to streamline and improve the adoption of Kubernetes.
  • Build paved paths for wider product engineering with codelabs, documentation, automation and self-service portals to develop, deploy and operate services on GKE.
  • Participate in an on-call rotation and seek opportunities to reduce toil and avoid technical debt, lowering the support and operations load on the team.
  • Of course, this is just a sample of the kinds of work this role will require! You should assume that your role will encompass other tasks, too, and that your job duties and responsibilities may change from time to time at Etsy's discretion, or as otherwise consistent with applicable local law.

Benefits

  • Equity package
  • Annual performance bonus
  • Competitive benefits that support you and your family as part of your total rewards package at Etsy

What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Industry: General Merchandise Retailers
Education Level: No Education Listed
Number of Employees: 1,001-5,000 employees

© 2024 Teal Labs, Inc