About The Position

The Global E-commerce SRE team of US Tech Services works with engineering and product teams to build and run large-scale, globally distributed, observable, fault-tolerant systems. As an SRE, you will take production ownership and be responsible for observability and automation across complex, large-scale service mesh architectures.

To enhance collaboration and cross-functional partnerships, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Requirements

  • Good understanding of Unix/Linux operating system internals and networking
  • Experience writing code in Java, Go, Python or a similar language
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems (Redis, Elasticsearch, Kafka, Druid, Hadoop, Flink or comparable solutions), relational databases, caching solutions and web service frameworks
  • Experience with algorithms, data structures, complexity analysis and software design
  • Experience developing tools and APIs to reduce manual interaction with systems and applications using a variety of coding and scripting standards
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of drive

Nice To Haves

  • Familiarity with running production-grade web services at scale and an understanding of cloud-native technologies and networking
  • Knowledge of a variety of strategies for ingesting, modeling, processing, and persisting data, as well as ETL design, dimensional modeling, and cube design

Responsibilities

  • Own the service level of a critical, revenue-generating E-commerce platform as well as all supporting infrastructure and services. This role will focus on service reliability, highly scalable design, and release management in a cloud-native environment.
  • Define service level indicators and data-driven objectives to uphold and improve uptime, latency, and system health of a core TikTok production platform.
  • Collaborate across teams with engineering and product to ensure that key activities (such as capacity planning and launch reviews) are completed, enabling transparent service delivery to customers.
  • Build automation geared toward infrastructure-as-code, scalability, and service resiliency.
  • Implement SRE practices around incident management and post-mortems while participating in on-call rotations.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Broadcasting and Content Providers

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
