Data Ingestion SRE, Data Platform - USDS

TikTok • Seattle, WA

40d•Hybrid

About The Position

About the Team: The Data Ingestion team builds and maintains the end-to-end applog pipeline that collects, processes, and routes large volumes of data reliably and efficiently across the organization. We focus on real-time processing, stream computing for ETL, and event tracking while ensuring high availability and reliability of our services.

As a Site Reliability Engineer on the Data Ingestion team, you will maintain, optimize, and grow one of the largest data platforms in the world, and gain hands-on experience with core systems in the data platform ecosystem. Your work will have a direct impact on the company's core products and on hundreds of millions of users.

To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Requirements

Bachelor's degree in Computer Science, a related technical field involving software or systems engineering, or equivalent practical experience.
Experience writing code in Java, Scala, Go, Python, or a similar language.
Knowledge of Linux/Unix systems and familiarity with system internals, networking, and resource management (memory, CPU, storage).
Familiarity with Continuous Integration/Continuous Deployment pipelines and tools (e.g., Jenkins, GitLab CI).

Nice To Haves

Understanding of cloud-native technologies, networking, and storage management to support high-availability and large-scale environments.
Experience with distributed processing frameworks such as Spark, Kafka, Flink, or similar technologies is highly desirable.
Experience in developing tools and APIs that automate system and application processes using diverse coding and scripting standards.
Strong understanding of traditional relational databases like MySQL or PostgreSQL. Ability to write queries, perform joins, use aggregate functions, and optimize basic SQL queries.

Responsibilities

Maintain highly reliable, fault-tolerant, and scalable systems that are both cost-effective and efficient, ensuring data, services, and infrastructure meet business needs.
Design and implement robust, scalable, and extensible big data systems that support the core business and products, ensuring seamless data flow and system integration.
Contribute to and enhance every stage of the service life cycle - from design and development through deployment, operation and ongoing optimization.
Be responsible for production stability and participate in on-call rotations for production incidents, ensuring critical issues are addressed swiftly.
Develop and maintain clear runbooks and Standard Operating Procedures (SOPs), and practice sustainable incident response and blameless postmortems to drive continuous improvement.
