Data Site Reliability Engineer
PayPay · Posted: May 1, 2023 · Remote
About the position
PayPay is seeking an experienced Data SRE to ensure the high availability, performance, and reliability of its data systems and pipelines. The ideal candidate will have expertise in designing, analyzing, and troubleshooting large-scale distributed systems, as well as knowledge of AWS Cloud Native Data Applications and production workloads. They will work with a cross-functional team to develop scalable Big Data solutions and positive user experiences. The role involves managing day-to-day operations of data services, creating new designs and architectures, and ensuring data integrity and quality.
Responsibilities
- Define SLOs and SLIs for key indicators such as data freshness and data quality
- Design, support, and improve the availability, scalability, stability, reliability, latency, monitoring, and alerting of PayPay data systems
- Manage day-to-day operations of data services and of near-real-time and batch data pipelines
- Create new designs, architectures, standards and methods for large-scale distributed systems
- Knowledge of data ingestion, modelling, processing, and ETL design
- Demonstrated past experience managing large-scale, production-grade Data Lake, Data Warehouse, and ETL systems
- Act as the point person for data integrity and quality within data storage systems; perform root cause analysis and triage issues
- Take part in data storage capacity planning, make forecasts, and tune the systems
- Work with multiple stakeholders across teams to build secure data transfer
Qualifications
- Good understanding of DevOps concepts and implementation
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems such as Redis, Elasticsearch, Kafka, Hadoop, and MySQL
- Experience with analytical solutions such as Looker
- Experience with data lakes such as Apache Hudi and data warehouses such as BigQuery and Redshift
- Knowledge of Spark, Glue, Python and Scala
- In-depth knowledge and hands-on experience with AWS Cloud Native Data Applications and production workloads
- Knowledge about Microservices
- Knowledge about observability and how to gather data
- System design experience and capacity planning for large distributed systems
- Understanding of automation tools and their implementation
- Terraform/CloudFormation experience
- Experience managing monitoring tools such as CloudWatch and New Relic