About The Position

As part of the Mail Analytics Data Engineering team, you will work on large-scale batch pipelines, data serving, data lakehouse, and analytics systems that enable mission-critical decision making, downstream AI-powered capabilities, and more. If you're passionate about building data infrastructure and platforms that power a modern data- and AI-driven business at scale, we want to hear from you!

Requirements

  • BS in Computer Science/Engineering or a relevant technical field, or equivalent practical experience, with a specialization in Data Engineering
  • 8+ years of experience building scalable ETL pipelines on industry-standard orchestration tools (Airflow, Composer, Oozie), with deep expertise in SQL, PySpark, or Scala (see the orchestration sketch after this list)
  • 3+ years leading data engineering development directly with business or data science partners
  • Experience building, scaling, and maintaining multi-terabyte data sets, with an expansive toolbox for debugging and unblocking large-scale analytics challenges (skew mitigation, sampling strategies, accumulation patterns, data sketches, etc.); see the salting sketch after this list
  • Experience with at least one major cloud provider's suite of offerings (AWS, GCP, Azure)
  • Experience developing or enhancing ETL orchestration tools or frameworks
  • Experience working within a standard GitOps workflow (branch-and-merge, PRs, CI/CD systems)
  • Experience working with GDPR
  • Self-driven and detail-oriented, with a love of challenges, a spirit of teamwork, excellent communication skills, and the ability to multitask and manage expectations
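
For illustration, here is a minimal sketch of the kind of orchestrated batch pipeline the requirements above describe, written as an Airflow 2.x DAG. The DAG id, schedule, and task callables are hypothetical and not drawn from this posting.

```python
# Minimal Airflow DAG sketch, assuming Airflow 2.4+; every name here
# (dag_id, task ids, callables) is a hypothetical illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_events(**context):
    # Hypothetical step: land one day's partition of raw events.
    print("extracting partition", context["ds"])

def transform_events(**context):
    # Hypothetical step: aggregate the partition (e.g., with PySpark).
    print("transforming partition", context["ds"])

with DAG(
    dag_id="mail_events_daily",      # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_events)
    transform = PythonOperator(task_id="transform", python_callable=transform_events)
    extract >> transform             # run the transform after the extract
```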
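
As a concrete example of one skew-mitigation pattern named above, here is a hedged PySpark sketch of key salting: a hot key is fanned out across a fixed number of salt buckets, partially aggregated, and then merged. The column names and bucket count are assumptions for illustration.

```python
# Hedged PySpark sketch of key salting, one common skew-mitigation
# pattern; column names and the bucket count are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew_salting_sketch").getOrCreate()

SALT_BUCKETS = 16  # assumption: enough buckets to spread the hot key

# Toy skewed input: one million events all sharing a single hot user_id.
events = spark.range(1_000_000).withColumn("user_id", F.lit("hot_user"))

# Attach a random salt so the hot key fans out across SALT_BUCKETS groups.
salted = events.withColumn("salt", (F.rand(seed=42) * SALT_BUCKETS).cast("int"))

# Partially aggregate per (user_id, salt), then merge the partials, so no
# single task has to shuffle and reduce the entire hot key by itself.
partials = salted.groupBy("user_id", "salt").count()
totals = partials.groupBy("user_id").agg(F.sum("count").alias("event_count"))

totals.show()
```

The same fan-out-then-merge idea extends to skewed joins, where the skewed side is salted and the small side is replicated once per bucket.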

Nice To Haves

  • MS/PhD in Computer Science/Engineering or relevant technical field, with specialization in Data Engineering
  • 3 years of experience with Google Cloud Platform technologies (BigQuery, Dataproc, Dataflow, Composer, Looker)

Responsibilities

  • Partner with Data Science, Product, and Engineering to gather requirements and define the data ontology for Mail Data & Analytics
  • Lead and mentor junior Data Engineers to support Yahoo Mail’s ever-evolving data needs
  • Design, build, and maintain efficient and reliable batch data pipelines to populate core data sets
  • Develop scalable frameworks and tooling to automate analytics workflows and streamline user interactions with data products
  • Establish and promote standard methodologies for data operations and lifecycle management
  • Develop new, and improve and maintain existing, large-scale data infrastructure and systems for data processing and serving, optimizing complex code through advanced algorithmic concepts and an in-depth understanding of the underlying data system stacks
  • Create and contribute to frameworks that improve the management and deployment of data platforms and systems, and work with data infrastructure teams to triage and resolve issues
  • Prototype new metrics or data systems
  • Define and manage Service Level Agreements for all data sets in allocated areas of ownership
  • Develop complex queries, very-large-volume data pipelines, and analytics applications to solve analytics and data engineering problems
  • Collaborate with engineers, data scientists, and product managers to understand business problems and technical requirements and deliver data solutions
  • Provide engineering consulting on large, complex data lakehouse data sets

Benefits

  • Medical, vision, and dental benefits
  • 401k retirement plan
  • Variable pay/incentives
  • Paid time off
  • Paid holidays (available for full-time employees)