About The Position

As part of the Mail Analytics Data Engineering team, you will work on large-scale batch pipelines, data serving, data lakehouse, and analytics systems that enable mission-critical decision making, downstream AI-powered capabilities, and more. If you're passionate about building data infrastructure and platforms that power a modern data- and AI-driven business at scale, we want to hear from you!

Requirements

  • BS in Computer Science/Engineering or a relevant technical field, or equivalent practical experience, with a specialization in Data Engineering
  • 8+ years of experience building scalable ETL pipelines on industry-standard orchestration tools (Airflow, Composer, Oozie), with deep expertise in SQL, PySpark, or Scala (see the orchestration sketch after this list)
  • 3+ years leading data engineering development directly with business or data science partners
  • Experience building, scaling, and maintaining multi-terabyte data sets, with an expansive toolbox for debugging and unblocking large-scale analytics challenges (skew mitigation, sampling strategies, accumulation patterns, data sketches, etc.); see the salting sketch after this list
  • Experience with at least one major cloud provider's suite of offerings (AWS, GCP, Azure)
  • Experience developing or enhancing ETL orchestration tools or frameworks
  • Experience working within a standard GitOps workflow (branch-and-merge, PRs, CI/CD systems)
  • Experience working with GDPR
  • Self-driven and detail-oriented, with a love of challenges, a spirit of teamwork, excellent communication skills, and the ability to multitask and manage expectations
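
For illustration, here is a minimal sketch of the kind of orchestrated batch pipeline the requirements above describe, written as an Airflow 2.x DAG. The DAG id, schedule, and task callables are hypothetical and not drawn from this posting.

```python
# Minimal Airflow DAG sketch, assuming Airflow 2.4+; every name here
# (dag_id, task ids, callables) is a hypothetical illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_events(**context):
    # Hypothetical step: land one day's partition of raw events.
    print("extracting partition", context["ds"])

def transform_events(**context):
    # Hypothetical step: aggregate the partition (e.g., with PySpark).
    print("transforming partition", context["ds"])

with DAG(
    dag_id="mail_events_daily",      # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_events)
    transform = PythonOperator(task_id="transform", python_callable=transform_events)
    extract >> transform             # run the transform after the extract
```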
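
As a concrete example of one skew-mitigation pattern named above, here is a hedged PySpark sketch of key salting: a hot key is fanned out across a fixed number of salt buckets, partially aggregated, and then merged. The column names and bucket count are assumptions for illustration.

```python
# Hedged PySpark sketch of key salting, one common skew-mitigation
# pattern; column names and the bucket count are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew_salting_sketch").getOrCreate()

SALT_BUCKETS = 16  # assumption: enough buckets to spread the hot key

# Toy skewed input: one million events all sharing a single hot user_id.
events = spark.range(1_000_000).withColumn("user_id", F.lit("hot_user"))

# Attach a random salt so the hot key fans out across SALT_BUCKETS groups.
salted = events.withColumn("salt", (F.rand(seed=42) * SALT_BUCKETS).cast("int"))

# Partially aggregate per (user_id, salt), then merge the partials, so no
# single task has to shuffle and reduce the entire hot key by itself.
partials = salted.groupBy("user_id", "salt").count()
totals = partials.groupBy("user_id").agg(F.sum("count").alias("event_count"))

totals.show()
```

The same fan-out-then-merge idea extends to skewed joins, where the skewed side is salted and the small side is replicated once per bucket.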

Nice To Haves

  • MS/PhD in Computer Science/Engineering or relevant technical field, with specialization in Data Engineering
  • 3 years of experience with Google Cloud Platform technologies (BigQuery, Dataproc, Dataflow, Composer, Looker)

Responsibilities

  • Partner with Data Science, Product, and Engineering to gather requirements and define the data ontology for Mail Data & Analytics
  • Lead and mentor junior Data Engineers to support Yahoo Mail’s ever-evolving data needs
  • Design, build, and maintain efficient and reliable batch data pipelines to populate core data sets
  • Develop scalable frameworks and tooling to automate analytics workflows and streamline user interactions with data products
  • Establish and promote standard methodologies for data operations and lifecycle management
  • Develop new, and improve and maintain existing, large-scale data infrastructure and systems for data processing and serving, optimizing complex code through advanced algorithmic concepts and an in-depth understanding of the underlying data system stacks
  • Create and contribute to frameworks that improve the management and deployment of data platforms and systems, and work with data infrastructure teams to triage and resolve issues
  • Prototype new metrics or data systems
  • Define and manage Service Level Agreements for all data sets in allocated areas of ownership
  • Develop complex queries, very-large-volume data pipelines, and analytics applications to solve analytics and data engineering problems
  • Collaborate with engineers, data scientists, and product managers to understand business problems and technical requirements and deliver data solutions
  • Provide engineering consulting on large, complex data lakehouse data sets

Benefits

  • Medical, vision, and dental benefits
  • 401k retirement plan
  • Variable pay/incentives
  • Paid time off
  • Paid holidays (available for full-time employees)