Data Engineer II

Chainalysis · New York, NY

About The Position

As a Data Engineer, you'll be at the heart of our infrastructure, designing and building scalable ETL pipelines and analytics databases that power both internal operations and external customer solutions. You'll collaborate with product managers, data scientists, and UX teams to understand requirements and translate them into robust, cloud-deployed systems. This role offers the opportunity to work on large-scale distributed systems while maintaining code quality, implementing observability, and contributing to a culture of continuous improvement.

Requirements

  • Bachelor's degree in Computer Science, Mathematics, Engineering, Information Management, Information Technology, or a related field (or 5+ years of equivalent experience); a Master's degree is also acceptable with 3+ years of experience
  • 4+ years developing scalable, large-scale data processing and ETL pipelines
  • 3+ years building data pipelines using EMR, Airflow, Athena, Redshift, PostgreSQL, Snowflake, Kinesis, Lambda, or Databricks
  • 4+ years building software using Python or SQL
  • 3+ years implementing observability and monitoring tools (Humio, Datadog, Amazon CloudWatch, AWS CloudTrail)
  • Strong communication skills with both technical and non-technical stakeholders
  • Experience handling product-critical issues, with the ability to support teams during off-hours

Nice To Haves

  • Experience with self-healing infrastructure and auto-scaling systems
  • Familiarity with blockchain or cryptocurrency data platforms
  • Previous exposure to multiple data platforms and vendor ecosystems

Responsibilities

  • Scale systems and redesign existing infrastructure to handle billions of daily requests and process tens of terabytes of data
  • Build ETL data pipelines using Databricks, EMR, Athena, and Glue to enable internal and external customer insights (a minimal pipeline sketch, including a data quality gate, appears after this list)
  • Develop frameworks for data quality testing and continuous quality assessment of data vendors
  • Collaborate with product managers, data scientists, and UX teams to understand requirements and expose metrics effectively
  • Write solid, maintainable code in a 100% cloud-deployed infrastructure with self-healing capabilities
  • Implement observability and monitoring solutions to track system health and performance (see the monitoring sketch after this list)
  • Participate in design and code review processes while supporting teammates through on-call responsibilities
  • Build integrations with various data vendors and develop frameworks to streamline future integrations
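
To make the pipeline and data quality responsibilities above concrete, here is a minimal PySpark sketch of the kind of ETL step this role involves. It is an illustration only, not the company's actual code: the S3 paths, column names, and 1% null-rate threshold are hypothetical placeholders, and a job like this could run on any of the named runtimes (Databricks, EMR, or Glue).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("vendor-etl-example").getOrCreate()

    # Extract: raw vendor drop (hypothetical bucket and layout).
    raw = spark.read.parquet("s3://example-raw/vendor_a/transactions/")

    # Quality gate: fail the run early if too many rows lack a transaction id.
    total = raw.count()
    null_ids = raw.filter(F.col("tx_id").isNull()).count()
    if total == 0 or null_ids / total > 0.01:  # 1% threshold is illustrative
        raise ValueError(f"quality gate failed: {null_ids}/{total} null tx_id")

    # Transform: drop bad rows, normalize timestamps, de-duplicate on the key.
    clean = (
        raw.filter(F.col("tx_id").isNotNull())
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["tx_id"])
    )

    # Load: write a partitioned analytics table (hypothetical destination).
    clean.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-curated/vendor_a/transactions/"
    )

Failing the run at the quality gate, rather than loading bad data, is what the "continuous quality assessment of data vendors" bullet points toward: the same checks, run on every delivery, become a scorecard per vendor.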
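Similarly, for the observability responsibility: a short sketch of publishing custom pipeline-health metrics to Amazon CloudWatch (one of the monitoring tools named in the requirements) using boto3. The namespace, metric names, and dimensions are hypothetical.

    import time
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def report_run(pipeline: str, rows_written: int, duration_s: float, ok: bool) -> None:
        # Publish per-run metrics so dashboards and alarms can track health.
        dims = [{"Name": "Pipeline", "Value": pipeline}]
        cloudwatch.put_metric_data(
            Namespace="ExampleTeam/Pipelines",  # hypothetical namespace
            MetricData=[
                {"MetricName": "RowsWritten", "Dimensions": dims,
                 "Value": float(rows_written), "Unit": "Count"},
                {"MetricName": "RunDurationSeconds", "Dimensions": dims,
                 "Value": duration_s, "Unit": "Seconds"},
                {"MetricName": "RunFailed", "Dimensions": dims,
                 "Value": 0.0 if ok else 1.0, "Unit": "Count"},
            ],
        )

    start = time.time()
    # ... run the ETL job here ...
    report_run("vendor_a_transactions", rows_written=1_000_000,
               duration_s=time.time() - start, ok=True)

An alarm on RunFailed, or on missing RowsWritten data points, is a typical way such metrics feed the on-call rotation mentioned above.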