Data Engineer II

Chainalysis · New York, NY

About The Position

As a Data Engineer, you'll be at the heart of our infrastructure, designing and building scalable ETL pipelines and analytics databases that power both internal operations and external customer solutions. You'll collaborate with product managers, data scientists, and UX teams to understand requirements and translate them into robust, cloud-deployed systems. This role offers the opportunity to work on large-scale distributed systems while maintaining code quality, implementing observability, and contributing to a culture of continuous improvement.

Requirements

  • Bachelor's degree in Computer Science, Mathematics, Engineering, Information Management, Information Technology, or a related field (or 5+ years of equivalent experience); a Master's degree is also acceptable with 3+ years of experience
  • 4+ years developing scalable, large-scale data processing and ETL pipelines
  • 3+ years building data pipelines using EMR, Airflow, Athena, Redshift, PostgreSQL, Snowflake, Kinesis, Lambda, or Databricks
  • 4+ years building software using Python or SQL
  • 3+ years implementing observability and monitoring tools (Humio, Datadog, Amazon CloudWatch, AWS CloudTrail)
  • Strong communication skills with both technical and non-technical stakeholders
  • Experience handling product-critical issues, with the ability to support teams during off-hours

Nice To Haves

  • Experience with self-healing infrastructure and auto-scaling systems
  • Familiarity with blockchain or cryptocurrency data platforms
  • Previous exposure to multiple data platforms and vendor ecosystems

Responsibilities

  • Scale systems and redesign existing infrastructure to handle billions of daily requests and process tens of terabytes of data
  • Build ETL data pipelines using Databricks, EMR, Athena, and Glue to enable internal and external customer insights (a minimal pipeline sketch, including a data quality gate, appears after this list)
  • Develop frameworks for data quality testing and continuous quality assessment of data vendors
  • Collaborate with product managers, data scientists, and UX teams to understand requirements and expose metrics effectively
  • Write solid, maintainable code in a 100% cloud-deployed infrastructure with self-healing capabilities
  • Implement observability and monitoring solutions to track system health and performance (see the monitoring sketch after this list)
  • Participate in design and code review processes while supporting teammates through on-call responsibilities
  • Build integrations with various data vendors and develop frameworks to streamline future integrations
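
To make the pipeline and data quality responsibilities above concrete, here is a minimal PySpark sketch of the kind of ETL step this role involves. It is an illustration only, not the company's actual code: the S3 paths, column names, and 1% null-rate threshold are hypothetical placeholders, and a job like this could run on any of the named runtimes (Databricks, EMR, or Glue).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("vendor-etl-example").getOrCreate()

    # Extract: raw vendor drop (hypothetical bucket and layout).
    raw = spark.read.parquet("s3://example-raw/vendor_a/transactions/")

    # Quality gate: fail the run early if too many rows lack a transaction id.
    total = raw.count()
    null_ids = raw.filter(F.col("tx_id").isNull()).count()
    if total == 0 or null_ids / total > 0.01:  # 1% threshold is illustrative
        raise ValueError(f"quality gate failed: {null_ids}/{total} null tx_id")

    # Transform: drop bad rows, normalize timestamps, de-duplicate on the key.
    clean = (
        raw.filter(F.col("tx_id").isNotNull())
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["tx_id"])
    )

    # Load: write a partitioned analytics table (hypothetical destination).
    clean.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-curated/vendor_a/transactions/"
    )

Failing the run at the quality gate, rather than loading bad data, is what the "continuous quality assessment of data vendors" bullet points toward: the same checks, run on every delivery, become a scorecard per vendor.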
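Similarly, for the observability responsibility: a short sketch of publishing custom pipeline-health metrics to Amazon CloudWatch (one of the monitoring tools named in the requirements) using boto3. The namespace, metric names, and dimensions are hypothetical.

    import time
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def report_run(pipeline: str, rows_written: int, duration_s: float, ok: bool) -> None:
        # Publish per-run metrics so dashboards and alarms can track health.
        dims = [{"Name": "Pipeline", "Value": pipeline}]
        cloudwatch.put_metric_data(
            Namespace="ExampleTeam/Pipelines",  # hypothetical namespace
            MetricData=[
                {"MetricName": "RowsWritten", "Dimensions": dims,
                 "Value": float(rows_written), "Unit": "Count"},
                {"MetricName": "RunDurationSeconds", "Dimensions": dims,
                 "Value": duration_s, "Unit": "Seconds"},
                {"MetricName": "RunFailed", "Dimensions": dims,
                 "Value": 0.0 if ok else 1.0, "Unit": "Count"},
            ],
        )

    start = time.time()
    # ... run the ETL job here ...
    report_run("vendor_a_transactions", rows_written=1_000_000,
               duration_s=time.time() - start, ok=True)

An alarm on RunFailed, or on missing RowsWritten data points, is a typical way such metrics feed the on-call rotation mentioned above.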