Principal Data Engineer

Gemini•San Francisco, CA

Hybrid

About The Position

Gemini is a global crypto and Web3 platform founded by Cameron and Tyler Winklevoss in 2014, offering a wide range of simple, reliable, and secure crypto products and services to individuals and institutions in over 70 countries. Our mission is to unlock the next era of financial, creative, and personal freedom by providing trusted access to the decentralized future. We envision a world where crypto reshapes the global financial system, internet, and money to create greater choice, independence, and opportunity for all, bridging traditional finance with the emerging cryptoeconomy in a way that is more open, fair, and secure. As a publicly traded company, Gemini is poised to accelerate this vision with greater scale, reach, and impact.

The Department: Data

At Gemini, our Data Team is the engine that powers insight, innovation, and trust across the company. We bring together world-class data engineers, platform engineers, machine learning engineers, analytics engineers, and data scientists, all working in harmony to transform raw information into secure, reliable, and actionable intelligence. From building scalable pipelines and platforms, to enabling cutting-edge machine learning, to ensuring governance and cost efficiency, we deliver the foundation for smarter decisions and breakthrough products. We thrive at the intersection of crypto, technology, and finance, and we’re united by a shared mission: to unlock the full potential of Gemini’s data to drive growth, efficiency, and customer impact.

The Role: Principal Data Engineer

The Data Engineering Team owns the ingestion and transformation of data from production databases, streams, and external data sources into our data warehouse. As a Principal Data Engineer, you will set the technical direction for how data is modeled, processed, and delivered across the organization.

You will partner closely with product, analytics, ML, finance, operations, and engineering teams to move, transform, and model data reliably, with observability, resilience, and agility. You’ll lead by example through design excellence, mentoring, and technical leadership, ensuring our data architecture is scalable, governed, and ready for the next generation of analytics and machine learning at Gemini.

This is a senior individual contributor role (highly technical, strategic, and cross-functional) where you’ll influence the design of data systems that underpin key decisions and customer-facing products across Gemini. This role is required to be in person twice a week at either our New York City, NY or San Francisco, CA office.

Requirements

10+ years of experience in data engineering (or similar) roles
Strong experience in ETL/ELT pipeline design, implementation, and optimization
Deep expertise in Python and SQL, including writing production-quality, maintainable, testable code
Experience with large-scale data warehouses (e.g. Databricks, BigQuery, Snowflake)
Solid grounding in software engineering fundamentals, data structures, and systems thinking
Hands-on experience in data modeling (dimensional modeling, normalization, schema design)
Experience building systems with real-time or streaming data (e.g. Kafka, Kinesis, Flink, Spark Streaming), and familiarity with CDC frameworks
Experience with orchestration / workflow frameworks (e.g. Airflow)
Familiarity with data governance, lineage, metadata, cataloging, and data quality practices
Strong cross-functional communication skills; ability to translate between technical and non-technical stakeholders
Proven experience in recruiting, mentoring, leading design discussions, and influencing data-engineering best practices across teams

Nice To Haves

Experience with crypto, financial services, trading, markets, or exchange systems
Experience with blockchain, crypto, and Web3 data (e.g. blocks, transactions, contract calls, token transfers, UTXO/account models, on-chain indexing, chain APIs)
Experience with infrastructure as code, containerization, and CI/CD pipelines
Hands-on experience managing and optimizing Databricks on AWS

Responsibilities

Define and drive the long-term vision for data architecture, modeling, and transformation at Gemini
Establish standards for data reliability, observability, and quality across all pipelines and data products using languages and frameworks such as Python, SQL, Spark, Flink, Beam, or equivalents
Partner with Staff and Senior Data Engineers, Platform Engineers, and Analytics Engineers to unify how data is produced, stored, and consumed
Lead large-scale design initiatives that span multiple teams and systems, ensuring maintainability, performance, and security
Partner with data scientists, ML engineers, analysts, and product teams to understand data requirements, define SLAs, and deliver coherent data products that others can self-serve
Establish data quality, validation, observability, and monitoring frameworks (data auditing, alerting, anomaly detection, data lineage)
Investigate and resolve complex production issues: root cause analysis, performance bottlenecks, data integrity, fault tolerance
Mentor and guide junior and mid-level data engineers by leading code reviews and design reviews and promoting best practices
Help recruit and onboard new talent, shaping the future of Gemini’s data engineering discipline
Stay up to date on new tools, technologies, and patterns in the data and cloud space, bringing proposals and proofs of concept when appropriate
Document data flows, data dictionaries, architecture patterns, and operational runbooks