Founding Data Engineer

Elicit
Oakland, CA
$185,000 - $270,000

About The Position

Elicit is an AI research assistant that uses language models to help professional researchers and high-stakes decision makers break down hard questions, gather evidence from scientific/academic sources, and reason through uncertainty. We are looking for a data engineer to build a complete corpus of academic papers and clinical trials, and to integrate various document types and sources into our system. The role involves architecting and implementing robust, scalable solutions to handle our growing data needs while maintaining high performance and data quality.

Requirements

  • 5+ years of experience as a data engineer.
  • Strong proficiency in Python (5+ years of experience).
  • Experience creating and owning a data platform at rapidly growing startups.
  • Experience architecting and optimizing large data pipelines, ideally with Spark.
  • Strong SQL skills, including aggregation functions, window functions, UDFs, self-joins, partitioning, and clustering.
  • Experience with columnar data storage formats like Parquet.
  • Strong opinions, weakly held, about data quality management.
  • Creative and user-centric problem-solving skills.

Nice To Haves

  • Experience in developing deduplication processes for large datasets.
  • Hands-on experience with full-text extraction and processing from various document formats.
  • Familiarity with machine learning concepts and their application in search technologies.
  • Experience with distributed computing frameworks beyond Spark.
  • Experience in science and academia, and familiarity with academic publications.
  • Hands-on experience with industry-standard tools like Airflow, dbt, or Hadoop.
  • Experience with data lake, data warehouse, or lakehouse paradigms.

Responsibilities

  • Build and optimize the academic research paper pipeline.
  • Architect and implement robust, scalable systems for data ingestion.
  • Efficiently deduplicate hundreds of millions of research papers and calculate embeddings.
  • Expand the datasets Elicit works over, including court documents and SEC filings.
  • Integrate private data for larger customers into Elicit.
  • Preprocess data to make it useful for ML models.
  • Collaborate with ML engineers to gather and apply datasets for model fine-tuning.

Benefits

  • Flexible work environment: work from the office in Oakland or remotely.
  • Fully covered health, dental, vision, and life insurance for you, and generous coverage for your family.
  • Flexible vacation policy with a minimum recommendation of 20 days/year + company holidays.
  • 401(k) with a 6% employer match.
  • A new Mac + $1,000 budget to set up your workstation or home office in your first year.
  • $1,000 quarterly AI Experimentation & Learning budget.
  • A team administrative assistant to help with personal and work tasks.