Senior Data Engineer

Procore Technologies, Austin, TX

About The Position

We are looking for a Senior Data Engineer to join Procore’s Data team. In this role, you will be responsible for building the data architecture that connects Procore’s global ecosystem. You will work across diverse domains to create a unified, high-fidelity view of our customers, projects, and users. This is a "Data Engineering first" role that leverages AI and Machine Learning to solve complex Entity Resolution challenges. You will use the modern data stack to transform fragmented data from across the enterprise into a cohesive, intelligent data foundation that powers our global strategy.

Requirements

  • Bachelor's degree in Computer Science or a similar technical field of study.
  • 4+ years of technical experience in a Data or Software Engineering role.
  • Ability to write complex analytical queries and production-grade Python code.
  • Strong experience with Databricks, Airflow, Spark, AWS, and GitLab.
  • Experience developing lightweight data services using Python frameworks (e.g., FastAPI, Flask) and integrating with external REST APIs, including handling authentication, rate limiting, and robust error handling (see the API-client sketch after this list).
  • Practical experience using AI techniques (e.g., record linkage, fuzzy matching, or LLM-based classification) to solve data quality and identity problems (see the matching sketch after this list).
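
As a hedged illustration of the API-integration requirement above, the sketch below shows one way a lightweight Python client might handle token authentication, rate limiting, and retries. The endpoint, token handling, and retry policy are placeholder assumptions for the example, not a Procore API.

```python
import time
import requests

API_URL = "https://api.example.com/v1/records"  # placeholder endpoint, not a real service
TOKEN = "..."  # hypothetical bearer token; in practice loaded from a secret store

def fetch_records(page: int, max_retries: int = 3) -> dict:
    """Fetch one page of records, backing off when the API rate-limits us."""
    headers = {"Authorization": f"Bearer {TOKEN}"}
    for attempt in range(max_retries):
        resp = requests.get(API_URL, headers=headers, params={"page": page}, timeout=10)
        if resp.status_code == 429:
            # Rate limited: honor Retry-After if present, otherwise back off exponentially.
            time.sleep(int(resp.headers.get("Retry-After", 2 ** attempt)))
            continue
        resp.raise_for_status()  # surface 4xx/5xx errors instead of failing silently
        return resp.json()
    raise RuntimeError("rate-limit retries exhausted")
```
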
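And a minimal sketch of the fuzzy-matching idea, using only the Python standard library; the normalization rules and the 0.9 threshold are illustrative assumptions rather than Procore's entity-resolution logic.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Crude normalization: lowercase and strip common legal suffixes."""
    name = name.lower().strip()
    for suffix in (" inc", " inc.", " llc", " ltd"):
        name = name.removesuffix(suffix)
    return name

def is_probable_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two customer names as the same entity when similarity clears the threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

# e.g. is_probable_match("Acme Construction Inc.", "ACME Construction") -> True
```
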

Responsibilities

  • Develop and maintain scalable ETL pipelines using Apache Spark.
  • Implement partitioning strategies and performance tuning techniques, using modern open table formats such as Delta Lake and Apache Iceberg to manage data consistency (see the Delta Lake sketch after this list).
  • Drive engineering excellence through code reviews, mentorship, and the implementation of CI/CD best practices for data.
  • Develop and deploy AI/ML models and probabilistic matching logic to link and deduplicate entities across disparate business domains.
  • Design canonical data models that provide a 360-degree view of the enterprise, ensuring that a "Customer" in Sales matches the "Customer" in our Product and Marketing engines.
  • Implement AI-driven workflows to automatically clean, normalize, and enrich enterprise records, ensuring that our customers are working with the most accurate information possible.
  • Architect complex, modular data transformations, ensuring that the "logic layer" of our data stack is robust, testable, and highly performant.
  • Manage sophisticated, multi-stage workflows in Airflow, integrating Python-based scripts directly into the data lifecycle (see the Airflow sketch after this list).
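
As a hedged illustration of the partitioning and open-table-format responsibilities above, the sketch below writes a deduplicated, partitioned Delta table with PySpark. The source path, column names, and partition key are assumptions made for the example, not the team's actual pipeline.

```python
from pyspark.sql import SparkSession

# Assumes a Databricks-style environment where the Delta Lake libraries are already available.
spark = SparkSession.builder.appName("customer_etl").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/customers/")  # placeholder source path

# Partition by a low-cardinality column so downstream reads can prune files,
# and let Delta Lake handle ACID guarantees and schema enforcement on write.
(raw.dropDuplicates(["customer_id"])
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("region")
    .save("s3://example-bucket/curated/customers/"))
```
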
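And a minimal sketch of a multi-stage Airflow workflow (Airflow 2.x style) that chains Python callables; the DAG name, schedule, and task functions are hypothetical.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    """Placeholder: pull fresh records from source systems."""

def resolve_entities():
    """Placeholder: run matching/deduplication over the ingested records."""

with DAG(
    dag_id="entity_resolution_daily",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                  # Airflow 2.4+ parameter; older versions use schedule_interval
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    resolve_task = PythonOperator(task_id="resolve_entities", python_callable=resolve_entities)
    ingest_task >> resolve_task  # ingestion must finish before matching runs
```
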