Data Engineer

Colgate•Piscataway, NJ

Hybrid

About The Position

Colgate-Palmolive Company is a global consumer products company operating in over 200 countries, specializing in Oral Care, Personal Care, Home Care, Skin Care, and Pet Nutrition. Our products are trusted in more households than any other brand in the world, making us a household name! Join Colgate-Palmolive, a caring, innovative growth company reimagining a healthier future for people, their pets, and our planet. Guided by our core values (Caring, Inclusive, and Courageous), we foster a culture that inspires our people to achieve common goals. Together, let's build a brighter, healthier future for all.

At Colgate-Palmolive, Data Engineers focus on expanding and optimizing our data, data pipeline architecture, data flow, and data collection for multi-functional teams. The Modern Data Engineer will be an experienced pipeline builder and data wrangler who enjoys optimizing data systems from the ground up to power the next generation of predictive and generative AI applications. You will support our software developers, data analysts, and data scientists on data initiatives, ensuring that an efficient data delivery architecture is applied consistently across ongoing projects. You must be self-directed and comfortable supporting the dynamic data needs of multiple teams. You will be excited by the prospect of re-designing our company’s data architecture to support our next generation of products, specifically by building robust infrastructure for Large Language Models (LLMs) and preparing high-quality, "AI-ready" datasets. This team is a group of innovators and technologists who love to learn and collaborate!

Work visa sponsorship is not available for this position.

Requirements

Bachelor’s degree required; a graduate degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field is a plus.
2+ years of experience in a Data Engineer role.
Strong, hands-on SQL experience, including complex query authoring, performance tuning, and deep familiarity working with a variety of relational databases.
Proven track record of designing, building, and maintaining robust, scalable data pipelines, ETL/ELT workflows, and overarching data architectures.
Strong analytic skills related to working with both structured and unstructured datasets, with a proven ability to process text, image, or document data for AI ingestion.
Familiarity with building processes supporting data transformation, data structures, metadata, dependency, and workload management.

Nice To Haves

Experience with Cloud services such as GCP or AWS.
Experience with modern relational SQL databases, NoSQL databases, and cloud data warehouses (e.g., Snowflake, PostgreSQL).
Familiarity with AI/LLM orchestration frameworks (e.g., LangChain, LlamaIndex) and Vector Databases (e.g., Pinecone, Milvus, Weaviate, pgvector).
Experience with Data Flow, Data Pipeline, and workflow management tools such as Cloud Composer or Airflow.
Experience with visualization tools (e.g., Sigma, Domo).
Experience supporting and working effectively with multi-functional teams in a dynamic environment.
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and drive improvements.

Responsibilities

Build and maintain optimal data pipeline architecture that supports both traditional analytics and advanced AI/ML workloads.
Assemble large, sophisticated data sets that meet functional and non-functional business requirements, ensuring data is cleaned, structured, and enriched with metadata for optimal use in model training and fine-tuning.
Design and engineer data pipelines to support Retrieval-Augmented Generation (RAG) systems, including text chunking, embedding generation, and vector database synchronization.
Build the infrastructure required for efficient extraction, transformation, and loading of data from a wide variety of data sources.
Identify, design, and implement internal process improvements, including automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability.
Build data tools for analytics and data science team members that help them build and optimize our product into an innovative industry leader.
Successfully harmonize, process, and extract value from large, disconnected datasets to uncover actionable insights.

Benefits

medical
dental
vision
basic life insurance
paid parental leave
disability coverage
participation in the 401(k) retirement plan with company matching contributions subject to eligibility requirements
a minimum of 15 vacation/PTO days
13 paid holidays
paid sick leave