Senior Data Engineer

DigiCert • Lehi, UT


About The Position

We're a leading global security authority that's disrupting our own category. Our encryption is trusted by the major ecommerce brands, the world's largest companies, the major cloud providers, entire country financial systems, entire internets of things, and even down to the little things like surgically embedded pacemakers. We help companies put trust - an abstract idea - to work. That's digital trust for the real world.

We're looking for a Senior Data Engineer who can own modern data platforms end-to-end and help enable AI-powered capabilities across our products. You'll design and operate reliable, scalable data pipelines on Databricks and collaborate with product and engineering teams to integrate intelligent, data-driven solutions. This role is primarily focused on Data Engineering, with opportunities to explore and apply Generative AI and Machine Learning technologies responsibly at scale.

Requirements

5+ years in Data Engineering (or adjacent ML/Data roles) building production-grade data pipelines and platforms.
Strong proficiency in Python, SQL, and PySpark; deep experience with Databricks and cloud data stacks (AWS or equivalent).
Expertise in Delta Lake/S3-class storage, version control (Git), and CI/CD for data services.
Experience building monitoring and dashboards for data or AI services (Grafana or similar).
Exposure to AI/ML applications in production environments, including LLM or retrieval-augmented workflows.

Nice To Haves

Hands-on experience with OpenAI Agent Builder / Agent Bricks or comparable AI agent frameworks.
Familiarity with MLflow for experiment and model lifecycle management.
Working knowledge of LangChain, LlamaIndex, and vector databases.
Understanding of LLM observability, evaluations, and feedback loops.
Familiarity with security and governance domains (PKI, identity, data privacy).

Responsibilities

Design, build, and optimize batch and streaming pipelines on Databricks (Spark, Delta Lake) for high-volume, mission-critical data.
Implement robust data modeling, transformation, quality, and metadata practices (expectations, profiling, lineage).
Ensure reliability and performance of data services with CI/CD, orchestration (e.g., Databricks Workflows/Airflow), and infrastructure-as-code.
Build observability (logging, metrics, dashboards, alerting) for data and downstream AI services.
Partner with security, platform, and product teams to strengthen data governance, access control, and cost optimization.
Collaborate with engineers to deliver LLM- and AI-backed features using OpenAI Agent Builder / Agent Bricks (or similar) and OpenAI/Azure OpenAI APIs.
Contribute to retrieval pipelines, vector store integrations, and model evaluation processes.
Participate in prompt design, safety/guardrails, and performance evaluation for applied AI solutions.