Big Data Engineer

AdvanSix

Hopewell, VA

About The Position

AdvanSix is seeking a Big Data Engineer to build and operate our enterprise Unified Data Layer (UDL), spanning IT and OT, to deliver trustworthy, performant data products that power Finance, Operations, Supply Chain & Logistics, HSE, Commercial, and corporate analytics. You’ll engineer batch/CDC/streaming pipelines, model curated and semantic layers, and harden run-state with testing, CI/CD, security, and observability. You’ll partner closely with the data team and the larger IT organization.

Mission: Design and deliver scalable, secure data pipelines and data models that safely connect operational systems to analytics, ensure trusted and well-governed data, and enable repeatable delivery of BI, ML, AI, and automation solutions.

Requirements

  • Minimum 5 years' experience in data engineering building production pipelines at scale (batch/CDC/streaming).
  • Hands-on with Azure data stack: Databricks or Fabric/Synapse, ADF/Pipelines, ADLS/OneLake, Azure SQL/SQL MI, Key Vault.
  • Strong SQL and Python/PySpark; comfort with Spark Structured Streaming and performance tuning.
  • Experience implementing tests/observability (freshness, schema, expectations), and Git-based CI/CD.
  • Familiarity with SAP S/4HANA structures and SAP DataSphere semantic modeling.
  • OT concepts: historians (PHD/PI), OPC UA/MQTT, event/batch frames, ISA-95/99 basics.
  • Understanding of Power BI consumption (semantic models, RLS) and APIs for downstream AI/ML apps/agents.

Nice To Haves

  • Time-series/data-quality tooling (e.g., Great Expectations or equivalent patterns), feature/metric stores.
  • MDM concepts (keys, survivorship), lineage/catalog tooling.
  • TMS/WMS, LIMS, Historian, HSE domain exposure; Lean/Six Sigma mindset; FinOps awareness.

Responsibilities

  • Build ingestion pipelines (batch, CDC, streaming) from S/4HANA/DataSphere, PHD/historian, LIMS, TMS, HSE, and other sources into landing → curated → semantic layers.
  • Implement data contracts, schema/versioning, SCD handling, partitioning, and performance tuning (file formats, clustering, caching).
  • Develop dimensional/semantic models that back certified Power BI datasets and APIs for apps/agents.
  • Integrate OT data via OPC UA/MQTT, broker/DMZ patterns, read-only historian feeds, and event/batch frames; no direct reads from the control network.
  • Collaborate with plant controls on change control, signal quality, and downtime windows.
  • Embed data quality rules, unit/integration tests, and validation checks (freshness, completeness, drift/PSI).
  • Instrument lineage and end-to-end monitoring; build alerting and on-call runbooks to minimize MTTR.
  • Enforce RBAC, secrets management, PII/HSE classifications, and retention aligned to Governance/MDM policies.
  • Automate build/test/deploy with Git-based CI/CD (environments, approvals, blue/green).
  • Track and optimize cost/performance (cluster sizing, autoscaling, cache strategy); contribute to FinOps reviews.
  • Partner with Reporting & BI on semantic model contracts, RLS, and performance SLAs; avoid direct system scraping.
  • Produce README docs, data dictionaries, runbooks, and post-incident reviews; support knowledge transfer with vendors.