Big Data Engineer

AdvanSix

Hopewell, VA

About The Position

AdvanSix is seeking a Big Data Engineer to build and operate our enterprise Unified Data Layer (UDL), spanning IT and OT, to deliver trustworthy, performant data products that power Finance, Operations, Supply Chain & Logistics, HSE, Commercial, and corporate analytics. You’ll engineer batch/CDC/streaming pipelines, model curated and semantic layers, and harden run-state with testing, CI/CD, security, and observability. You’ll partner closely with the data team and the larger IT organization.

Mission: Design and deliver scalable, secure data pipelines and data models that safely connect operational systems to analytics, ensure trusted and well-governed data, and enable repeatable delivery of BI, ML, AI, and automation solutions.

Requirements

  • Minimum 5 years' experience in data engineering building production pipelines at scale (batch/CDC/streaming).
  • Hands-on with Azure data stack: Databricks or Fabric/Synapse, ADF/Pipelines, ADLS/OneLake, Azure SQL/SQL MI, Key Vault.
  • Strong SQL and Python/PySpark; comfort with Spark Structured Streaming and performance tuning.
  • Experience implementing tests/observability (freshness, schema, expectations), and Git-based CI/CD.
  • Familiarity with SAP S/4HANA structures and SAP DataSphere semantic modeling.
  • OT concepts: historians (PHD/PI), OPC UA/MQTT, event/batch frames, ISA-95/99 basics.
  • Understanding of Power BI consumption (semantic models, RLS) and APIs for downstream AI/ML apps/agents.

Nice To Haves

  • Time-series/data-quality tooling (e.g., Great Expectations or equivalent patterns), feature/metric stores.
  • MDM concepts (keys, survivorship), lineage/catalog tooling.
  • TMS/WMS, LIMS, Historian, HSE domain exposure; Lean/Six Sigma mindset; FinOps awareness.

Responsibilities

  • Build ingestion pipelines (batch, CDC, streaming) from S/4HANA/DataSphere, PHD/historian, LIMS, TMS, HSE, and other sources into landing → curated → semantic layers.
  • Implement data contracts, schema/versioning, SCD handling, partitioning, and performance tuning (file formats, clustering, caching).
  • Develop dimensional/semantic models that back certified Power BI datasets and APIs for apps/agents.
  • Integrate OT data via OPC UA/MQTT, broker/DMZ patterns, read-only historian feeds, and event/batch frames; no direct reads from the control network.
  • Collaborate with plant controls on change control, signal quality, and downtime windows.
  • Embed data quality rules, unit/integration tests, and validation checks (freshness, completeness, drift/PSI).
  • Instrument lineage and end-to-end monitoring; build alerting and on-call runbooks to minimize MTTR.
  • Enforce RBAC, secrets management, PII/HSE classifications, and retention aligned to Governance/MDM policies.
  • Automate build/test/deploy with Git-based CI/CD (environments, approvals, blue/green).
  • Track and optimize cost/performance (cluster sizing, autoscaling, cache strategy); contribute to FinOps reviews.
  • Partner with Reporting & BI on semantic model contracts, RLS, and performance SLAs; avoid direct system scraping.
  • Produce README docs, data dictionaries, runbooks, and post-incident reviews; support knowledge transfer with vendors.