Sr. Data Engineer

Mariana Minerals • Houston, TX

3d • $140,000 - $180,000

About The Position

Mariana Minerals is building the critical minerals supply chain from the ground up and is looking for a Senior Data Engineer to help make it autonomous. Mariana is a mining company that builds software: it designs, builds, commissions, and operates its own mines and refineries. It develops proprietary chemical processes and runs them at lab, pilot, and commercial scale, and its first commercial-scale lithium production facility is targeting initial production in Q1 2027.

As a Senior Data Engineer, you will own a data domain end-to-end, designing the pipelines, schemas, and contracts that make plant data trustworthy and queryable. The systems you build will be the foundation for every model and operational decision.

The tech stack includes PlantOS, an internal platform that applies reinforcement learning toolkits to the autonomous control of mineral refining circuits. To support fully autonomous refining operations, the data backbone must handle noisy, non-stationary environments: drifting sensors, malformed lab results, shifting compositions, and aging equipment.

Requirements

4+ years in data engineering or a closely related role.
Strong Python and SQL, with deep experience designing database and warehouse schemas, including time-series and/or analytical data.
Proven experience building reliable, orchestrated data pipelines and operating them in the cloud with containers and CI/CD.
Experience with data quality, observability, and lineage, and comfort with messy real-world sources—drifting sensors, malformed exports, and the quirks of industrial systems.
A self-starter comfortable in high-ambiguity environments, working directly with process engineers, ML engineers, and operations teams.

Nice To Haves

Experience feeding data to ML systems—training datasets, feature pipelines, model monitoring—or working with industrial, sensor, or historian data.

Responsibilities

Own a data domain end-to-end (for example, all plant sensor and historian data, or all lab and analytical results), including schema design, orchestration, reliability, and the contract it exposes to everyone downstream.
Design and evolve our fleet of pipelines that pull from messy industrial sources—sensors, lab systems, historians, imagery, and more—into our databases and warehouse.
Model time-series and analytical plant data for both human analysis and machine learning training, validation, and monitoring; own data quality, observability, and lineage in your domain.
Build the data architecture that feeds production ML—the training and monitoring layer—in partnership with the ML engineers who own the model-specific semantics.
Mentor earlier-career engineers and define the data contracts other teams build against.
Work the boundary with machine learning deliberately: you own the platform and the interface it exposes; ML engineers own the features and models built on top of it. The training and monitoring layer is shared ground you design together.