onX is building the next-generation data foundation that fuels our growth. As a Data Engineer, you’ll design, build, and scale the lakehouse architecture that underpins analytics, machine learning, and AI at onX. You’ll work across teams to modernize our data ecosystem, making it discoverable, reliable, governed, and ready for self-service and intelligent automation.

This role is intentionally broad in scope. We’re seeking engineers who can operate anywhere along the data lifecycle, from ingestion and transformation to metadata, orchestration, and MLOps. Depending on experience, you may focus on foundational architecture, scaling reusable services, or embedding governance, semantic alignment, and observability patterns into the platform.

As an onX Data Engineer, your day-to-day responsibilities will include:

Architecture and Design
- Design, implement, and evolve onX’s Iceberg-based lakehouse architecture to balance scalability, cost, and performance.
- Establish data layer standards (Raw, Curated, Certified) that drive consistency, traceability, and reusability across domains (see the layering sketch after the responsibilities below).
- Define and implement metadata-first and semantic-layer architectures that make data understandable, trusted, and ready for self-service analytics.
- Partner with BI and business stakeholders to ensure domain models and certified metrics are clearly defined and aligned to business language.

Data Pipeline Development
- Build and maintain scalable, reliable ingestion and transformation pipelines using GCP tools (Spark, Dataflow, Pub/Sub, BigQuery, Dataplex, Cloud Composer).
- Develop batch and streaming frameworks with schema enforcement, partitioning, and lineage capture (see the streaming sketch below).
- Use configuration-driven, reusable frameworks to scale ingestion, curation, and publishing across domains (see the config-driven sketch below).
- Apply data quality checks and contracts at every layer to ensure consistency, auditability, and trust (see the contract-check sketch below).

MLOps and Advanced Workflows
- Collaborate with Data Science to integrate feature stores, model registries, and model monitoring into the platform.
- Build and maintain standardized orchestration and observability patterns for both data and ML pipelines, ensuring SLA, latency, and cost visibility (see the orchestration sketch below).
- Develop reusable microservices that support model training, deployment, and scoring within a governed, observable MLOps framework.
- Implement self-healing patterns to minimize MTTR and ensure production reliability.

Governance, Metadata, and Self-Service Enablement
- Automate governance via metadata-driven access controls (row/column permissions, sensitivity tagging, lineage tracking), as in the access-policy sketch below.
- Define and maintain the semantic layer that bridges the technical data platform and business self-service, enabling analysts and AI systems to explore data confidently (see the certified-metric sketch below).
- Use GCP Dataplex as the unifying layer for data discovery, lineage, and access management, serving as the first step in evolving our metadata fabric toward a fully connected semantic graph.
- Extend metadata models so datasets, pipelines, and models become interconnected, explainable, and machine-readable, enabling future intelligence built on relationships, not just tables.
- Champion the use of metadata and semantics as the control plane for quality, cost, and performance, empowering teams to self-serve trusted data.

Collaboration and Enablement
- Partner with BI, Product, and Marketing to align on key business metrics, certified definitions, and self-service models.
- Work closely with infrastructure and security teams to embed privacy, cost management, and compliance into every layer of the stack.
- Mentor peers by documenting patterns, reviewing code, and promoting best practices.
- Participate in KTLO (Keep the Lights On) work to ensure stability as modernization continues.
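To give a feel for the work, here is a minimal sketch of the Raw-to-Curated layering pattern: a Raw-layer Iceberg table is standardized and republished to the Curated layer. The catalog and table names (lakehouse.raw.events and the like) are hypothetical, and it assumes a Spark session already configured with an Iceberg catalog.

```python
# Sketch: promoting a dataset from the Raw to the Curated layer of an
# Iceberg lakehouse. All catalog/table names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raw-to-curated").getOrCreate()

# Read the Raw-layer table as ingested, then apply light standardization.
raw = spark.table("lakehouse.raw.events")
curated = (
    raw.dropDuplicates(["event_id"])                     # enforce uniqueness
       .withColumn("event_date", F.to_date("event_ts"))  # derive partition key
       .filter(F.col("event_id").isNotNull())            # basic quality gate
)

# Write to the Curated layer; Iceberg manages snapshots, schema evolution,
# and partitioning on event_date.
(curated.writeTo("lakehouse.curated.events")
        .partitionedBy(F.col("event_date"))
        .createOrReplace())
```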
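The streaming sketch below shows schema enforcement on ingest. File-based streaming from Cloud Storage stands in for the source here; a real Pub/Sub feed would go through Dataflow or a Pub/Sub connector instead. Paths and table names are illustrative.

```python
# Sketch: streaming ingestion with an explicit, enforced schema and date
# partitioning. GCS paths and table names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

EVENT_SCHEMA = StructType([
    StructField("event_id", StringType(), nullable=False),
    StructField("user_id", StringType(), nullable=True),
    StructField("event_ts", TimestampType(), nullable=False),
])

spark = SparkSession.builder.appName("streaming-ingest").getOrCreate()

# Declaring the schema up front means malformed records are dropped
# rather than silently widening the table's schema.
stream = (
    spark.readStream
         .schema(EVENT_SCHEMA)
         .option("mode", "DROPMALFORMED")
         .json("gs://onx-landing/events/")               # hypothetical path
         .withColumn("event_date", F.to_date("event_ts"))
)

(stream.writeStream
       .format("iceberg")
       .option("checkpointLocation", "gs://onx-checkpoints/events/")
       .toTable("lakehouse.raw.events"))
```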
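The config-driven sketch: one generic ingestion job, parameterized per source by a small declarative config, so onboarding a new domain is a config change rather than new code. Every name in it is illustrative.

```python
# Sketch of a configuration-driven ingestion pattern. Source names, paths,
# and targets are made up for illustration.
import yaml

CONFIG = yaml.safe_load("""
sources:
  - name: orders
    path: gs://onx-landing/orders/
    target: lakehouse.raw.orders
    partition_by: order_date
  - name: subscriptions
    path: gs://onx-landing/subscriptions/
    target: lakehouse.raw.subscriptions
    partition_by: start_date
""")

def ingest(source: dict) -> None:
    """Run one ingestion task described entirely by config."""
    print(f"loading {source['path']} -> {source['target']} "
          f"(partitioned by {source['partition_by']})")
    # a real implementation would invoke the shared Spark/Dataflow loader here

for source in CONFIG["sources"]:
    ingest(source)
```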
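The contract-check sketch: lightweight data-contract assertions run before a batch is published to the Curated layer, failing the run loudly instead of letting bad data through. Column names and thresholds are illustrative.

```python
# Sketch: data-contract checks gating publication of a batch.
from pyspark.sql import DataFrame, functions as F

def enforce_contract(df: DataFrame) -> DataFrame:
    """Fail the pipeline run if the batch violates its contract."""
    total = df.count()
    nulls = df.filter(F.col("event_id").isNull()).count()
    dupes = total - df.dropDuplicates(["event_id"]).count()

    if total == 0:
        raise ValueError("contract violation: empty batch")
    if nulls / total > 0.01:                 # illustrative null-rate threshold
        raise ValueError(f"contract violation: {nulls} null event_ids")
    if dupes > 0:
        raise ValueError(f"contract violation: {dupes} duplicate event_ids")
    return df
```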
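The orchestration sketch: a Cloud Composer (Airflow 2) DAG wiring ingestion, contract checks, and publishing into one pipeline, with retries as a simple self-healing pattern and an SLA so misses surface in Airflow's monitoring. Task callables, schedules, and thresholds are illustrative defaults, not onX's actual standards.

```python
# Sketch: a Cloud Composer (Airflow) DAG with retries and an SLA.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest(): ...
def check_contract(): ...
def publish(): ...

with DAG(
    dag_id="events_raw_to_curated",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 2,                        # self-healing: retry transient failures
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(minutes=30),        # SLA misses become visible in Airflow
    },
) as dag:
    t1 = PythonOperator(task_id="ingest", python_callable=ingest)
    t2 = PythonOperator(task_id="check_contract", python_callable=check_contract)
    t3 = PythonOperator(task_id="publish", python_callable=publish)
    t1 >> t2 >> t3
```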
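The access-policy sketch: metadata-driven row-level security, where a policy definition that would realistically live in the metadata fabric is applied to BigQuery via its row access policy DDL. The table, group, and filter values are hypothetical.

```python
# Sketch: applying a metadata-driven row access policy in BigQuery.
from google.cloud import bigquery

client = bigquery.Client()

# In a real system this mapping would be looked up from Dataplex / the
# metadata fabric rather than hard-coded.
POLICY = {
    "table": "onx-analytics.curated.subscriptions",
    "grantee": "group:marketing-analysts@onx.example",
    "filter": "region = 'US'",
}

ddl = f"""
CREATE OR REPLACE ROW ACCESS POLICY region_filter
ON `{POLICY['table']}`
GRANT TO ('{POLICY['grantee']}')
FILTER USING ({POLICY['filter']})
"""
client.query(ddl).result()  # apply the policy like any other query
```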
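Finally, the certified-metric sketch: a metric definition captured as machine-readable metadata, the kind of artifact a semantic layer serves to BI tools and AI agents. The field names are illustrative, not any specific semantic-layer product's schema.

```python
# Sketch: a certified metric as a machine-readable metadata record.
from dataclasses import dataclass

@dataclass(frozen=True)
class CertifiedMetric:
    name: str
    description: str
    sql: str            # canonical definition, evaluated over the Certified layer
    owner: str
    certified: bool = True

ACTIVE_SUBSCRIBERS = CertifiedMetric(
    name="active_subscribers",
    description="Count of subscriptions active on the reporting date.",
    sql="COUNT(DISTINCT IF(status = 'active', user_id, NULL))",
    owner="bi-team@onx.example",
)
```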
Job Type: Full-time
Career Level: Mid Level