onX is building the next-generation data foundation that fuels our growth. As a Data Engineer, you’ll design, build, and scale the lakehouse architecture that underpins analytics, machine learning, and AI at onX. You’ll work across teams to modernize our data ecosystem, making it discoverable, reliable, governed, and ready for self-service and intelligent automation.

This role is intentionally broad in scope. We’re seeking engineers who can operate anywhere along the data lifecycle, from ingestion and transformation to metadata, orchestration, and MLOps. Depending on experience, you may focus on foundational architecture, scaling reusable services, or embedding governance, semantic alignment, and observability patterns into the platform.

As an onX Data Engineer, your day-to-day responsibilities will include:

Architecture and Design
- Design, implement, and evolve onX’s Iceberg-based lakehouse architecture to balance scalability, cost, and performance.
- Establish data layer standards (Raw, Curated, Certified) that drive consistency, traceability, and reusability across domains (see the layering sketch after the responsibilities below).
- Define and implement metadata-first and semantic-layer architectures that make data understandable, trusted, and ready for self-service analytics.
- Partner with BI and business stakeholders to ensure domain models and certified metrics are clearly defined and aligned to business language.

Data Pipeline Development
- Build and maintain scalable, reliable ingestion and transformation pipelines using GCP tools (Spark, Dataflow, Pub/Sub, BigQuery, Dataplex, Cloud Composer).
- Develop batch and streaming frameworks with schema enforcement, partitioning, and lineage capture (see the streaming sketch below).
- Use configuration-driven, reusable frameworks to scale ingestion, curation, and publishing across domains (see the config-driven sketch below).
- Apply data quality checks and contracts at every layer to ensure consistency, auditability, and trust (see the contract-check sketch below).

MLOps and Advanced Workflows
- Collaborate with Data Science to integrate feature stores, model registries, and model monitoring into the platform.
- Build and maintain standardized orchestration and observability patterns for both data and ML pipelines, ensuring SLA, latency, and cost visibility (see the orchestration sketch below).
- Develop reusable microservices that support model training, deployment, and scoring within a governed, observable MLOps framework.
- Implement self-healing patterns to minimize MTTR and ensure production reliability.

Governance, Metadata, and Self-Service Enablement
- Automate governance via metadata-driven access controls (row/column permissions, sensitivity tagging, lineage tracking), as in the access-policy sketch below.
- Define and maintain the semantic layer that bridges the technical data platform and business self-service, enabling analysts and AI systems to explore data confidently (see the certified-metric sketch below).
- Use GCP Dataplex as the unifying layer for data discovery, lineage, and access management, serving as the first step in evolving our metadata fabric toward a fully connected semantic graph.
- Extend metadata models so datasets, pipelines, and models become interconnected, explainable, and machine-readable, enabling future intelligence built on relationships, not just tables.
- Champion the use of metadata and semantics as the control plane for quality, cost, and performance, empowering teams to self-serve trusted data.

Collaboration and Enablement
- Partner with BI, Product, and Marketing to align on key business metrics, certified definitions, and self-service models.
- Work closely with infrastructure and security teams to embed privacy, cost management, and compliance into every layer of the stack.
- Mentor peers by documenting patterns, reviewing code, and promoting best practices.
- Participate in KTLO (Keep the Lights On) work to ensure stability as modernization continues.
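To give a feel for the work, here is a minimal sketch of the Raw-to-Curated layering pattern: a Raw-layer Iceberg table is standardized and republished to the Curated layer. The catalog and table names (lakehouse.raw.events and the like) are hypothetical, and it assumes a Spark session already configured with an Iceberg catalog.

```python
# Sketch: promoting a dataset from the Raw to the Curated layer of an
# Iceberg lakehouse. All catalog/table names are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raw-to-curated").getOrCreate()

# Read the Raw-layer table as ingested, then apply light standardization.
raw = spark.table("lakehouse.raw.events")
curated = (
    raw.dropDuplicates(["event_id"])                     # enforce uniqueness
       .withColumn("event_date", F.to_date("event_ts"))  # derive partition key
       .filter(F.col("event_id").isNotNull())            # basic quality gate
)

# Write to the Curated layer; Iceberg manages snapshots, schema evolution,
# and partitioning on event_date.
(curated.writeTo("lakehouse.curated.events")
        .partitionedBy(F.col("event_date"))
        .createOrReplace())
```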
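The streaming sketch below shows schema enforcement on ingest. File-based streaming from Cloud Storage stands in for the source here; a real Pub/Sub feed would go through Dataflow or a Pub/Sub connector instead. Paths and table names are illustrative.

```python
# Sketch: streaming ingestion with an explicit, enforced schema and date
# partitioning. GCS paths and table names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

EVENT_SCHEMA = StructType([
    StructField("event_id", StringType(), nullable=False),
    StructField("user_id", StringType(), nullable=True),
    StructField("event_ts", TimestampType(), nullable=False),
])

spark = SparkSession.builder.appName("streaming-ingest").getOrCreate()

# Declaring the schema up front means malformed records are dropped
# rather than silently widening the table's schema.
stream = (
    spark.readStream
         .schema(EVENT_SCHEMA)
         .option("mode", "DROPMALFORMED")
         .json("gs://onx-landing/events/")               # hypothetical path
         .withColumn("event_date", F.to_date("event_ts"))
)

(stream.writeStream
       .format("iceberg")
       .option("checkpointLocation", "gs://onx-checkpoints/events/")
       .toTable("lakehouse.raw.events"))
```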
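The config-driven sketch: one generic ingestion job, parameterized per source by a small declarative config, so onboarding a new domain is a config change rather than new code. Every name in it is illustrative.

```python
# Sketch of a configuration-driven ingestion pattern. Source names, paths,
# and targets are made up for illustration.
import yaml

CONFIG = yaml.safe_load("""
sources:
  - name: orders
    path: gs://onx-landing/orders/
    target: lakehouse.raw.orders
    partition_by: order_date
  - name: subscriptions
    path: gs://onx-landing/subscriptions/
    target: lakehouse.raw.subscriptions
    partition_by: start_date
""")

def ingest(source: dict) -> None:
    """Run one ingestion task described entirely by config."""
    print(f"loading {source['path']} -> {source['target']} "
          f"(partitioned by {source['partition_by']})")
    # a real implementation would invoke the shared Spark/Dataflow loader here

for source in CONFIG["sources"]:
    ingest(source)
```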
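The contract-check sketch: lightweight data-contract assertions run before a batch is published to the Curated layer, failing the run loudly instead of letting bad data through. Column names and thresholds are illustrative.

```python
# Sketch: data-contract checks gating publication of a batch.
from pyspark.sql import DataFrame, functions as F

def enforce_contract(df: DataFrame) -> DataFrame:
    """Fail the pipeline run if the batch violates its contract."""
    total = df.count()
    nulls = df.filter(F.col("event_id").isNull()).count()
    dupes = total - df.dropDuplicates(["event_id"]).count()

    if total == 0:
        raise ValueError("contract violation: empty batch")
    if nulls / total > 0.01:                 # illustrative null-rate threshold
        raise ValueError(f"contract violation: {nulls} null event_ids")
    if dupes > 0:
        raise ValueError(f"contract violation: {dupes} duplicate event_ids")
    return df
```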
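The orchestration sketch: a Cloud Composer (Airflow 2) DAG wiring ingestion, contract checks, and publishing into one pipeline, with retries as a simple self-healing pattern and an SLA so misses surface in Airflow's monitoring. Task callables, schedules, and thresholds are illustrative defaults, not onX's actual standards.

```python
# Sketch: a Cloud Composer (Airflow) DAG with retries and an SLA.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest(): ...
def check_contract(): ...
def publish(): ...

with DAG(
    dag_id="events_raw_to_curated",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 2,                        # self-healing: retry transient failures
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(minutes=30),        # SLA misses become visible in Airflow
    },
) as dag:
    t1 = PythonOperator(task_id="ingest", python_callable=ingest)
    t2 = PythonOperator(task_id="check_contract", python_callable=check_contract)
    t3 = PythonOperator(task_id="publish", python_callable=publish)
    t1 >> t2 >> t3
```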
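The access-policy sketch: metadata-driven row-level security, where a policy definition that would realistically live in the metadata fabric is applied to BigQuery via its row access policy DDL. The table, group, and filter values are hypothetical.

```python
# Sketch: applying a metadata-driven row access policy in BigQuery.
from google.cloud import bigquery

client = bigquery.Client()

# In a real system this mapping would be looked up from Dataplex / the
# metadata fabric rather than hard-coded.
POLICY = {
    "table": "onx-analytics.curated.subscriptions",
    "grantee": "group:marketing-analysts@onx.example",
    "filter": "region = 'US'",
}

ddl = f"""
CREATE OR REPLACE ROW ACCESS POLICY region_filter
ON `{POLICY['table']}`
GRANT TO ('{POLICY['grantee']}')
FILTER USING ({POLICY['filter']})
"""
client.query(ddl).result()  # apply the policy like any other query
```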
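Finally, the certified-metric sketch: a metric definition captured as machine-readable metadata, the kind of artifact a semantic layer serves to BI tools and AI agents. The field names are illustrative, not any specific semantic-layer product's schema.

```python
# Sketch: a certified metric as a machine-readable metadata record.
from dataclasses import dataclass

@dataclass(frozen=True)
class CertifiedMetric:
    name: str
    description: str
    sql: str            # canonical definition, evaluated over the Certified layer
    owner: str
    certified: bool = True

ACTIVE_SUBSCRIBERS = CertifiedMetric(
    name="active_subscribers",
    description="Count of subscriptions active on the reporting date.",
    sql="COUNT(DISTINCT IF(status = 'active', user_id, NULL))",
    owner="bi-team@onx.example",
)
```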
Job Type: Full-time
Career Level: Mid Level