Lead Data Engineer

Toyota North America
Plano, TX
Onsite

About The Position

At TFS, we're building next-generation products that redefine mobility for millions of customers worldwide. We're looking for a Sr. Lead Engineer, an individual contributor at the principal level, who brings deep expertise in data engineering, streaming architectures, and analytics platforms, combined with the technical leadership to make data a reliable, scalable foundation for the entire engineering organization.

This isn't a management role. It's for the engineer who thinks in pipelines and data contracts: the one who can design a Lakehouse architecture, build a real-time streaming platform, ensure data quality at scale, and make it all self-service for the teams that depend on it. You'll work at the intersection of backend engineering, ML/AI, and analytics, making sure the data that powers our products, models, and decisions is trustworthy, timely, and accessible. If you want to build the data backbone of a modern engineering org, not just move files around, this is the role.

This position is based in Plano, TX. The selected candidate will be expected to reside within a commutable distance of this location.

Requirements

  • Bachelor's degree in Computer Science, Data Engineering, Information Systems, or related field, or equivalent practical experience
  • 7+ years of software or data engineering experience, including 3–5 years focused specifically on data platform and pipeline engineering at scale, with a track record of operating at a principal or staff engineer level
  • Deep expertise in designing and building data lake and Lakehouse architectures on AWS, including:
      • S3 as the foundation for data lake storage, with strong opinions on partitioning, file formats (Parquet, Avro, ORC), and lifecycle management
      • AWS Glue for ETL/ELT jobs, crawlers, and the Data Catalog
      • Amazon Athena for serverless SQL analytics over the data lake
      • Lake Formation for fine-grained access control, governance, and cross-account data sharing
      • Amazon Redshift or Redshift Serverless for data warehousing and high-performance analytical queries
      • Amazon EMR or EMR Serverless for large-scale Spark, Hive, or Presto workloads
  • Production experience with real-time and streaming data architectures, including:
      • Amazon Kinesis (Data Streams, Data Firehose) for real-time ingestion and delivery
      • Amazon MSK (Managed Kafka) or self-managed Kafka for event streaming at scale
      • EventBridge, SQS, or SNS for event-driven integration with application services
      • Lambda for lightweight stream processing and event transformation
      • Apache Flink (via Amazon Managed Service for Apache Flink) or Spark Structured Streaming for stateful stream processing
  • Strong proficiency in Python and SQL — you write production-quality pipeline code, not just ad-hoc scripts, and you can optimize a complex query as fluently as you can design a DAG
  • Experience with workflow orchestration tools: Step Functions, Apache Airflow (via Amazon MWAA), or similar — you know how to build reliable, observable, and recoverable pipeline DAGs
  • Solid understanding of data modeling for both analytical and operational use cases: star schemas, slowly changing dimensions, wide tables, event sourcing, and CDC (change data capture) patterns
  • Experience with data quality and governance tooling and practices: Great Expectations, Deequ, or custom validation frameworks — plus data cataloging, lineage tracking, and access control
  • Strong understanding of Infrastructure as Code using AWS CDK, CloudFormation, or Terraform for data infrastructure
  • Experience with observability and monitoring for data systems: pipeline health dashboards, data freshness tracking, SLA monitoring, and alerting on failures or anomalies (CloudWatch, Datadog, or similar)
  • Strong understanding of security best practices for data: IAM policies, Lake Formation permissions, encryption at rest and in transit, data masking, and PII handling
  • Deep experience debugging complex issues across data systems — pipeline failures, data skew, schema mismatches, streaming lag, and storage cost runaway
  • Experience with testing strategies for data pipelines: data validation, schema contract testing, integration testing, and pipeline idempotency
  • Strong written and verbal communication — you can write a clear RFC, lead a design review, and explain a data architecture tradeoff to a non-technical stakeholder

Nice To Haves

  • Master's degree in Computer Science, Data Engineering, or related field
  • Experience in the financial services, banking, or insurance industry
  • Experience with open table formats: Apache Iceberg, Delta Lake, or Apache Hudi for ACID transactions, time travel, and schema evolution on the data lake
  • Experience with feature store design and implementation for ML/AI use cases (SageMaker Feature Store, Feast, or custom)
  • Familiarity with dbt or similar transformation frameworks for analytics engineering and data modeling
  • Experience with real-time analytics serving layers: Amazon OpenSearch, DynamoDB, or ElastiCache for low-latency data access
  • Experience designing multi-account AWS data architectures with proper governance and guardrails (AWS Organizations, Control Tower, cross-account data sharing via Lake Formation)
  • Hands-on experience with data mesh or data product patterns — decentralized ownership with centralized governance
  • Experience with CDC (change data capture) tools: AWS DMS, Debezium, or similar for streaming database changes into the data lake
  • Experience with cost optimization for data workloads: storage tiering, compute right-sizing, spot instances for Spark, and query optimization
  • Experience with GenAI data pipelines: preparing training datasets, building RAG knowledge bases, embedding generation, and vector store population
  • AWS certifications (Data Analytics Specialty, Solutions Architect, Database Specialty)
  • Experience with CI/CD pipelines for data infrastructure and pipeline deployment (CodePipeline, GitHub Actions, or similar)
  • Experience contributing to or maintaining open-source data engineering projects
  • Experience defining engineering standards, writing ADRs, or leading org-wide technical initiatives

Responsibilities

  • Serve as the technical authority for data architecture across the organization, making high-impact decisions on data lake design, streaming topologies, storage formats, partitioning strategies, and data modeling patterns
  • Design, build, and maintain production-grade data pipelines — batch and real-time — from ingestion and transformation to serving and consumption
  • Own the data platform: build and evolve the foundational infrastructure that engineering, ML/AI, and analytics teams depend on for reliable, governed, and performant data access
  • Partner closely with ML/AI engineers to ensure training data, feature pipelines, and model serving data are accurate, fresh, and efficiently delivered — you are the upstream enabler for every model in production
  • Collaborate with backend and full-stack engineers to design event-driven architectures, define data contracts, and ensure application data flows cleanly into the data platform
  • Lead technical design reviews, architecture discussions, and RFC processes for data initiatives — driving alignment across engineering teams
  • Identify and resolve systemic data issues: pipeline failures, data quality degradation, schema drift, latency in streaming systems, cost inefficiencies in storage and compute, and gaps in data observability
  • Define and champion data engineering best practices: data modeling, schema evolution, data contracts, testing strategies, lineage tracking, cataloging, and governance
  • Design and implement data quality frameworks — validation rules, anomaly detection, freshness checks, and alerting — so downstream consumers can trust the data without asking
  • Collaborate closely with Engineering Managers, Product, Data Science, and Analytics to shape data roadmaps and ensure the platform evolves with business needs
  • Mentor and grow engineers at all levels through code reviews, pairing, design feedback, and technical guidance on data engineering topics
  • Contribute to hiring by conducting technical interviews and helping define what great looks like for data engineering at TFS
  • Proactively communicate technical risks, tradeoffs, and recommendations to both engineering and non-technical stakeholders

Benefits

  • A work environment built on teamwork, flexibility, and respect
  • Professional growth and development programs to help advance your career, as well as tuition reimbursement
  • Team Member Vehicle Purchase Discount
  • Toyota Team Member Lease Vehicle Program (if applicable)
  • Comprehensive health care and wellness plans for your entire family
  • Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota, regardless of whether you contribute
  • Paid holidays and paid time off
  • Referral services related to prenatal services, adoption, childcare, schools, and more
  • Tax-Advantaged Accounts (Health Savings Account, Health Care FSA, Dependent Care FSA)
  • Relocation Assistance (if applicable)