Lead Data Engineer

Toyota North America, Plano, TX

Onsite

About The Position

At TFS, we're building next-generation products that redefine mobility for millions of customers worldwide. We're looking for a Sr Lead Engineer, an individual contributor at the principal level, who brings deep expertise in data engineering, streaming architectures, and analytics platforms, along with the technical leadership to make data a reliable, scalable foundation for the entire engineering organization.

This isn't a management role. It's for the engineer who thinks in pipelines and data contracts: someone who can design a Lakehouse architecture, build a real-time streaming platform, ensure data quality at scale, and make it all self-service for the teams that depend on it. You'll work at the intersection of backend engineering, ML/AI, and analytics, making sure the data that powers our products, models, and decisions is trustworthy, timely, and accessible. If you want to build the data backbone of a modern engineering org, not just move files around, this is the role.

This position is based in Plano, TX. The selected candidate will be expected to reside within commuting distance of this location.

Requirements

Bachelor's degree in Computer Science, Data Engineering, Information Systems, or related field, or equivalent practical experience
7+ years of software or data engineering experience, including 3–5 years focused specifically on data platform and pipeline engineering at scale, with a track record of operating at a principal or staff engineer level
Deep expertise in designing and building data lake and Lakehouse architectures on AWS, including:
S3 as the foundation for data lake storage, with strong opinions on partitioning, file formats (Parquet, Avro, ORC), and lifecycle management
AWS Glue for ETL/ELT jobs, crawlers, and the Data Catalog
Amazon Athena for serverless SQL analytics over the data lake
Lake Formation for fine-grained access control, governance, and cross-account data sharing
Amazon Redshift or Redshift Serverless for data warehousing and high-performance analytical queries
Amazon EMR or EMR Serverless for large-scale Spark, Hive, or Presto workloads
Production experience with real-time and streaming data architectures, including:
Amazon Kinesis (Data Streams, Data Firehose) for real-time ingestion and delivery
Amazon MSK (Managed Streaming for Apache Kafka) or self-managed Kafka for event streaming at scale
EventBridge, SQS, or SNS for event-driven integration with application services
Lambda for lightweight stream processing and event transformation
Apache Flink (via Amazon Managed Service for Apache Flink) or Spark Structured Streaming for stateful stream processing
Strong proficiency in Python and SQL — you write production-quality pipeline code, not just ad-hoc scripts, and you can optimize a complex query as fluently as you can design a DAG
Experience with workflow orchestration tools: Step Functions, Apache Airflow (via Amazon MWAA), or similar — you know how to build reliable, observable, and recoverable pipeline DAGs
Solid understanding of data modeling for both analytical and operational use cases: star schemas, slowly changing dimensions, wide tables, event sourcing, and CDC (change data capture) patterns
Experience with data quality and governance tooling and practices: Great Expectations, Deequ, or custom validation frameworks — plus data cataloging, lineage tracking, and access control
Strong understanding of Infrastructure as Code using AWS CDK, CloudFormation, or Terraform for data infrastructure
Experience with observability and monitoring for data systems: pipeline health dashboards, data freshness tracking, SLA monitoring, and alerting on failures or anomalies (CloudWatch, Datadog, or similar)
Strong understanding of security best practices for data: IAM policies, Lake Formation permissions, encryption at rest and in transit, data masking, and PII handling
Deep experience debugging complex issues across data systems, such as pipeline failures, data skew, schema mismatches, streaming lag, and runaway storage costs
Experience with testing strategies for data pipelines: data validation, schema contract testing, integration testing, and pipeline idempotency
Strong written and verbal communication — you can write a clear RFC, lead a design review, and explain a data architecture tradeoff to a non-technical stakeholder

Nice To Haves

Master's degree in Computer Science, Data Engineering, or related field
Experience in the financial services, banking, or insurance industry
Experience with open table formats: Apache Iceberg, Delta Lake, or Apache Hudi for ACID transactions, time travel, and schema evolution on the data lake
Experience with feature store design and implementation for ML/AI use cases (SageMaker Feature Store, Feast, or custom)
Familiarity with dbt or similar transformation frameworks for analytics engineering and data modeling
Experience with real-time analytics serving layers: Amazon OpenSearch, DynamoDB, or ElastiCache for low-latency data access
Experience designing multi-account AWS data architectures with proper governance and guardrails (AWS Organizations, Control Tower, cross-account data sharing via Lake Formation)
Hands-on experience with data mesh or data product patterns — decentralized ownership with centralized governance
Experience with CDC (change data capture) tools: AWS DMS, Debezium, or similar for streaming database changes into the data lake
Experience with cost optimization for data workloads: storage tiering, compute right-sizing, spot instances for Spark, and query optimization
Experience with GenAI data pipelines: preparing training datasets, building RAG knowledge bases, embedding generation, and vector store population
AWS certifications (Data Analytics Specialty, Solutions Architect, Database Specialty)
Experience with CI/CD pipelines for data infrastructure and pipeline deployment (CodePipeline, GitHub Actions, or similar)
Experience contributing to or maintaining open-source data engineering projects
Experience defining engineering standards, writing ADRs, or leading org-wide technical initiatives

Responsibilities

Serve as the technical authority for data architecture across the organization, making high-impact decisions on data lake design, streaming topologies, storage formats, partitioning strategies, and data modeling patterns
Design, build, and maintain production-grade data pipelines — batch and real-time — from ingestion and transformation to serving and consumption
Own the data platform: build and evolve the foundational infrastructure that engineering, ML/AI, and analytics teams depend on for reliable, governed, and performant data access
Partner closely with ML/AI engineers to ensure training data, feature pipelines, and model serving data are accurate, fresh, and efficiently delivered — you are the upstream enabler for every model in production
Collaborate with backend and full-stack engineers to design event-driven architectures, define data contracts, and ensure application data flows cleanly into the data platform
Lead technical design reviews, architecture discussions, and RFC processes for data initiatives — driving alignment across engineering teams
Identify and resolve systemic data issues: pipeline failures, data quality degradation, schema drift, streaming latency, cost inefficiencies in storage and compute, and gaps in data observability
Define and champion data engineering best practices: data modeling, schema evolution, data contracts, testing strategies, lineage tracking, cataloging, and governance
Design and implement data quality frameworks — validation rules, anomaly detection, freshness checks, and alerting — so downstream consumers can trust the data without asking
Collaborate closely with Engineering Managers, Product, Data Science, and Analytics to shape data roadmaps and ensure the platform evolves with business needs
Mentor and grow engineers at all levels through code reviews, pairing, design feedback, and technical guidance on data engineering topics
Contribute to hiring by conducting technical interviews and helping define what great looks like for data engineering at TFS
Proactively communicate technical risks, tradeoffs, and recommendations to both engineering and non-technical stakeholders

Benefits

A work environment built on teamwork, flexibility, and respect
Professional growth and development programs to help advance your career, as well as tuition reimbursement
Team Member Vehicle Purchase Discount
Toyota Team Member Lease Vehicle Program (if applicable)
Comprehensive health care and wellness plans for your entire family
Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota, regardless of whether you contribute
Paid holidays and paid time off
Referral services related to prenatal services, adoption, childcare, schools, and more
Tax-Advantaged Accounts (Health Savings Account, Health Care FSA, Dependent Care FSA)
Relocation Assistance (if applicable)