Lead AI Engineer

ForeFlight • Austin, TX

About The Position

Jeppesen ForeFlight is hiring a Lead AI Engineer to join our RADAR (Reporting, Analytics, Data, AI & Research) team. This is the team’s first strategic ML engineering hire, and you’ll play a foundational role in shaping how we apply reproducible statistical programming, analytics automation, and GenAI to solve real business problems across Finance, Customer Success, Revenue Operations, Accounting, and Product. You will design and build end-to-end machine learning pipelines, extract information from structured and unstructured sources (PDFs, disparate systems, scanned documents), and serve as a technical mentor who elevates the analytical capabilities of the broader team. This role blends deep applied statistics with modern analytics and ML engineering practices. We’re looking for someone who can move fluidly among exploratory analysis, SQL deep dives, and production-grade modeling, and who can teach others to do the same.

Requirements

5-10 years of applied experience in data science, machine learning, and/or quantitative analytics.
Strong proficiency in R and Python for statistical modeling, ML, and API pipeline development.
Hands-on experience building supervised learning models (regression, classification) using frameworks such as scikit-learn, tidymodels, XGBoost, Stan, or similar.
Demonstrated understanding of the full modeling lifecycle: data pre-processing, feature engineering, hyperparameter tuning, model evaluation, calibration, and deployment.
Experience with SQL and working against large-scale data warehouses or analytical databases.
Familiarity with NLP, text extraction, or document processing techniques (OCR, NER, or similar).
Excellent written and verbal communication skills, with the ability to present complex analytical work to non-technical stakeholders.
Bachelor’s degree in Mathematics, Statistics, Computer Science, Economics, or a related quantitative field. Master’s degree preferred.

Nice To Haves

Experience designing deterministic, reproducible ML workflows using tidymodels (R) and/or scikit-learn (Python) pipelines, including space-filling experimental designs for hyperparameter optimization.
Experience with Apache Arrow, DuckDB, or Polars for high-performance in-memory data processing and ETL.
Experience with Databricks (MLflow, notebooks, Unity Catalog, Spark) or similar cloud-based ML platforms.
Proficiency with tidyverse, DBI, odbc, dbplyr, Shiny, and ellmer.
Experience using AI-assisted development tools such as Claude Code, Codex, Cline, etc., for accelerating analytical and engineering workflows.
Experience with git-based CI/CD pipelines (GitLab CI/CD or GitHub Actions) for automated testing, model validation, Quarto renderings, deployment workflows, etc.
Comfortable working in VS Code, RStudio, or Positron (JetBrains DataGrip a plus).
Track record of developing training materials, leading workshops, or mentoring junior analysts and data professionals.
Graduate-level coursework or professional experience in Bayesian methods, mixed effects models, survival analysis, experimental design, time series analysis, and/or pre-training transformers for event sequence problems.
Experience working across multiple business domains (finance, healthcare, operations, etc.) and adapting analytical approaches to varied problem types.

Responsibilities

Design, build, and maintain reproducible end-to-end machine learning pipelines for multivariate regression and classification tasks using frameworks such as R’s tidymodels and/or Python’s scikit-learn.
Apply gradient boosting methods (XGBoost, LightGBM) and ensemble approaches (random forests), among other ML and deep learning algorithms, to high-impact business problems.
Implement rigorous data pre-processing, feature engineering, hyperparameter optimization (including space-filling designs), and post-processing techniques such as probability calibration.
Build and deploy GenAI-enabled information extraction workflows including OCR, named entity recognition (NER), NLP, and custom prompting schemas to pull structured data from PDFs, scanned contracts, and other unstructured documents across systems.
Deploy trained model objects and workflows into production environments using Databricks, APIs, SQL (for in-database inference using frameworks like Orbital), or containerized services.
Develop and deliver upskilling content, tutorials, and hands-on workshops for RADAR team members and data scientists across Jeppesen ForeFlight, covering git and version control, scripting in R and Python, boosting productivity with GenAI tools, and core data science concepts.
Partner with cross-functional stakeholders to translate ambiguous business questions into well-scoped analytical projects, and communicate findings to both technical and non-technical audiences.
Contribute to the team’s standards for reproducible, version-controlled analytical work.