As an AI/ML Data Engineer, you’ll design, build, and operate the data and ML plumbing that powers personalized student experiences at scale. You’ll create batch and streaming pipelines, ML‑ready datasets, feature/embedding stores, and the services that move models into production safely and compliantly. You’ll collaborate with Product, Data Science, and Analytics to turn raw events into reliable, privacy‑preserving features that drive real impact for students and higher‑ed partners.

In this role, you will:

ML Data Platform & Pipelines (40%)
Design, build, and own batch and streaming ETL (e.g., Kinesis/Kafka → Spark/Glue → Step Functions/Airflow) for training, evaluation, and inference use cases.
Stand up and maintain offline/online feature stores and embedding pipelines (e.g., S3/Parquet/Iceberg + vector index) with reproducible backfills.
Implement data contracts & validation (e.g., Great Expectations/Deequ), schema evolution, and metadata/lineage capture (e.g., OpenLineage/DataHub/Amundsen).
Optimize lakehouse/warehouse layouts and partitioning (e.g., Redshift/Athena/Iceberg) for scalable ML and analytics.

Model Enablement & LLM DataOps (30%)
Productionize training and evaluation datasets with versioning (e.g., DVC/LakeFS) and experiment tracking (e.g., MLflow).
Build RAG foundations: document ingestion, chunking, embeddings, retrieval indexing, and quality evaluation (precision@k, faithfulness, latency, and cost).
Collaborate with DS to ship models to serving (e.g., SageMaker/EKS/ECS), automate feature backfills, and capture inference data for continuous improvement.

Reliability, Security & Compliance (15%)
Define SLOs and instrument observability across data and model services (freshness, drift/skew, lineage, cost, and performance).
Embed security & privacy by design (PII minimization/redaction, secrets management, access controls), aligning with College Board standards and FERPA.
Build CI/CD for data and models with automated testing, quality gates, and safe rollouts (shadow/canary).

Documentation & Enablement (15%)
Maintain docs‑as‑code for pipelines, contracts, and runbooks; create internal guides and tech talks.
Mentor peers through design reviews, pair/mob sessions, and post‑incident learning.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
1,001-5,000 employees