Data Engineer, AWS & AI/ML Enablement

College Board
$140,000 - $151,000 · Remote

About The Position

As a Data Engineer, AWS & AI/ML Enablement, you will design, build, and operate scalable, secure, and high-quality data platforms that power analytics, reporting, and emerging AI/ML use cases. This role is primarily a Data Engineering position, with a strong focus on cloud-native data pipelines, analytics infrastructure, and software engineering best practices. You will also partner closely with Data Science and AI teams to enable ML-ready datasets, feature pipelines, and model production workflows. You will work in an AWS-native, microservices environment, collaborating with Product Owners, Architects, Software Engineers, and Data Scientists to transform raw data into trusted, actionable insights and AI-enabled capabilities that drive real impact for students and higher education partners.

Requirements

  • 4+ years of experience in Data Engineering or Software Engineering in a production environment using AWS services such as S3, Glue, Lambda, Athena, DynamoDB, Step Functions, Redshift, and Kinesis.
  • Strong proficiency in Python and SQL, including performance tuning for large datasets.
  • 1+ years of hands-on experience designing, building, and deploying production-grade ML and generative AI solutions using AWS SageMaker and Amazon Bedrock.
  • Experience designing and operating ETL/ELT pipelines, data models, and analytics-ready datasets.
  • Solid understanding of cloud computing, DevOps, CI/CD, and microservices architectures.
  • Strong security and privacy mindset, especially when working with sensitive data.
  • Demonstrated interest in continuous learning, including keeping up with evolving data engineering and AI/ML best practices.
  • Excellent communication skills with the ability to explain technical concepts to both technical and non-technical stakeholders.
  • A passion for mission-driven work and for expanding educational and career opportunities.
  • Authorization to work in the United States for any employer.
  • Curiosity and enthusiasm for emerging technologies, including a willingness to experiment with and adopt new AI-driven solutions and comfort learning and applying new digital tools independently and proactively.
  • Clear and concise written and verbal communication skills.
  • A learner's mindset and a commitment to growth: welcoming diverse perspectives, giving and receiving timely, respectful feedback, and continuously improving through iterative learning and user input.
  • A drive for impact and excellence: solving complex problems, making data-informed decisions, prioritizing what matters most, and continuously improving through learning, user input, and external benchmarking.
  • A collaborative and empathetic approach: working across differences, fostering trust, and contributing to a culture of shared success.

Nice To Haves

  • Experience with event-driven architectures and real-time analytics.
  • Front-end or API development experience (e.g., React, Node.js).
  • Exposure to observability and monitoring for data pipelines, including freshness, volume, and performance metrics.
  • Experience collaborating with product managers and analytics partners to translate business requirements into well-designed data solutions.

Responsibilities

  • Design, build, and maintain scalable batch and streaming data pipelines using AWS services such as S3, Glue, Lambda, Kinesis, Step Functions, Redshift, Athena, and DynamoDB.
  • Develop and optimize data models and complex SQL queries to support analytics, reporting, and downstream consumers.
  • Build and operate serverless ETL frameworks for automated ingestion, transformation, and loading of structured and semi-structured data.
  • Implement cloud-first, microservices-based architectures, ensuring high availability, performance, and cost efficiency.
  • Ensure data quality, reliability, and observability through automated testing, validation, monitoring, and alerting.
  • Integrate BI and analytics tools such as QuickSight to enable real-time and self-service analytics.
  • Contribute to CI/CD pipelines, infrastructure automation, and secure development practices to deliver production-grade data systems.
  • Partner with Data Science and AI teams to productionize ML-ready datasets, including training, evaluation, and inference data pipelines.
  • Build and maintain feature pipelines and embedding workflows that support ML models and experimentation.
  • Support MLOps/LLMOps workflows, including dataset versioning, experiment tracking, and capturing inference data for continuous improvement.
  • Enable AI use cases such as recommendation systems, personalization, and retrieval-augmented generation (RAG) through robust data foundations.
  • Apply a thoughtful approach to AI feasibility, fairness, and effectiveness, especially when working with sensitive or regulated data.
  • Participate actively in Agile/Scrum ceremonies, design reviews, and peer code reviews.
  • Collaborate cross-functionally with Product, UX, Infrastructure, and Security teams.
  • Mentor junior engineers by providing guidance on data architecture, coding standards, and best practices.
  • Produce clear documentation, runbooks, and technical guides to support long-term platform sustainability.

Benefits

  • Annual bonuses and opportunities for merit-based raises and promotions
  • A mission-driven workplace where your impact matters
  • A team that invests in your development and success