Intern – Data Engineering

American Medical Association, Chicago, IL

About The Position

The American Medical Association (AMA) is the nation's largest professional association of physicians and a non-profit organization. We are a unifying voice and powerful ally for America's physicians, the patients they care for, and the promise of a healthier nation. To be part of the AMA is to be part of our mission to promote the art and science of medicine and the betterment of public health.

At the AMA, our mission to improve the health of the nation starts with our people. We foster an inclusive, people-first culture where every employee is empowered to perform at their best. Together, we advance meaningful change in health care and the communities we serve. We encourage and support professional development for our employees, and we are dedicated to social responsibility. We invite you to learn more about us, and we look forward to getting to know you.

We have an opportunity at our corporate offices in Chicago for an Intern – Data Engineering on our Health Solutions team. This is a hybrid position reporting to our Chicago, IL office, requiring three days a week in the office.

As an Intern – Data Engineering, you will help enhance and maintain the AMA's data and AI pipeline infrastructure, supporting the intake, processing, and management of critical healthcare data assets. You will collaborate with cross-functional teams to build scalable data pipelines, improve data quality, and ensure data is accessible for analytical and AI initiatives across the Health Solutions Group.

Requirements

  • Be working towards a BS or MS degree in Computer Science, Software Engineering, Data Engineering, Data Science, Information Systems, or a related field.
  • Basic experience with programming languages such as Python, Java, or Scala for data processing and automation.
  • Familiarity with SQL for querying and manipulating datasets; exposure to ETL concepts and data warehouse and data mart architectures is a plus.
  • Understanding of data ingestion, transformation, standardization, metadata management, and data enhancement, ideally with formats and sources such as XML, JSON, streaming data, and REST APIs (a minimal ETL sketch follows this list).
  • Exposure to orchestration and scheduling tools such as Airflow, Jenkins, or similar workflow-automation systems.
  • Interest in cloud platforms (especially AWS) and big data technologies (especially Spark); hands-on coursework or projects are a plus.
  • Comfort working with structured and unstructured data, and a willingness to solve data quality or integration challenges.
  • Knowledge of, or interest in, master data management, entity resolution, and the development of scalable, high-quality data pipelines.
  • Strong documentation skills and attention to detail, with a passion for lifecycle process improvement.
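
The bullets above name Python, SQL, ETL concepts, JSON, and REST APIs together. As a rough sketch of the kind of extract-transform-load task they imply (the endpoint URL, the npi field, and the providers table are hypothetical stand-ins, not the AMA's actual systems or schema):

    import json
    import sqlite3
    from urllib.request import urlopen

    # Hypothetical endpoint and schema, for illustration only.
    API_URL = "https://example.com/api/providers"

    def extract(url: str) -> list[dict]:
        """Pull a JSON payload from a REST endpoint."""
        with urlopen(url) as resp:
            return json.load(resp)

    def transform(records: list[dict]) -> list[tuple]:
        """Standardize names and drop rows missing the key field."""
        rows = []
        for rec in records:
            if not rec.get("npi"):  # basic data-quality gate
                continue
            rows.append((rec["npi"], rec.get("name", "").strip().title()))
        return rows

    def load(rows: list[tuple]) -> None:
        """Write the cleaned rows into a relational store."""
        con = sqlite3.connect("providers.db")
        con.execute("CREATE TABLE IF NOT EXISTS providers"
                    " (npi TEXT PRIMARY KEY, name TEXT)")
        con.executemany("INSERT OR REPLACE INTO providers VALUES (?, ?)", rows)
        con.commit()
        con.close()

    if __name__ == "__main__":
        load(transform(extract(API_URL)))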

Responsibilities

  • Assist in the design, development, and deployment of robust data and AI pipelines, enabling seamless data ingestion, transformation, and loading from a variety of sources (relational databases, APIs, streaming data, unstructured data, etc.).
  • Support the modernization of the AMA’s data infrastructure, contributing to the migration and integration efforts for new platforms and the implementation of innovative data engineering solutions.
  • Facilitate the automation of data workflows and reporting, creating repeatable processes to reduce manual intervention and improve efficiency.
  • Work with technology teams to ensure high standards of data quality, integrity, and security across all pipeline stages; assist with developing scalable solutions for monitoring and error detection.
  • Participate in the implementation and optimization of master data management strategies, including activities related to metadata management, data cleansing, and entity resolution across multiple data sources (a toy entity-resolution sketch follows this list).
  • Collaborate with stakeholders to gather requirements, develop technical documentation, and contribute to proof-of-concept projects integrating new data sets and technologies.
  • Aid in the setup, orchestration, and scheduling of data pipelines using Databricks and our custom DAG execution components in AWS (a generic orchestration sketch follows this list).
  • Contribute to the development and testing of AI and machine learning pipelines to enable advanced search, analytics, and NLP applications.
  • May include other responsibilities as assigned.
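
Entity resolution appears in both lists above. As a toy illustration only (production pipelines typically add blocking and specialized matchers rather than scanning pairwise), here is a greedy name-matching sketch in Python; the name field and the 0.9 threshold are hypothetical choices:

    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        """Case-insensitive string similarity in [0, 1]."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def resolve(records: list[dict], threshold: float = 0.9) -> list[list[dict]]:
        """Greedily cluster records whose names exceed the threshold."""
        clusters: list[list[dict]] = []
        for rec in records:
            for cluster in clusters:
                if similarity(rec["name"], cluster[0]["name"]) >= threshold:
                    cluster.append(rec)  # same entity as this cluster
                    break
            else:
                clusters.append([rec])   # no match: start a new entity
        return clusters

    # e.g. resolve([{"name": "Jane A. Smith"}, {"name": "Jane A Smith"}])
    # groups both records into a single cluster.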
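
The orchestration work described above runs on Databricks and the AMA's custom DAG execution components in AWS, which are internal and not shown here. As a generic stand-in for the pattern (Airflow is one of the tools named in the Requirements), here is a minimal DAG sketch; the DAG id, schedule, and task bodies are placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder task bodies; real tasks would call pipeline code.
    def ingest():
        print("pull source data")

    def transform():
        print("standardize and enrich")

    def load():
        print("write to the warehouse")

    with DAG(
        dag_id="example_health_data_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                      # Airflow 2.4+ keyword
        catchup=False,
    ) as dag:
        t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)
        t_ingest >> t_transform >> t_load  # linear dependency chain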