Vera Institute of Justice • posted 5 days ago
$17 - $25/Yr
Intern
Hybrid • Los Angeles, NY
251-500 employees

The Data Engineering Intern position on the Research Department’s Central Data Science team at Vera is an opportunity for a college student or recent graduate to immerse themselves in a data role at a non-profit organization. Their work will support the construction and maintenance of a centralized data ingestion/processing framework and data warehouse for researchers who work with public and/or large-scale data in national and place-based initiatives.

Over the course of their internship, the Data Engineering Intern will focus on an engineering project that is relevant to their interests and experience. This might involve helping to design and manage large-scale data infrastructure systems, creating computational frameworks, designing data models, or building new tools to empower our organization’s researchers or community partners to leverage and improve Vera’s existing repository of data. Depending on their interests, other day-to-day tasks could include exploratory analyses, analytical software development, web scraping, or related work.

The intern will also participate in all day-to-day team activities, ranging from project planning and execution to code review sessions, pair programming, social activities, and more. They will work directly with a senior data engineer on the team on a range of responsibilities, including data collection, data modeling, automation, building data infrastructure, and ensuring data quality.

The Data Engineering Intern will learn:
  • To understand the process for data ingestion with a variety of sources
  • To build their skills in data modeling
  • Workflow dynamics in a fast-moving team involved in multiple projects at once

  • Data ingestion: integrate new sources of criminal justice, immigration, and economics data into our internally collected data; clean, transform, organize, and ensure the quality of production data; refactor existing web scrapers and data processes to fit the new centralized infrastructure and frameworks
  • Central data model construction: contribute code to building a central data model for unrestricted datasets
  • Code review and codebase maintenance: help maintain the existing codebase by reviewing requests; coordinate with data science staff to ensure consistency of datasets, naming conventions, code repository structure, etc.
  • Demonstrated proficiency working with data collection and processing in Python, with a preference for experience using SQL and the pandas library
  • Proficiency developing code collaboratively using GitHub
  • Commitment to advancing racial and gender equity
  • Curiosity about emerging research and advocacy in the criminal justice and/or immigration spaces
  • Wrestles with creative and concrete ways to use data to shift power and advance equity and inclusion
  • Working fluency in the technologies of the current tech stack is required: Python, SQL/relational databases, cloud technology (GCP and/or AWS), GitHub, Airflow, and Docker
  • Professional, personal or academic engagement with issues of mass incarceration and mass criminalization
  • Experience working with Google Cloud Platform and its tools, including Airflow
  • generous paid time off
  • a comprehensive health insurance plan
  • student loan repayment benefits
  • professional development training opportunities and up to $2,000 annually for education costs and fees relevant to Vera work
  • employer-funded retirement plan
  • flexible time