Analytics Engineer

Charter School Growth Fund

About The Position

The Charter School Growth Fund (CSGF) is a leading nonprofit venture philanthropy fund that has spent 20+ years identifying high-quality public charter schools and investing in their growth. Today its portfolio spans 200+ networks, 1,700+ schools, and more than 840,000 students.

This role sits within a new public data infrastructure project that CSGF is incubating alongside its core operations. The project is designed to evolve our internal assessment data pipeline into free, open infrastructure that researchers, funders, policymakers, and school networks can use independently. School performance data is currently fragmented across dozens of state agencies, inconsistently formatted, and practically inaccessible to anyone without significant technical resources. The platform processes standardized assessment data across 40+ charter states, calculates school performance metrics, and publishes results through a public-facing portal. A core part of our vision is building toward an open-source codebase, and we're looking for someone who is excited about technical work that will ultimately be public-facing, not just internal tooling.

The project is run by a small, dedicated team building public data infrastructure for the education investing, research, and policy community. The team operates as a focused engineering and data function: processing state assessment files, maintaining the data pipeline, publishing metrics, and keeping the platform and its documentation current. Everyone on the team works close to the data and close to the code. The team is organized around four core functions: Data Collection and Management, Infrastructure and Engineering, Data Validation and QA, and Publishing and Documentation.

As an Analytics Engineer, you will help build and maintain the data models, validation scripts, and documentation that the platform depends on. This is a one-year contract role with the possibility of extension.

This role reports to the Vice President who owns product direction and architecture. The team is highly collaborative, and there are always opportunities to develop skills outside the core responsibilities of an individual role. The project runs on an annual state release cycle, with the bulk of new assessment data arriving in the summer months. The person in this role will need to be contributing actively to parser updates and dbt model changes within their first four to six weeks. We are looking for someone who can orient quickly in an unfamiliar codebase and move from observation to independent contribution without an extended onboarding period.

Requirements

Hands-on experience writing and maintaining dbt models in a production or near-production environment
Solid SQL skills, including the ability to debug complex transformations and identify data quality issues
Comfort working with messy, real-world data: inconsistent formats, missing values, undocumented quirks, and files that require investigation before they can be modeled
Comfort with Python for data analysis tasks: reading files, transforming data, writing validation scripts
Ability to work independently on loosely defined problems. New state parsers and unfamiliar file formats are a recurring part of this role, not an exception.
Familiarity with software development practices: version control with Git, code review, and basic CI/CD workflows
Strong written communication skills: able to document data decisions, methodology choices, and known limitations clearly for a non-technical audience
Strong attention to detail and a habit of verifying outputs rather than assuming correctness
Experience as an analytics engineer, data analyst, data engineer, or in a similar role, or relevant undergraduate education (e.g., a B.A. in Computer Science, Data Science, or a related field)
Candidates must have permanent authorization to work in the US.

Nice To Haves

Familiarity with DuckDB or other embedded/local analytics databases
Familiarity with Quarto or Markdown for publishing data outputs
Experience contributing to open-source projects or building for public data audiences
Comfort using AI coding tools (Claude Code, Cursor, Copilot, or similar) as part of a day-to-day development workflow
A general orientation toward AI-augmented development: using LLMs routinely across code review, documentation, debugging, and exploratory analysis
Background in education data, public-sector data, or policy-adjacent research contexts

Responsibilities

Build and maintain dbt models that transform raw state assessment files into clean, analysis-ready datasets
Develop new state parsers for assessment files that vary significantly in format, structure, and quality across states. This requires independent problem solving, not just following established patterns.
Contribute to school and CMO classification reference datasets, keeping them accurate as the underlying data evolves
Write dbt tests and maintain data documentation so that outputs are auditable by external researchers
Perform in-depth QA on state data, identifying and resolving issues across heterogeneous source formats
Author data dictionaries and methodology documentation with the same care as the code itself
Use Python to perform rigorous QA and data validation: cross-state consistency checks, metric range validation, outlier detection, and regression testing against prior releases
Write validation scripts that can be re-run as data updates, creating a consistent and reviewable QA record
Publish documentation, data dictionaries, and methodology updates to the Quarto-based portal as part of regular data releases. The portal structure exists; this role maintains and extends its content.
Write clearly for a mixed audience: technical enough for researchers building on the data, accessible enough for non-technical partners tracking what changed and why