Data Infrastructure Engineer

Funga PBC
$120,000 - $150,000 · Remote

About The Position

At Funga, we leverage cutting-edge data pipelines to manage large-scale ecological, genomic, and geospatial datasets. We are looking for a Data Infrastructure Engineer to build scalable data solutions, streamline data ingestion, and maintain the high-quality databases that support our scientific and operational teams. This role will be instrumental in optimizing our data infrastructure, ensuring efficient data access, and helping the broader team make data-driven decisions.

We are building a lean, high-performance, low-overhead stack on open-source foundations. You will be the primary architect of how Funga stores, moves, and accesses its data. This is a role for a builder who prefers simple, robust, accessible systems tailored to the teams who rely on them. It is a rare opportunity to take total technical ownership of the data systems that will scale our impact globally.

Requirements

  • 4+ years of experience in Data or Backend Engineering, with a track record of building production data systems from the ground up.
  • Database expertise: Deep SQL proficiency. You understand indexing, partitioning, and optimizing Postgres for high performance.
  • Cloud Infrastructure: Significant experience building, managing, and optimizing data services within AWS or GCP (e.g. ECS, Lambda, RDS) with a focus on cost-effective, maintainable architecture.
  • Modern DevOps & Python: Proficiency in production-grade Python and modern DevOps practices, including containerization (Docker), CI/CD pipelines, and infrastructure monitoring.
  • Data fluency: Experience handling large-scale Parquet, Avro, binary, or similar data formats.
  • Product mindset: You treat internal teams as your customers. You are driven to build solutions that serve and accelerate internal teams' work.
  • Pragmatic: You prefer simple, mature, composable tools over heavy, managed platforms, and you have a bias for creating value now versus perfection later.

Nice To Haves

  • Familiarity with geospatial data types and tools (PostGIS, GDAL) is a plus.

Responsibilities

Architect Core Systems & Cloud
  • Own the Stack: Architect and maintain our central storage (PostgreSQL/PostGIS, SQLite) and cloud environment (AWS/GCP), leveraging ECS, Lambda, and S3.
  • Modern DevOps: Standardize environments using Docker, CI/CD pipelines, and Infrastructure as Code to automate the testing and deployment of data services.
  • Scale for Performance: Optimize data models and database performance for extensibility as our genomic and geospatial inputs scale.

Manage the Data Lifecycle
  • Build Lean Ingest: Design and automate scalable ELT/ETL pipelines for genomic, geospatial, and tabular data from sources like Survey123, ArcGIS, and Asana.
  • QA/QC at Scale: Build automated validation pipelines to ensure data integrity and version control from the moment data hits our system.

Enable the Mission
  • Internal Enablement: Support scientists and operational teams by designing the data models that power internal modeling workflows, dashboards, and reporting.
  • System Connectivity: Develop lightweight APIs and connectors to sync data between our core infrastructure and downstream applications (e.g. Asana, ArcGIS, and internal dashboards).

Benefits

  • Equity in our fast-growing startup
  • A flexible time off policy
  • Medical, dental, and vision benefits
  • A wellness reimbursement program