Data Engineer

Pavago

Posted 16 days ago • Remote

About The Position

Our client is seeking a Data Engineer to design, build, and maintain scalable data infrastructure and reliable data pipelines that power analytics, reporting, and operational decision-making across the business. This role requires strong software engineering fundamentals, deep experience with modern data stacks, and a passion for building clean, reliable, and high-performance data systems. The Data Engineer will ensure data flows seamlessly from source systems into warehouses, dashboards, and downstream applications while maintaining high standards for quality, governance, and scalability. The ideal candidate is analytical, detail-oriented, and comfortable working across engineering, analytics, and business teams to deliver trustworthy and actionable data.

Requirements

3+ years of experience in Data Engineering, Back-End Engineering, or Data Infrastructure roles
Strong proficiency in Python and SQL
Experience with at least one modern data warehouse (Snowflake, Redshift, BigQuery)
Hands-on experience with orchestration tools such as Airflow or Prefect
Strong understanding of ETL/ELT pipelines, data modeling, and data transformation workflows
Familiarity with cloud platforms such as AWS, GCP, or Azure

Nice To Haves

Experience with dbt for data modeling and transformation management
Streaming and event-driven data pipeline experience (Kafka, Kinesis, Pub/Sub)
Experience with cloud-native data services such as AWS Glue, GCP Dataflow, or Azure Data Factory
Familiarity with Docker, Kubernetes, Terraform, or CI/CD workflows
Background in regulated industries such as healthcare, fintech, or enterprise SaaS
Experience optimizing warehouse costs and query performance at scale

Responsibilities

Build, maintain, and optimize ETL/ELT pipelines using Python, SQL, or Scala
Orchestrate workflows using Airflow, Prefect, Dagster, or similar tools
Ingest structured and unstructured data from APIs, SaaS platforms, databases, files, and streaming systems
Develop scalable connectors and automated ingestion workflows
Manage and optimize cloud data warehouses such as Snowflake, BigQuery, or Redshift
Design scalable schemas using star-schema and snowflake-schema modeling techniques
Implement partitioning, clustering, indexing, and performance optimization strategies
Build clean, analytics-ready datasets for business intelligence and reporting use cases
Implement validation checks, anomaly detection, logging, and monitoring to ensure data integrity
Enforce naming conventions, lineage tracking, and documentation standards using tools such as dbt or Great Expectations
Maintain audit-ready data processes and ensure compliance with GDPR, HIPAA, or industry-specific requirements
Monitor pipeline health and proactively resolve failures or inconsistencies
Build and manage real-time data pipelines using Kafka, Kinesis, Pub/Sub, or similar platforms
Support low-latency ingestion and event-driven architectures for time-sensitive applications
Monitor streaming infrastructure and optimize throughput and reliability
Partner closely with analysts, data scientists, and business stakeholders to deliver reliable datasets
Support dashboard and reporting initiatives across Tableau, Looker, or Power BI
Translate business requirements into scalable data solutions and models
Maintain clear technical documentation for pipelines, schemas, and workflows
Containerize data services using Docker and manage deployments through Kubernetes when applicable
Automate deployments using CI/CD pipelines such as GitHub Actions, Jenkins, or GitLab CI
Manage cloud infrastructure using Terraform, CloudFormation, or similar Infrastructure-as-Code tools
Continuously optimize performance, scalability, reliability, and cloud costs