Data & Software Engineer

Avalore, LLC • Chantilly, VA

About the Position

The Data & Software Engineer works with a small team to build complex data flows for a custom application. The successful candidate will have advanced Python programming skills, familiarity with Java, an understanding of data security, privacy, governance, and compliance principles, and a demonstrated history of building production data pipelines and ETL workflows at scale. Candidates must have experience in the following areas: building end-to-end data pipelines in Python; using orchestration tools to deploy data pipelines, including configuring and updating Spark jobs; containerizing and deploying applications in cloud environments such as AWS; working with MySQL and PostgreSQL, including performance tuning, schema design, and query optimization for complex analytical workloads; using industry-standard tools for code control (e.g., Git and Infrastructure as Code (IaC) tooling); working with data catalogs, tracking data lineage, and handling a variety of data formats, including geospatial data; using Bash scripting for automation and data processing tasks; and integrating AI/ML services and models.

Requirements

Advanced Python programming skills
Familiarity with Java
Understanding of data security, privacy, governance, and compliance principles
Demonstrated history of building production data pipelines and ETL workflows at scale
Experience building end-to-end data pipelines leveraging Python
Experience using orchestration tools to deploy data pipelines, including configuring and updating Spark jobs
Experience containerizing and deploying applications in cloud environments such as AWS
Experience working with MySQL and PostgreSQL, including performance tuning, schema design, and query optimization for complex analytical workloads
Experience using industry-standard tools for code control (e.g., Git and Infrastructure as Code (IaC) tooling)
Experience working with data catalogs, tracking data lineage, and handling a variety of data formats, including geospatial data
Experience using Bash scripting for automation and data processing tasks
Experience integrating AI/ML services and models
Minimum of 5 years' experience with Apache Spark & PySpark
Minimum of 5 years' experience with advanced Python (including pandas and NumPy)
Minimum of 5 years' experience with Docker, Podman
Minimum of 5 years' experience with AWS S3, Lambda, and Step Functions
Minimum of 5 years' experience with Apache Iceberg, Airflow, etc.
Minimum of 5 years' experience with SQL (including Trino)
Minimum of 5 years' experience with NoSQL databases such as DynamoDB
Minimum of 5 years' experience with Unity Catalog OSS, Apache Polaris
Minimum of 5 years' experience with Apache Superset
Minimum of 5 years' experience with Terraform or CloudFormation
Minimum of 5 years' experience with OpenLineage
Minimum of 5 years' experience with H3, PostGIS

Responsibilities

Work with stakeholders to understand data requirements, assess feasibility, and design appropriate solutions with minimal oversight
Leverage strong problem-solving and debugging skills for data quality issues, pipeline failures, and performance bottlenecks
Leverage a background in large-scale data migration or platform modernization efforts
Contribute to data engineering documentation, best practices, and design patterns