Data Engineer

Space Telescope Science Institute, Baltimore, MD
Hybrid

About The Position

The Space Telescope Science Institute (STScI), operated by the Association of Universities for Research in Astronomy (AURA), is NASA’s science operations center for missions including the Roman, Hubble, and James Webb Space Telescopes. STScI leads observation planning, data analysis, public engagement, and data archiving for flagship missions.

We are seeking a Junior Data Engineer who is a good team player, problem solver, critical thinker, and quick learner. You will help build, maintain, and support performant and scalable PostgreSQL databases for analytics, reporting, and scientific applications. You will work closely with the team to develop and maintain reliable data pipelines using Python and Apache Airflow. Some hands-on experience with AWS is required. You will also assist in optimizing our Massively Parallel Processing (MPP) database system to ensure fast and reliable data access for the Mikulski Archive for Space Telescopes (MAST), one of the world’s most advanced public astronomical data archives.

This position supports hybrid work. Candidates must reside in, or be willing to relocate to, our local market (MD, DE, VA, PA, DC, or WV). U.S. citizenship or permanent residence is required to meet ITAR requirements.

Requirements

  • Good knowledge of PostgreSQL (Greenplum is a plus) and SQL optimization.
  • Experience with Apache Airflow for building data pipelines.
  • Hands-on experience with AWS cloud services.
  • Strong Python and SQL skills.
  • Basic understanding of Kubernetes and Terraform.
  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related discipline.
  • 5+ years of professional experience in Linux-based environments with expertise in data engineering and data management.

Nice To Haves

  • Familiarity with Trino or Apache Iceberg.
  • Experience with CI/CD pipelines.
  • Exposure to lakehouse architectures.

Responsibilities

  • Support performance tuning, maintenance, and operations of PostgreSQL and Greenplum databases.
  • Build, monitor, and troubleshoot data pipelines using Apache Airflow.
  • Assist the platform team with deploying and managing workloads on Kubernetes and Infrastructure as Code (Terraform).
  • Help maintain and improve data systems including relational databases and Parquet storage.
  • Work with scientists and cross-functional teams to implement data solutions.
  • Participate in monitoring, alerting, and ensuring data reliability and accuracy.

Benefits

  • Employer retirement contribution – direct STScI contribution of 10% of your salary from your first day
  • 12 days of sick leave, up to 24 days of vacation, and 10 paid holidays
  • Flexible work schedule with healthy work/life balance
  • Comprehensive medical/dental/vision/prescription plans, and more!