Data Engineer I

University of Texas at Austin • Austin, TX

Onsite

About The Position

The Data Engineer is responsible for expanding and optimizing the healthcare system’s data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams. This role designs, builds, and maintains scalable data infrastructure to support clinical, operational, and strategic decision-making. Reporting to the Director of Data Intelligence and Decision Science, the Data Engineer collaborates with data scientists, analysts, software engineers, and clinical informatics teams. This position ensures data quality, security, and accessibility by integrating data from different sources such as EHRs, medical devices, financial systems, and external partners. The Data Engineer is critical to enabling predictive analytics, population health management, and regulatory compliance.

Requirements

Requires a Bachelor's Degree in Computer Science, Information Systems, Engineering, Statistics, or a related field with at least 2 years of experience in data engineering, architecture, or ETL development.
Proficiency with big data tools (e.g., Hadoop, Spark, Kafka).
Experience with both SQL and NoSQL databases.
Skilled in data pipeline and workflow management tools.
Familiarity with AWS services, such as EC2, EMR, RDS, Redshift, Glue, DynamoDB.
Programming/scripting experience in Python, Java, C++, Scala, or similar.
Technical Learning: Quickly learns new technical skills and picks up new industry, company, and product knowledge. Adopts new data tools and frameworks with minimal supervision. Learns and applies healthcare-specific data standards (e.g., HL7, FHIR). Keeps current with cloud platform updates and best practices.
Problem Solving: Uses rigorous logic and methods to solve difficult problems with effective solutions. Diagnoses root causes of data pipeline failures. Designs scalable solutions for complex data integration challenges. Applies statistical methods to validate data quality.
Functional/Technical Skills: Possesses the functional and technical knowledge and skills to do the job at a high level of accomplishment. Writes efficient SQL and Python code for data processing. Configures cloud infrastructure for data workloads. Implements secure and compliant data architectures.
Dealing with Ambiguity: Copes with change effectively; can shift gears comfortably; can decide and act without having the total picture. Designs flexible data models for evolving clinical needs. Navigates incomplete or inconsistent data sources. Adapts to shifting priorities in fast-paced environments.
Collaborates: Works effectively with others to achieve shared goals; actively listens and communicates openly. Partners with clinicians to understand data needs. Participates in cross-functional agile teams. Resolves conflicts between technical and business priorities.
Strategic Agility: Sees ahead clearly; can anticipate future consequences and trends accurately. Designs data systems that scale with organizational growth. Aligns data engineering efforts with enterprise analytics strategy. Anticipates regulatory changes and prepares infrastructure accordingly.

Nice To Haves

Master's Degree in Data Engineering, Computer Science, or a related field with at least 5 years of experience in healthcare data engineering or analytics.
Advanced SQL skills and hands-on relational database work.
Expertise in building and optimizing big data pipelines using Python.
Experience managing data transformation, metadata, dependencies, and workload orchestration.
Understanding of message queuing, stream processing, and scalable data storage systems.
Strong project management skills.
AWS Certified Data Analytics
Certified Health Data Analyst (CHDA)
Project Management Professional (PMP)

Responsibilities

Designs and Maintains Data Pipelines: Creates and maintains optimal data pipeline architecture for structured and unstructured healthcare data. Assembles large, complex data sets that meet functional and non-functional business requirements. Builds scalable ETL/ELT pipelines using SQL and AWS big data technologies. Optimizes pipeline performance for latency, throughput, and fault tolerance. Ensures pipelines comply with HIPAA and other regulatory standards.
Develops and Manages Data Infrastructure: Builds infrastructure for optimal extraction, transformation, and loading of data from diverse sources. Creates and maintains data lakes, warehouses, and marts using platforms like Snowflake, Redshift, or BigQuery. Configures cloud-based storage and compute environments (AWS, Azure, GCP). Implements schema design, indexing, and partitioning strategies. Ensures high availability and disaster recovery protocols.
Enables Analytics and Data Science: Creates data tools for analytics and data science teams to build and optimize data products. Develops reusable components for reporting and dashboarding tools. Builds data models and views for use by analysts and data scientists. Enables self-service analytics through curated datasets. Collaborates with stakeholders to define KPIs and metrics.
Improves Internal Processes and Scalability: Identifies, designs, and implements internal process improvements. Automates manual processes and optimizes data delivery. Re-designs infrastructure for greater scalability and performance. Refactors legacy systems for maintainability. Implements CI/CD pipelines for data workflows.
Collaborates Across Teams: Works with stakeholders including Executive, Product, Data, and Design teams to support data infrastructure needs. Translates business requirements into technical specifications. Provides mentorship to junior data engineers. Communicates technical concepts to non-technical stakeholders. Supports cross-functional initiatives and agile squads.
Ensures Data Governance and Security: Keeps data separate and secure, following all relevant data governance and security protocols. Implements data validation, anomaly detection, and cleansing routines. Collaborates with data governance teams to enforce policies. Audits data for completeness, accuracy, and timeliness. Supports data stewardship and master data management initiatives.
Conducts training sessions for analysts and clinical staff on data tools.
Participates in vendor evaluations and proof-of-concept projects.
Supports data integration for mergers, acquisitions, or new service lines.
Assists in disaster recovery drills and business continuity planning.
Contributes to grant proposals or research initiatives requiring data support.
Performs related duties as required.