Data Architect (Data Platform)

RevSpring • Boston, MA

About The Position

As a Data Architect for healthcare applications, you are responsible for innovating, designing, and managing scalable, secure, and interoperable data systems that support clinical, operational, and financial workflows. The role focuses on structuring complex healthcare data, from electronic health records (EHRs) to healthcare financial data, into cohesive architectures that enable accurate reporting, analytics, and patient care insights. You will ensure compliance with healthcare regulations such as HIPAA, implement industry standards like HL7 and FHIR for seamless data exchange, and establish strong data governance, quality, and security practices aligned with frameworks such as HITRUST. By aligning data strategy with organizational goals, this role plays a critical part in improving data accessibility, reliability, and ultimately patient outcomes.

Requirements

Fluency in conceptual, logical, and physical data modeling, including normalization vs. denormalization, dimensional modeling (star/snowflake schemas), and designing for scalability and performance.
Deep knowledge of both relational and non-relational systems, including familiarity with data lakes, lakehouses, and distributed storage systems/warehouses (e.g., S3, Delta Lake, BigQuery).
Experience designing pipelines that move and transform data reliably, including ETL/ELT tools (dbt), streaming systems (Kafka, Kinesis), and orchestration frameworks (Airflow, etc.), with an understanding of batch vs. real-time tradeoffs.
Expertise in indexing strategies, partitioning, query tuning, and workload management.
Ability to architect for scale, resiliency, and business continuity.
12+ years of software engineering experience, including hands-on technical experience building, maintaining, and scaling data systems.
5+ years of experience as a tech lead who successfully converts business/product requirements into well-architected designs.
Extensive experience building and scaling large data pipelines, including real-time processing and/or transformations of 100+ GB, using Java, Python, dbt, and SQL.
Extensive experience driving large business outcomes by combining existing and new technologies.
Deep knowledge of common data technology stacks such as GCP BigQuery, Snowflake, Databricks, dbt, and data lake architectures on AWS S3 or Google Cloud Storage.
Deep knowledge of cloud platforms such as AWS, GCP, or Azure, and cloud-native API solutions.
Deep knowledge of data modeling and data governance controls.
Strong grasp of RESTful API design principles, microservices architecture, distributed asynchronous systems, and sound design patterns.
Strong knowledge of CI/CD pipelines (CircleCI, GitHub Actions), containerization (Docker, Kubernetes), version control (Git), infrastructure as code (Pulumi, Terraform), relational and NoSQL databases, caching mechanisms (Redis, Memcached), and performance optimization techniques.
Strong leadership and communication skills, with the ability to influence cross-functional teams and communicate complex technical details to upper management and non-technical stakeholders.
Proven problem-solving ability with a focus on delivering solutions.
Ability to read, analyze, and interpret general business periodicals, professional journals, technical procedures, or governmental regulations.
Ability to write reports, business correspondence, and procedure manuals.
Ability to effectively present information and respond to questions from a variety of internal and external sources.

Responsibilities

Innovating, designing, and managing scalable, secure, and interoperable data systems that support clinical, operational, and financial workflows.
Structuring complex healthcare data, from electronic health records (EHRs) to healthcare financial data, into cohesive architectures that enable accurate reporting, analytics, and patient care insights.
Ensuring compliance with healthcare regulations such as HIPAA.
Implementing industry standards like HL7 and FHIR for seamless data exchange.
Establishing strong data governance, quality, and security practices aligned with frameworks such as HITRUST.
Aligning data strategy with organizational goals.
Defining the future data architecture.
Determining how and where to reduce technical debt.
Identifying how to enable analytics insights, incorporate AI, and drive self-service.
Defining and evolving data architectures that support AI/ML workloads, including curated training datasets, feature stores, and scalable pipelines for batch and real-time inference.
Defining and evolving data architectures that leverage AI to drive greater operational efficiency, reduce system complexity, and accelerate the ingestion and processing of healthcare data across platforms.
Designing scalable pipelines and platforms (e.g., lakehouse, streaming, feature stores) that enable faster data availability for AI-driven insights and real-time decision support.