Data Engineer II

Knit Health • San Francisco, CA

About The Position

Knit Health is building a novel clinical foundation model to improve the way healthcare is delivered. We combine expertise in AI with deep clinical knowledge to develop safe, trustworthy systems that improve care, expand access, and reduce waste. Knit is led by a founding team from the University of California, Berkeley, who developed a novel AI architecture that learns to reason like physicians. We are now closing the loop, using our foundation model together with frontier clinical LLMs to build a next-generation clinical intelligence platform for providers. We are venture-backed and have partnered with multiple US-based health systems and data providers.

Requirements

4-6 years of professional experience specifically in data engineering (building data pipelines, ETL/ELT workflows, data modeling, and warehouse architectures)
An advanced degree (MS or PhD) in data science, computer science, computer engineering, or an adjacent technical discipline, paired with demonstrable data engineering project work
A combination of internships, research, and substantial project experience that clearly demonstrates equivalent data engineering capability
Proficiency in SQL and Python
Hands-on experience with at least one major cloud platform (Azure, AWS, etc.)
Engineering fundamentals: comfort with version control (Git), code review, testing, and the habits of writing code others can read, maintain, and trust
SQL: strong command of joins, window functions, CTEs, and aggregate logic; a basic understanding of query performance and when to worry about it
Python: fluency writing clean, modular code for data manipulation, transformation, and scripting; familiarity with common libraries such as pandas and at least one testing framework (pytest or similar)
ML data processing: an understanding of basic machine learning and AI concepts and of typical AI/ML data workflows
Spark / distributed processing: working familiarity with PySpark and an understanding of how distributed compute differs from single-machine workflows
Cloud platforms: hands-on experience with at least one major cloud provider; Azure and Databricks preferred, but strong experience with AWS or GCP translates
Data engineering concepts: a solid grounding in batch and streaming processing, data modeling, orchestration, data quality, governance, and database fundamentals (both relational and columnar)
Communication: the ability to explain technical tradeoffs clearly, in writing and in conversation, to both engineers and non-engineers
Healthcare: prior exposure to healthcare data or the healthcare domain more broadly

Nice To Haves

Familiarity with healthcare interoperability standards such as FHIR and HL7
Awareness of healthcare privacy and compliance frameworks (HIPAA, BAAs, and similar)
An eye for compute cost structures and the instincts to build with efficiency in mind

Responsibilities

Design, build, and maintain the data pipelines and infrastructure that power both our product and research applications — from ingestion through analytics-ready delivery
Partner closely with our data science and ML teams to integrate, structure, and scale the stack as our needs evolve
Help establish and uphold standards for data quality, testing, documentation, and observability across the stack
Navigate the complex and often ambiguous landscape of healthcare data, bringing clarity, organization, and thoughtful structure to messy problem spaces
Contribute to architectural decisions that will shape how we work with data at scale

Benefits

Medical, dental, and vision coverage with 100% of premiums paid for employees and dependents
Coverage begins on the first day of employment
401(k) plan
24 days of PTO annually
