Data Scientist

Arlo · New York, NY

About The Position

Arlo is rebuilding health insurance from the ground up using AI. The healthcare experience today is expensive, confusing, and often so frustrating that people delay the care they need. We're changing that by reimagining what a health plan should be: a proactive partner that enables health rather than denying it. Our AI-native platform delivers continuous, personalized support for members, helping them navigate benefits, schedule appointments, access high-quality care, and avoid financial fear. Powered by the industry's most advanced risk-pricing engine, Arlo is already scaling fast: we've grown to $XXXM in premiums, cover tens of thousands of people, and see accelerating demand across brokers, employers, and partners. Backed by Upfront Ventures, 8VC, and General Catalyst, our team combines deep industry expertise (Palantir, YC) with the ambition to modernize a $1T market.

About The Role

We use a proprietary risk model to price group health insurance policies from census and external claims data. While that foundation is strong, we know the external database has blind spots, and other signals available at quoting time (prior rates, aggregate claims reports, self-reported conditions) can meaningfully sharpen our view of risk. This role sits at the center of four connected problems: learning from post-policy data to understand where our risk blindness lies, building automated quoting systems that act on softer signals, understanding how algorithmic changes ripple through to sales outcomes, and maintaining the integrity of the data pipelines that underpin all of it.
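
To give a concrete flavor of the work, here is a minimal sketch of layering soft-signal adjustments on top of a base model rate. Every signal, factor, and field name below is a hypothetical illustration, not Arlo's actual rating logic:

```python
import pandas as pd

# Hypothetical sketch: adjust a base model rate using soft signals
# available at quoting time. All factors and column names are assumed
# for illustration only.

def adjusted_rate(quote: pd.Series) -> float:
    rate = quote["base_model_rate"]  # output of the core risk model

    # Prior rate history: an incumbent carrier pricing the group well
    # above our model is weak evidence of risk we are not seeing.
    if quote["prior_rate"] > 1.15 * rate:
        rate *= 1.05

    # Aggregate claims report: a high reported loss ratio nudges the
    # rate up; a clean experience period nudges it down.
    if quote["reported_loss_ratio"] > 0.90:
        rate *= 1.08
    elif quote["reported_loss_ratio"] < 0.60:
        rate *= 0.97

    # Self-reported conditions: small multiplicative load per condition.
    rate *= 1.0 + 0.02 * quote["num_reported_conditions"]
    return rate

quote = pd.Series({
    "base_model_rate": 520.0,
    "prior_rate": 640.0,
    "reported_loss_ratio": 0.95,
    "num_reported_conditions": 2,
})
print(round(adjusted_rate(quote), 2))  # 613.27
```

In practice, factors like these would be fit and validated against post-policy claims experience rather than hand-set, which is exactly the feedback loop this role owns.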

Requirements

  • 3–5 years in a data science or quantitative analyst role
  • Proficiency in Python (scikit-learn, pandas, statsmodels) and SQL
  • Experience building and validating predictive models end-to-end
  • Comfort working with messy, inconsistent real-world datasets
  • Experience designing or auditing data pipelines for quality and consistency
  • Ability to communicate model behavior and business impact clearly to non-technical stakeholders

Nice To Haves

  • Background in insurance, actuarial data, or healthcare claims
  • Experience with GLMs, survival analysis, or credibility theory for pricing
  • Familiarity with group health underwriting or broker distribution models
  • Experience building confidence-based routing or triage systems
  • Exposure to sales funnel analytics or conversion attribution
  • Familiarity with MLflow, dbt, or similar tooling

Responsibilities

  • Emerging risk detection. Analyze early-period claims data to surface patterns that indicate risk was underestimated at quoting — and quantify the gap.
  • Blindness measurement. Build frameworks to systematically identify where the external claims database is incomplete or lagged, and estimate the magnitude of those gaps.
  • Adjustment factor modeling. Develop calibration factors that translate soft signals — prior rate history, aggregate claims, self-reported conditions — into rate adjustments layered on top of core model output.
  • Feedback loop infrastructure. Create pipelines that carry post-quoting learnings back into upstream models so calibration improves continuously.
  • Multi-signal fusion. Design models that ingest heterogeneous inputs and synthesize them into an enriched risk view that reduces reliance on manual review.
  • Confidence scoring. Build the logic that decides when a quote can be issued automatically versus routed to a human, minimizing queue volume without increasing adverse selection (a rough sketch of this routing logic appears after this list).
  • Threshold and rule design. Collaborate with underwriting to set and validate auto-issuance decision thresholds, and monitor their performance over time.
  • Algorithm impact attribution. When we adjust rating algorithms, tighten auto-quoting rules, or change data quality requirements, measure the downstream effect on quote volume, win rates, and sales conversion — and distinguish model-driven changes from market-driven ones.
  • Sales funnel diagnostics. Identify where quote kickouts, rate changes, or data submission friction are creating drop-off in the sales pipeline, and quantify the cost of each leakage point.
  • Data quality incentive analysis. Understand whether brokers and groups that provide richer data at submission (more complete census, aggregate claims, prior rates) achieve better pricing outcomes — and help us make that case externally.
  • Ingest delay profiling. Map the latency characteristics of our live policy data — claims, enrollment, eligibility — and identify which delays are structural versus operational, and where they introduce systematic bias into our models.
  • Consistency monitoring. Build checks and alerting for data inconsistencies across our ingestion layer: duplicate records, mismatched member IDs, enrollment timing gaps, and reporting lags from carriers or TPAs (see the second sketch after this list).
  • Upstream data partnerships. Work with engineering and data teams to document known data quality issues, prioritize fixes, and maintain a clear picture of what the data can and cannot reliably support at any given point.
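
To make the confidence-scoring responsibility concrete, here is a minimal sketch of routing a quote to auto-issuance or human review. The thresholds and signal names are illustrative assumptions, not production rules:

```python
from dataclasses import dataclass

# Hypothetical confidence-based quote routing. All thresholds and
# signal names are assumptions for illustration.

@dataclass
class QuoteSignals:
    model_confidence: float     # calibrated confidence in the rate, 0-1
    census_completeness: float  # share of census fields populated, 0-1
    has_prior_rates: bool       # prior rate history was submitted

def route_quote(signals: QuoteSignals) -> str:
    """Return 'auto_issue' or 'human_review' for a quote."""
    # Low model confidence always goes to an underwriter.
    if signals.model_confidence < 0.80:
        return "human_review"
    # Thin submissions carry adverse-selection risk even when the model
    # is confident, so require richer data before auto-issuing.
    if signals.census_completeness < 0.90 and not signals.has_prior_rates:
        return "human_review"
    return "auto_issue"

# A confident quote backed by a complete census auto-issues.
print(route_quote(QuoteSignals(0.93, 0.97, True)))  # auto_issue
```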
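
And a similarly hypothetical sketch of the consistency-monitoring idea: simple pandas checks over an enrollment extract that flag duplicates, ID mismatches, and timing gaps (all column names assumed, with date columns assumed to be parsed datetimes):

```python
import pandas as pd

# Hypothetical consistency checks over enrollment and eligibility
# extracts. Column names are assumed for illustration.

def consistency_report(enrollment: pd.DataFrame,
                       eligibility: pd.DataFrame) -> dict:
    return {
        # Duplicate records: same member enrolled twice in one plan year.
        "duplicate_rows": int(
            enrollment.duplicated(["member_id", "plan_year"]).sum()
        ),
        # Mismatched member IDs: enrolled members absent from eligibility.
        "unmatched_member_ids": int(
            (~enrollment["member_id"].isin(eligibility["member_id"])).sum()
        ),
        # Enrollment timing gaps: coverage starts before eligibility.
        "timing_gaps": int(
            (enrollment["coverage_start"]
             < enrollment["eligibility_date"]).sum()
        ),
    }
```

A report like this, run on every ingest and wired to alerting, is one way the "checks and alerting" responsibility above could take shape.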

Benefits

  • High ownership: You’ll get real responsibility from day one—our high-trust team empowers you to run with big problems and shape core parts of the company.
  • Join an important mission: Your work directly influences how people access care and improves lives at scale.
  • Growth & expansion: We’re moving fast, and as we grow, your scope will grow with us—new challenges, bigger opportunities, and rapid career velocity.
  • Apply AI to a problem that matters: Instead of optimizing ads or cutting labor costs, you’ll use AI to fundamentally reimagine how people get healthcare.
  • High pace, high collaboration: We operate with velocity, first-principles thinking, and a team that works closely, openly, and with ambition.