Data Scientist - Investment Team

Millennium•New York, NY

3d•$150,000 - $200,000•Onsite

About The Position

Millennium is hiring a Data Scientist to partner directly with our fundamental investment analysts and PMs within one of our trading teams. This person will (1) develop and curate high-quality data assets from the datasets we already license and generate, (2) produce actionable investment insights through rigorous statistical analysis, and (3) help define and execute our AI/LLM integration roadmap across the research and investment workflow. This is a highly applied role: you will ship data products, analyses, and tooling that materially improve how our investment teams source ideas, test theses, monitor positions, and communicate insights.

Requirements

4+ years of experience in data science (investment firm, data analytics firm, or software firm).
Strong Python skills (pandas/NumPy/SciPy/statsmodels) and experience with Flask/Django, Tableau, Streamlit, or similar tools for application development.
SQL proficiency and experience with data warehouses, specifically Snowflake (or equivalent modern warehouse experience with a quick ramp to Snowflake).
Familiarity with search/RAG, embeddings, vector databases, and evaluation of LLM systems.
Experience with cloud tooling and orchestration (e.g., Airflow/dbt) and versioning practices (Git).
Strong foundation in statistics and inference (e.g., Bayesian inference, regression, experimental design, quasi-experimental thinking).
Experience in fundamental investing workflows (healthcare a plus, not required).
Proactive, eager to learn; comfortable partnering with demanding end-users (PMs/analysts) and iterating quickly.

Responsibilities

Inventory, evaluate, and integrate internal and vendor datasets into usable, well-documented, research-ready data assets with strong data quality controls.
Develop reusable data models, metrics, and entity mapping (e.g., company/product/provider mapping, time-series alignment, cross-dataset joins).
Translate PM/analyst questions into structured analyses and empirical tests; deliver conclusions clearly with caveats and sensitivities.
Apply statistics and econometrics (e.g., regressions, Bayesian methods, hypothesis testing, causal/quasi-causal thinking) to evaluate signals and explain drivers.
Build monitoring and diagnostics for data drift, signal decay, and regime changes.
Prototype and productionize AI-enabled research processes and workflows (e.g., summarization, extraction, classification, search, analyst copilots) with clear evaluation criteria (accuracy, hallucination risk, latency, cost, auditability).
Establish best practices for prompts, retrieval-augmented generation (RAG), tool use, and governance (PII, compliance, model risk).
Build lightweight internal tools and interfaces that make data and insights easy to consume (e.g., Flask/Django, dashboards, notebooks, Tableau).