Sr Data Science Engineer

LegitScript

Hybrid

About The Position

You'll own the full lifecycle, from raw data ingestion through model deployment to measuring real-world business impact, with a current focus on building a sophisticated risk detection system using LLMs, Generative AI techniques, and classical ML within our SaaS platform. This is neither a pure engineering role nor a pure research role: you'll need both skill sets, and you'll need to move fluidly between them.

Requirements

5–8+ years of experience spanning data engineering and data science/ML, with a demonstrated track record of shipping models to production
Strong Python proficiency; experience with Spark/PySpark for large-scale data processing
Advanced SQL for complex transformation, analysis, and data modeling
Hands-on experience with cloud data platforms such as Databricks or Snowflake
Experience with ETL/ELT frameworks such as dbt, Lakeflow Declarative Pipelines, Databricks Auto Loader, Informatica, or similar
Familiarity with ML experiment tracking tools such as MLflow or Weights & Biases
DevOps fluency: Git-based development, branching strategies, CI/CD, infrastructure as code (Databricks Asset Bundles or Terraform), and Docker
Experience with orchestration tools such as Databricks Workflows or Apache Airflow

Nice To Haves

Hands-on experience with LLMs and Generative AI techniques in a production context (prompt engineering, RAG architectures, fine-tuning, or evaluation frameworks)
Experience building or operating ML platforms, feature stores, or model registries
Prior work in risk, compliance, fraud detection, or other high-stakes ML domains

Responsibilities

Research, prototype, and develop ML and LLM-based models to solve complex business problems, with a current focus on risk detection and prioritization
Wrap models into production-ready APIs and integrate them into our core product
Ensure model outputs are interpretable — translating predictions into actionable reason codes for end users
Partner directly with operational teams to gather feedback, refine features, and improve model relevance over time
Design, build, and maintain scalable pipelines to ingest data from disparate sources into our data warehouse/lake
Implement robust data validation, quality checks, and transformation workflows across raw, curated, and serving layers
Build and maintain curated datasets optimized for both analytics and model training use cases
Implement and maintain CI/CD pipelines for both data workflows and ML model deployment across environments
Monitor pipeline latency, data drift, and model performance in production; design alerting and retraining triggers
Own the business outcomes of your models — define success metrics, track ROI, and iterate based on real-world efficacy
Manage infrastructure as code and containerized deployments to ensure reproducible, environment-consistent releases