Sr Data Science Engineer

LegitScript
Hybrid

About The Position

You'll own the full lifecycle — from raw data ingestion to model deployment to measuring real-world business impact — with a current focus on building a sophisticated risk detection system using LLMs, Generative AI techniques, and classical ML within our SaaS platform. This is not a pure engineering role or a pure research role. You'll need both, and you'll need to move fluidly between them.

Requirements

  • 5–8+ years spanning data engineering and data science/ML, with a demonstrated track record of shipping models to production
  • Strong Python proficiency; experience with Spark/PySpark for large-scale data processing
  • Advanced SQL for complex transformation, analysis, and data modeling
  • Hands-on experience with cloud data platforms such as Databricks or Snowflake
  • Experience with ETL/ELT frameworks: dbt, Lakeflow Declarative Pipelines, Databricks Auto Loader, Informatica, or similar
  • Familiarity with ML experiment tracking tools such as MLflow or Weights & Biases
  • DevOps fluency: Git-based development, branching strategies, CI/CD, infrastructure as code (Databricks Asset Bundles or Terraform), and Docker
  • Experience with orchestration tools such as Databricks Workflows or Apache Airflow

Nice To Haves

  • Hands-on experience with LLMs and Generative AI techniques in a production context (prompt engineering, RAG architectures, fine-tuning, or evaluation frameworks)
  • Experience building or operating ML platforms, feature stores, or model registries
  • Prior work in risk, compliance, fraud detection, or other high-stakes ML domains

Responsibilities

  • Research, prototype, and develop ML and LLM-based models to solve complex business problems, with a current focus on risk detection and prioritization
  • Wrap models into production-ready APIs and integrate them into our core product
  • Ensure model outputs are interpretable — translating predictions into actionable reason codes for end users
  • Partner directly with operational teams to gather feedback, refine features, and improve model relevance over time
  • Design, build, and maintain scalable pipelines to ingest data from disparate sources into our data warehouse/lake
  • Implement robust data validation, quality checks, and transformation workflows across raw, curated, and serving layers
  • Build and maintain curated datasets optimized for both analytics and model training use cases
  • Implement and maintain CI/CD pipelines for both data workflows and ML model deployment across environments
  • Monitor pipeline latency, data drift, and model performance in production; design alerting and retraining triggers
  • Own the business outcomes of your models — define success metrics, track ROI, and iterate based on real-world efficacy
  • Manage infrastructure as code and containerized deployments to ensure reproducible, environment-consistent releases

Benefits

  • Competitive compensation
  • Flexible work options