AI/ML Data Engineer

Marvell Technology, Santa Clara, CA
$105,200 - $157,600

About The Position

Embedded within the AI/ML team, this role owns the data engineering layer that powers both Gen AI applications and ML model development. The role is responsible for building production-grade pipelines, curating AI-ready datasets for LLMs and ML models, and contributing to front-end interfaces when required, so that the team can deliver complete, data-driven AI products without external dependencies.

Requirements

  • Databricks — Spark, Delta Lake, Databricks Workflows, Unity Catalog; production-grade experience required
  • Snowflake — advanced SQL, data modeling, performance tuning, cost management
  • Python — strong engineering fundamentals; PySpark, pandas, pipeline frameworks (dbt, Airflow, or equivalent)
  • SQL — expert level; complex transformations, query optimization, schema design
  • Front-End Development — React, JavaScript/TypeScript, REST API integration, and Streamlit for rapid AI/ML application prototyping and internal tooling
  • Solid understanding of ML lifecycle — feature stores, training pipelines, inference data patterns
  • Cloud-native experience on AWS, Azure, or GCP
  • Experience with data quality and observability tooling

Nice To Haves

  • Hands-on experience with MLflow, Feast, LangChain, or LlamaIndex
  • Exposure to graph databases (Neo4j, Neptune, or equivalent)
  • Exposure to vector databases (Pinecone, Weaviate, pgvector, or equivalent)
  • Experience with streaming pipelines (Kafka, Kinesis, Spark Structured Streaming)
  • Familiarity with LLM evaluation frameworks and dataset benchmarking

Responsibilities

  • Architect and deliver production-grade ELT/ETL pipelines across Databricks and Snowflake for ML training, validation, and inference workflows
  • Build and maintain AI-ready datasets optimized for both ML model consumption and Gen AI use cases — clean, versioned, and reproducible
  • Curate and structure high-quality datasets for RAG pipelines and embedding generation; design document chunking strategies, metadata schemas, and grounding data layers that directly improve retrieval accuracy and Gen AI application performance
  • Implement data quality frameworks and data contracts at pipeline boundaries to protect model and application integrity
  • Build and manage vector-ready data assets, integrating with vector stores and embedding infrastructure for Gen AI applications
  • Establish DataOps best practices — CI/CD for pipelines, data lineage, versioning, and cost observability across platforms
  • Develop Streamlit applications and React-based UIs to surface model outputs, data products, and internal AI tooling
  • Partner with ML Engineers, Data Scientists, and AI Engineers to translate modeling and application requirements into reliable data products
  • Contribute to lakehouse architecture decisions, storage optimization, and compute efficiency across the AI/ML data platform

Benefits

  • Employee stock purchase plan with a 2-year lookback
  • Family support programs
  • Robust mental health resources
  • Recognition and service awards