About The Position

We are seeking an AI Systems & Data Engineer to join our team. We are building a fast, flexible platform with a robust, event-driven architecture. This role requires expertise in building data pipelines in the Databricks environment, specifically for ingesting unstructured data, and in leveraging that data to build AI agents. You'll play a crucial role in rolling out products with immediate impact.

Requirements

  • 5-7 years of experience building production-grade ML, data, or AI systems.
  • Strong grasp of prompt engineering, context construction, and retrieval design.
  • Comfortable working with LangChain and building agents.
  • Experience with PySpark and Databricks to handle real-world data scale.
  • Ability to write testable, maintainable Python with clear structure.
  • Understanding of model evaluation, observability, and feedback loops.
  • Excited to push from prototype → production → iteration.
  • Familiarity with the Databricks Data Intelligence Platform, which unifies data warehousing and AI use cases on a single platform.
  • Knowledge of Unity Catalog for open and unified governance of data, analytics, and AI on the lakehouse.
  • Understanding of data security concerns related to AI and how to mitigate them using the Databricks AI Security Framework (DASF).
  • Confident English skills to collaborate clearly and effectively with teammates.

Nice To Haves

  • Have built scalable agent-like workflows on the Databricks platform.
  • Have worked on semantic chunking, vector search, or hybrid retrieval strategies.
  • Can walk us through a real-world prompt failure and how you fixed it.
  • Have contributed to OSS tools or internal AI platforms.
  • Think of yourself as both an engineer and a systems designer.
  • Are familiar with the concept of a data lakehouse architecture.

Responsibilities

  • Design and operate Databricks pipelines in Python to ingest and normalize large-scale unstructured data
  • Build streaming and batch ingestion using Auto Loader, Delta Live Tables, and Workflows
  • Model and maintain AI-ready lakehouse tables with Delta Lake and Unity Catalog
  • Prepare retrieval and context datasets for RAG and agent systems
  • Orchestrate Temporal-based workflows to coordinate data prep, validation, and AI handoff
  • Enforce data quality, lineage, and access controls across pipelines
  • Optimize PySpark jobs for performance, reliability, and cost
  • Integrate pipeline outputs into production AI systems and APIs
  • Monitor freshness, schema drift, and pipeline health

Benefits

  • Flexible hours
  • Async-friendly culture
  • Engineering-led environment
  • Competitive compensation