AI Systems & Data Engineer

HyperFi
San Francisco, CA

About The Position

We are seeking an AI Systems & Data Engineer to join our team. We are building a fast, flexible, and complex platform on a robust, event-driven architecture. This role requires expertise in building data pipelines in the Databricks environment, specifically for ingesting unstructured data, and in leveraging that data to build AI agents. You’ll play a crucial part in rolling out products that will have an immediate impact.

Requirements

  • 5-7 years of experience building production-grade ML, data, or AI systems.
  • Strong grasp of prompt engineering, context construction, and retrieval design.
  • Comfortable working in LangChain and building agents.
  • Experience with PySpark and Databricks to handle real-world data scale.
  • Ability to write testable, maintainable Python with clear structure.
  • Understanding of model evaluation, observability, and feedback loops.
  • Excited to push from prototype → production → iteration.
  • Familiarity with the Databricks Data Intelligence Platform, which unifies data warehousing and AI use cases on a single platform.
  • Knowledge of Unity Catalog for open and unified governance of data, analytics, and AI on the lakehouse.
  • Understanding of data security concerns related to AI and how to mitigate them using the Databricks AI Security Framework (DASF).
  • Confident English skills to collaborate clearly and effectively with teammates.
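
To give a flavor of the prompt engineering and context construction this role involves, here is a minimal sketch of assembling a grounded prompt from retrieved snippets under a size budget. The function name and prompt wording are illustrative, not part of HyperFi’s stack, and the character budget is a rough stand-in for a real token budget:

```python
def build_context(question: str, snippets: list[str], max_chars: int = 500) -> str:
    """Assemble a prompt from retrieved snippets, trimming to a
    character budget (a rough proxy for a token budget)."""
    header = "Answer using only the context below.\n\nContext:\n"
    body = []
    used = 0
    for snippet in snippets:
        # Stop adding context once the budget would be exceeded.
        if used + len(snippet) > max_chars:
            break
        body.append(f"- {snippet}")
        used += len(snippet)
    return header + "\n".join(body) + f"\n\nQuestion: {question}"
```

In practice the budget, ordering, and formatting of context are exactly the levers that prompt iteration tunes.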

Nice To Haves

  • Have built scalable agent-like workflows on the Databricks platform.
  • Have worked on semantic chunking, vector search, or hybrid retrieval strategies.
  • Can walk us through a real-world prompt failure and how you fixed it.
  • Have contributed to OSS tools or internal AI platforms.
  • Think of yourself as both an engineer and a systems designer.
  • Are familiar with the concept of a data lakehouse architecture.
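
As an illustration of the chunking and retrieval strategies mentioned above, here is a toy sketch: fixed-size overlapping chunks (a naive stand-in for semantic chunking) ranked by cosine similarity over precomputed vectors. All names are hypothetical, and a production system would use a real embedding model and vector index rather than hand-supplied vectors:

```python
import math

def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (assumes max_words > overlap)."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunk_vecs: list[list[float]],
          chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]
```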

Responsibilities

  • Design and build data pipelines in Databricks for ingesting unstructured data.
  • Construct retrieval-augmented generation (RAG) systems from scratch using ingested data.
  • Build agentic LLM pipelines utilizing frameworks like LangChain, LangGraph, and LangSmith.
  • Own orchestration of PySpark and Databricks workflows to prepare inputs and track outputs for AI models.
  • Instrument evaluation metrics and telemetry to guide the evolution of prompt strategies.
  • Work alongside product, frontend, and backend engineers to tightly integrate AI into user-facing flows.
  • Leverage Databricks features such as Auto Loader for automatically detecting new files and schema changes in cloud storage.
  • Utilize Delta Lake for reliability, security, and performance on the data lake for streaming and batch operations.
  • Apply Databricks Workflows to orchestrate data-integration tasks.
  • Implement Delta Live Tables for building reliable data pipelines with a declarative approach.
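
The evaluation and telemetry responsibility above can be sketched as a minimal exact-match eval harness: run a model callable over labeled cases and tally accuracy per prompt version. The `EvalResult` and `evaluate` names are illustrative only; on Databricks this would typically be backed by MLflow rather than hand-rolled:

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    """Tally of outcomes for one prompt version."""
    prompt_id: str
    total: int = 0
    correct: int = 0

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0

def evaluate(prompt_id: str, model_fn, cases: list[tuple[str, str]]) -> EvalResult:
    """Run model_fn over (input, expected) cases and score exact matches,
    the simplest useful metric for guiding prompt iteration."""
    result = EvalResult(prompt_id)
    for inp, expected in cases:
        result.total += 1
        if model_fn(inp).strip() == expected.strip():
            result.correct += 1
    return result
```

Comparing `accuracy` across prompt versions on a fixed case set is the feedback loop that guides prompt evolution.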

Benefits

  • Flexible hours
  • Async-friendly culture
  • Engineering-led environment
  • Competitive comp
© 2024 Teal Labs, Inc