AI Data Engineer

Schonfeld, New York, NY

About The Position

Schonfeld Strategic Advisors is seeking an experienced AI Data Engineer to join our Data Engineering team. In this role, you will be responsible for designing, building, and maintaining robust data pipelines that power SchonAI, our firm's internal AI platform. You will work at the intersection of data engineering and AI, ensuring that high-quality, timely, and relevant data flows seamlessly to our AI systems to support investment professionals across the firm.

Requirements

  • Programming: Strong proficiency in Python; experience with SQL and at least one other language (e.g. Java, Scala, Go, Rust)
  • Data Engineering: 5+ years of experience building production data pipelines using tools like Apache Airflow, Prefect, Dagster, or similar
  • Big Data Technologies: Hands-on experience with distributed computing frameworks (Spark, Flink) and modern data platforms
  • Cloud Platforms: Proficiency with AWS services (e.g., S3, EKS/Kubernetes) or equivalent GCP services
  • Databases: Experience with both SQL databases (PostgreSQL, MySQL) and NoSQL databases (MongoDB, DynamoDB, Elasticsearch)
  • AI/ML Data: Understanding of data requirements for ML/AI systems, including experience with vector databases (Pinecone, Weaviate, Qdrant) and embedding pipelines
  • Bachelor's or Master's degree in Computer Science, Data Engineering, or related technical field
  • Strong problem-solving skills and attention to detail
  • Excellent communication skills with ability to translate technical concepts for non-technical stakeholders
  • Experience working in fast-paced, collaborative environments
  • Self-motivated with ability to manage multiple priorities

Nice To Haves

  • Experience building data pipelines for LLM applications or RAG (Retrieval Augmented Generation) systems
  • Familiarity with financial data sources (market data, fundamental data, alternative data)
  • Knowledge of data streaming technologies (Kafka, Kinesis, Pub/Sub)
  • Experience with analytics/warehouse/OLAP databases (BigQuery, SingleStore, Redshift, ClickHouse)
  • Experience with containerization (Docker) and orchestration (Kubernetes)
  • Understanding of MLOps practices and tools
  • Experience with data quality frameworks (Great Expectations, Deequ)

Responsibilities

Data Pipeline Development

  • Design and build scalable, reliable data pipelines to ingest, transform, and deliver structured and unstructured data to SchonAI using Prefect.
  • Develop ETL/ELT processes for diverse data sources including market data, research documents, internal databases, and third-party APIs.
  • Implement real-time and batch data processing workflows to meet varying latency requirements.
  • Ensure data quality, consistency, and integrity across all pipelines.

AI Data Infrastructure

  • Build and maintain data infrastructure optimized for AI/ML workloads, including vector databases and semantic search systems.
  • Design data schemas and storage solutions that support efficient retrieval and processing for LLM applications.
  • Implement data versioning, lineage tracking, and observability for AI training and inference pipelines.
  • Optimize data delivery for low-latency AI interactions and high-throughput batch processing.

Integration & Collaboration

  • Partner with AI engineers, software developers, and data scientists to understand data requirements.
  • Integrate with existing firm systems including risk platforms, trading systems, portfolio management tools, and research databases.
  • Collaborate with infrastructure teams on cloud architecture, security, and compliance requirements.
  • Work closely with business stakeholders to prioritize data sources and pipeline enhancements.

Data Governance & Security

  • Implement appropriate data access controls, encryption, and compliance measures.
  • Ensure adherence to data governance policies and regulatory requirements.
  • Monitor and maintain data pipeline performance, reliability, and cost efficiency.
  • Document data flows, transformations, and dependencies.