About The Position

We are seeking a Senior Machine Learning Engineer / Platform Engineer to design and build a production-grade agentic workflow platform. This role sits at the intersection of LLM systems engineering, distributed platforms, and applied ML, with a strong emphasis on orchestration, reliability, and extensibility. You will be responsible for architecting and implementing agent-based workflows that integrate large language models, retrieval systems, structured knowledge, and external APIs, designed for robustness, observability, and real-world business use.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 6+ years of experience in software engineering, ML engineering, or platform engineering.
  • Strong proficiency in writing production-grade Python, and experience with Claude Code or Cursor.
  • Hands-on experience with LLM-based systems, including: LangChain / LangGraph; MCP; LangSmith; Claude or comparable frontier models; AWS AgentCore or comparable agentic frameworks.
  • Solid understanding of RAG architectures, embeddings, and vector search.
  • Experience designing and consuming APIs (REST and/or async/event-driven).
  • Strong cloud engineering experience on AWS.
  • Knowledge of how to fine-tune frontier models on domain-specific knowledge.
  • Experience deploying traditional machine learning models into production environments using MLOps tools and best practices.
  • Knowledge of distributed systems, large-scale model optimization, and API development.
  • Exceptional ability to work on a team, especially a dynamic, innovative “tiger team” developing early-stage PoC systems.
  • Strong understanding of container orchestration and cloud-native application design.
  • Ability to work in dynamic environments, handling rapid experimentation and iterative development.

Nice To Haves

  • Experience with distillation, quantization, and small language models.

Responsibilities

  • Design and implement multi-agent and single-agent workflows using orchestration patterns and tools, context engineering, memory management, and guardrail strategies.
  • Design RAG pipelines incorporating vector search, hybrid retrieval, and citation tracking.
  • Implement knowledge graph–backed reasoning, including ontologies, entity resolution and graph-based context construction.
  • Design evaluation frameworks covering agent task-completion correctness, quality, cost, and latency.
  • Develop and deploy machine learning models, focusing on production readiness, scalability, and performance.
  • Collaborate with data scientists to transition experimental models into robust, production-grade applications.
  • Integrate with collaboration platforms (e.g., Teams, alerting systems) for intelligent distribution of insights.
  • Implement and manage CI/CD pipelines to automate deployment, testing, and monitoring of models.
  • Architect and deploy systems on AWS, leveraging compute, storage, and security services.