About The Position

Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.

Reddit is a community of communities where people can dive into anything through experiences built around their interests, hobbies, and passions. Our mission is to bring community, belonging, and empowerment to everyone in the world. Reddit users submit, vote, and comment on content, stories, and discussions about the topics they care about the most. From pets to parenting, with over 100,000 active communities and over 70 million daily active users, it is home to the most open and authentic conversations on the internet. For more information, visit redditinc.com.

Who We Are: The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure powering recommendations, content discovery, and user and content quantification, and directly supports other teams such as Growth, Ads, Feeds, and Core Machine Learning.

What You’ll Do: As a Senior Software Engineer, you will lead the development of a large-scale GenAI Platform at Reddit.

Requirements

  • 5+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
  • Experience operating orchestration systems such as Kubernetes at scale
  • Deep experience with cloud-based technologies for supporting an ML platform, including tools like AWS, Google Cloud Storage, infrastructure-as-code (Terraform), and more
  • Proficiency with the common programming languages and frameworks of ML, such as Go, Python, etc.
  • Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders
  • Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the GenAI product development lifecycle.

Nice To Haves

  • Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems is a plus
  • Strong proficiency in Python and experience with modern AI/ML frameworks (e.g., LangChain, Vertex AI Agent Builder, TensorFlow, PyTorch) is a plus

Responsibilities

  • Contribute to the design, implementation, and maintenance of the LLM Gateway, focusing on features like unified API endpoints for internally and externally hosted LLMs, rate/token limit management, and intelligent failover mechanisms to boost uptime and reliability.
  • Design and develop ML and Generative AI systems in cloud-based production environments at scale.
  • Build and manage enterprise-grade RAG applications using embeddings, vector search, and retrieval pipelines.
  • Implement and operationalize agentic AI workflows with tool use using frameworks such as LangChain and LangGraph.
  • Drive adoption of MLOps / LLMOps practices, including CI/CD automation, versioning, testing, and lifecycle management.
  • Establish best practices for observability, monitoring, evaluation, and governance of GenAI pipelines in production.
  • Bring a strong ownership mindset and platform thinking.
  • Lead AI platform delivery from concept to production.

Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs
  • 401k with Employer Match
  • Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Flexible Vacation & Paid Volunteer Time Off
  • Generous Paid Parental Leave