Senior DevOps Engineer

Cellebrite, Tysons, VA

About The Position

We are building a rapidly scaling GenAI-powered SaaS platform that enables investigators to interact with complex case data through a conversational AI interface. Our system uses a RAG architecture and agentic GenAI workflows to deliver advanced AI capabilities in production. We are looking for a Senior DevOps / Cloud Engineer to own our application services, cloud infrastructure, deployment pipelines, and production reliability in this dynamic AI environment. This is a hands-on role focused on serverless architecture, LLM-based systems, and agentic workflows; you will work closely with Engineering and Customer Success to keep the platform reliable, scalable, and cost-efficient.

Requirements

  • 5+ years of experience in DevOps / SRE / Cloud Engineering
  • Strong hands-on experience with Google Cloud Platform (GCP)
  • Proven experience with serverless architectures (Cloud Run, Cloud Functions, or similar)
  • Experience working with BigQuery (querying, performance tuning, troubleshooting)
  • Experience running and supporting production SaaS applications
  • Hands-on experience with GenAI / LLM-based applications in production (including RAG systems, model APIs, or similar)
  • Experience supporting or operating multi-step AI pipelines or agentic workflows
  • Strong experience with CI/CD pipelines (GitHub Actions, etc.)
  • Solid scripting/programming skills (Python, TypeScript, Bash, or similar)
  • Experience with observability and monitoring tools

Nice To Haves

  • Experience optimizing LLM performance, cost, and reliability at scale
  • Familiarity with vector databases, embeddings, and retrieval systems
  • Experience with infrastructure as code (Terraform or similar)
  • Background in secure or regulated environments
  • Experience in fast-scaling or experimental product environments

Responsibilities

  • Own and manage application services running on GCP infrastructure, including serverless and managed services
  • Design and maintain robust CI/CD pipelines for rapid, safe deployments
  • Operate and optimize GenAI/LLM workloads in production, including RAG pipelines and agentic workflows
  • Monitor and improve latency, cost, and reliability of AI-driven systems
  • Troubleshoot complex production issues across application, data, and infrastructure layers
  • Work with and optimize BigQuery-based data workflows, queries, and performance
  • Support and debug multi-step AI pipelines and agent orchestration flows
  • Implement and maintain observability (logging, metrics, tracing, alerting), including for AI pipelines
  • Collaborate with engineering teams on architecture improvements for evolving GenAI systems
  • Partner with Customer Success to investigate and resolve customer-impacting issues (minimal direct customer interaction)
  • Enforce security best practices in a sensitive data environment