DevOps Engineer

Vitol
Houston, TX · Onsite

About The Position

We are looking for a talented and motivated DevOps Engineer to join our global Cloud and Platform Services team. In this role you will serve as a founding technical pillar for our AI "tiger team," transitioning early-stage AI experiments into hardened, enterprise-scale production environments while helping shape our broader CI/CD strategy.

Requirements

  • Experience: 5+ years in DevOps, MLOps, or Cloud Engineering. Previous experience in commodities or trading is highly preferred.
  • Cloud & Containers: Deep expertise in AWS services, containerization, and container orchestration (Docker, ECS) in hybrid environments.
  • Automation Toolkit: Proficiency in Terraform and GitOps. Hands-on experience with Jenkins, GitHub Actions, or GitLab CI.
  • AI/ML Specialized Skills: Experience with agent runtime environments (LangGraph, AgentCore); familiarity with LLM evaluation (LangSmith, Ragas) and model serving; and management of vector stores and embedding pipelines.
  • Software Background: Strong proficiency in Python (required); ability to write automation scripts in Bash or PowerShell.
  • Middleware & Ecosystem: Exposure to Kafka, Redis, Elastic, and Windows/Linux OS environments.
  • Communication: Strong written and verbal communication skills.

Responsibilities

  • AI & Platform Engineering: Transition early-stage AI systems to scaled, enterprise-wide deployments. Build and operate the shared platform landscape, ensuring high availability and global support.
  • Advanced CI/CD & Automation: Design, implement, and maintain CI/CD pipelines for both standard software and agentic AI workflows (LLMs, RAG services, and ML models), incorporating build, test, and evaluation gates.
  • Cloud Architecture & Infrastructure: Architect and operate AWS cloud infrastructure (ECS, Lambda, S3, VPC) using Infrastructure-as-Code (Terraform). Integrate cloud resource creation directly into application deployment processes.
  • Data & AI Operations (MLOps): Manage vector database infrastructure (OpenSearch, Pinecone) and implement MLOps workflows including model registries, versioning, and experiment tracking (SageMaker/MLflow).
  • Security & Governance: Collaborate with Cyber Security to enforce secrets management (Secrets Manager), access controls, and compliance guardrails, with specific focus on LLM API governance and data residency.
  • Observability & FinOps: Instrument full-stack observability (Datadog) across the platform and build cost-visibility dashboards to track GPU/CPU and LLM API spend.
  • Cross-Functional Collaboration: Act as a thought partner to AI teams, defining standards and deployment runbooks and ensuring environment parity.