LLMOps Engineer

Brooklyn Sports & Entertainment • New York, NY

18d • $130,000 - $150,000 • Onsite

About The Position

The Large Language Model Ops (LLMOps) Engineer ensures Brooklyn Sports & Entertainment’s AI systems are safe to run in production and safe to evolve over time. This role builds and operates the infrastructure that powers the company’s AI platform, including deployment pipelines, evaluation systems, observability tooling, and infrastructure for agent-based workflows. Working primarily within AWS and Amazon Bedrock, the LLMOps Engineer builds the production environment needed to reliably operate AI systems that support ticketing operations, venue analytics, digital fan engagement, partnership workflows, and internal engineering tools. The role works closely with the AI Engineer, sharing responsibility for production operations while focusing on infrastructure reliability and AI platform governance. This role reports to the VP, Artificial Intelligence.

Requirements

3–6 years of experience working in DevOps, MLOps, or infrastructure engineering roles.
Strong experience operating AWS cloud infrastructure.
Experience deploying and operating AI or machine learning systems in production environments.
Experience with Amazon Bedrock or similar LLM platforms.
Experience building CI/CD pipelines and automated deployment systems.
Experience implementing observability systems and operational dashboards.

Responsibilities

Design and operate the AWS-based infrastructure foundation for AI and agent systems.
Manage integrations with Amazon Bedrock and other model providers.
Implement secure environment configuration, secrets management, and infrastructure policies.
Build scalable systems that can support ongoing production operations.
Build and maintain CI/CD pipelines for AI systems, including deployment of prompts, models, and agent workflows.
Implement versioning for models, prompts, and LangGraph agent graphs.
Design environment promotion standards and automated rollback mechanisms.
Design and operate the CodeAI development pipeline, enabling AI agents to assist with software development tasks.
Build sandboxed environments allowing AI agents to safely interact with repositories, CI systems, and development workflows.
Establish governance and security guardrails for AI-assisted development and vibe coding practices.
Implement monitoring systems using Amazon CloudWatch and evaluation dashboards.
Define Service Level Objectives (SLOs) for AI system reliability.
Build shared evaluation infrastructure that measures hallucination rates, tool-call success, and workflow reliability.
Monitor and optimize cost-per-request metrics.
Participate in deployment, monitoring, and incident response for AI systems.
Develop operational playbooks that reduce mean time to recovery and improve system resilience.