AI Infrastructure Engineer

Capstone Investment Advisors • New York, NY

About The Position

We are seeking an AI Infrastructure Engineer to design, build, and scale the foundational infrastructure that enables AI-driven development across the organization. This role will focus on building secure, production-grade systems that support intelligent agents, large language models (LLMs), and distributed AI tooling. The ideal candidate combines strong software engineering fundamentals with hands-on experience in AI infrastructure and Site Reliability Engineering (SRE).

Requirements

5+ years of professional software engineering experience, with a proven track record of designing and delivering scalable, production-grade systems.
Demonstrated experience managing Kubernetes in a production environment.
Expertise in building and supporting AI/ML infrastructure, agent-based systems, or AI-enhanced developer platforms.
Solid understanding of modern security architecture, including API security, OAuth 2.0 and Keycloak authentication flows, and secure system design principles.

Nice To Haves

Hands-on experience with AI agent frameworks, LLM integrations, or Model Context Protocol (MCP) implementations.
Experience designing multi-tenant or federated AI platforms operating across distributed environments.
Familiarity with enterprise AI governance, compliance frameworks, and operational risk controls.
Background in cybersecurity, security architecture, or penetration testing.
Experience working with AI-assisted development tools, automated code generation, and modern testing frameworks.

Responsibilities

Build Intelligent Agent Platforms: Design and implement orchestration frameworks and secure execution environments that enable LLM-powered tools and agent-based workflows.
Architect AI Infrastructure: Develop scalable infrastructure that supports distributed agent development and deployment across multiple teams and business units.
Oversee AI Agent Operations: Manage agent lifecycles and ensure AI-generated outputs — including code — align with architectural standards, security policies, and engineering best practices.
Develop Agent Communication Systems: Build and maintain Model Context Protocol (MCP) services and supporting infrastructure to enable reliable communication, coordination, and integration with enterprise systems.
Implement Governance & Security Controls: Establish monitoring, observability, compliance, and security frameworks to ensure safe and responsible AI operations at scale.
Drive Organizational Adoption: Partner directly with teams across the firm to promote AI infrastructure best practices, provide hands-on guidance, and support adoption of AI-augmented development workflows.