Vice President, AI Platform Engineering

Ares Management Corporation • New York, NY

About The Position

We are seeking an accomplished VP of AI Platform Engineering to lead the design, development, and deployment of our enterprise generative AI platform. This leadership role focuses on building and scaling core platform components that enable safe, secure, and compliant AI application development across the firm. Working closely with the Principal AI Platform Engineer and cross-functional teams, you will drive execution on critical platform infrastructure—from multi-LLM gateways and RAG services to model registry, prompt library, and production deployment pipelines. This is an opportunity to shape how the organization leverages AI at scale while maintaining rigorous standards for security, governance, and reliability.

Requirements

7+ years of software engineering experience with 3+ years in leadership or senior IC roles
3+ years of experience with generative AI, LLMs, RAG systems, or AI platform infrastructure
Strong proficiency in Python, Go, Rust, or Java; experience building scalable backend systems
Deep knowledge of LLM architecture, fine-tuning, and RAG design patterns
Hands-on experience with model serving frameworks (vLLM, Ollama, TensorFlow Serving), vector databases, and embedding models
Proficiency with cloud platforms (AWS, GCP, Azure) and Kubernetes/Docker
Demonstrated experience building production systems with a focus on reliability, performance, and observability
Strong understanding of security best practices: authentication, authorization, encryption, and secure API design
Experience with compliance frameworks and security governance
Excellent communication and cross-functional collaboration skills
Track record of delivering complex technical projects on schedule

Nice To Haves

Experience in financial services, private equity, or alternative assets
Familiarity with orchestration frameworks such as LangChain or LlamaIndex
Experience with MLOps platforms and model versioning systems
Knowledge of prompt engineering evaluation and testing frameworks
Experience with data governance, metadata management, and data lineage systems
Background building internal platforms or developer tools
Experience mentoring engineers and building high-performing teams
Open source contributions or published technical work in AI/ML

Responsibilities

Platform Development & Execution
Lead design and implementation of core platform components: multi-LLM gateway, RAG retrieval services, model registry, and prompt library
Drive execution on platform roadmap, breaking down complex features into deliverable milestones with clear success metrics
Own API design and service integration patterns that enable seamless consumption across AI enablement teams
Ensure technical excellence: code quality, testability, performance optimization, and architectural coherence

Multi-LLM Gateway & Model Management
Design and build multi-LLM gateway architecture supporting multiple providers (OpenAI, Anthropic, Azure, self-hosted, etc.)
Implement intelligent routing, load balancing, and fallback mechanisms based on cost, latency, and capability requirements
Build model registry with versioning, metadata management, and approval workflows
Implement cost optimization and FinOps tracking for model usage and spending
Monitor model performance, hallucination rates, latency, and quality metrics in production

RAG & Retrieval Infrastructure
Design and build enterprise RAG infrastructure: vector database integration, semantic search, and chunking strategies
Implement retrieval evaluation and quality metrics to ensure relevance and accuracy
Build indexing pipelines and data ingestion workflows from enterprise data sources
Integrate with data governance and lineage tracking systems

Model Context Protocol (MCP) & Integration Gateway
Implement MCP gateway for secure, standardized integration with external tools and APIs
Build tool catalog and discovery mechanisms for AI applications
Establish security and governance controls for tool access and data handling

Prompt Library & Version Control
Build organizational prompt library with versioning, tagging, and metadata
Implement testing and evaluation frameworks for prompt variants
Enable A/B testing and prompt performance analytics
Support prompt governance and approval workflows

Deployment Pipelines & DevOps
Design sandbox-to-production deployment pipelines with clear promotion gates and approval workflows
Implement CI/CD for AI applications: automated testing, integration, and deployment
Build monitoring, observability, and alerting for production AI systems
Implement canary deployments, gradual rollouts, and rollback mechanisms
Establish SLOs, error budgets, and on-call protocols for platform services

Agent-to-Agent (A2A) Workflows
Design orchestration framework for multi-step AI workflows with state management
Build error handling, retries, and recovery mechanisms for reliable execution
Implement workflow monitoring and debugging tools

Data Integration & Gateway Collaboration
Partner with Data Products team to design AI-native data access patterns and APIs
Implement secure, governed data retrieval for RAG and model training
Build metadata and data lineage tracking for compliance and governance

Security & Governance Implementation
Implement authentication, authorization, and encryption across platform services
Build audit logging, request validation, and rate limiting for all platform APIs
Implement input/output validation to prevent prompt injection and data leakage
Design model and prompt governance workflows with appropriate approval gates
Ensure compliance with firm security policies and regulatory requirements
Work with Compliance and Infosec teams on security assessments and incident response

Developer Experience & Enablement
Develop SDKs, client libraries, and code samples that make the platform easy to consume
Create documentation, tutorials, and best practices guides
Support AI Enablement teams with technical guidance and integration assistance
Gather feedback from users and iterate on the platform based on adoption patterns

Team Leadership & Collaboration
Manage and mentor the engineering team focused on platform development and operations
Collaborate with the Principal AI Platform Engineer on architecture decisions and long-term platform vision
Partner with Data Products, AI Enablement, Security, and Compliance teams
Lead technical working groups and establish platform standards and best practices