Vice President, AI Platform Engineering

Ares Management Corporation, New York, NY

About The Position

We are seeking an accomplished VP of AI Platform Engineering to lead the design, development, and deployment of our enterprise generative AI platform. This leadership role focuses on building and scaling core platform components that enable safe, secure, and compliant AI application development across the firm. Working closely with the Principal AI Platform Engineer and cross-functional teams, you will drive execution on critical platform infrastructure—from multi-LLM gateways and RAG services to model registry, prompt library, and production deployment pipelines. This is an opportunity to shape how the organization leverages AI at scale while maintaining rigorous standards for security, governance, and reliability.

Requirements

  • 7+ years of software engineering experience with 3+ years in leadership or senior IC roles
  • 3+ years of experience with generative AI, LLMs, RAG systems, or AI platform infrastructure
  • Strong proficiency in Python, Go, Rust, or Java; experience building scalable backend systems
  • Deep knowledge of LLM architecture, fine-tuning, and RAG design patterns
  • Hands-on experience with model serving frameworks (vLLM, Ollama, TensorFlow Serving), vector databases, and embedding models
  • Proficiency with cloud platforms (AWS, GCP, Azure) and Kubernetes/Docker
  • Demonstrated experience building production systems with focus on reliability, performance, and observability
  • Strong understanding of security best practices: authentication, authorization, encryption, and secure API design
  • Experience with compliance frameworks and security governance
  • Excellent communication and cross-functional collaboration skills
  • Track record of delivering complex technical projects on schedule

Nice To Haves

  • Experience in financial services, private equity, or alternative assets
  • Familiarity with the LangChain or LlamaIndex orchestration frameworks
  • Experience with MLOps platforms and model versioning systems
  • Knowledge of prompt engineering evaluation and testing frameworks
  • Experience with data governance, metadata management, and data lineage systems
  • Background building internal platforms or developer tools
  • Experience mentoring engineers and building high-performing teams
  • Open source contributions or published technical work in AI/ML

Responsibilities

Platform Development & Execution

  • Lead design and implementation of core platform components: multi-LLM gateway, RAG retrieval services, model registry, and prompt library
  • Drive execution on platform roadmap, breaking down complex features into deliverable milestones with clear success metrics
  • Own API design and service integration patterns that enable seamless consumption across AI enablement teams
  • Ensure technical excellence: code quality, testability, performance optimization, and architectural coherence

Multi-LLM Gateway & Model Management

  • Design and build multi-LLM gateway architecture supporting multiple providers (OpenAI, Anthropic, Azure, self-hosted, etc.)
  • Implement intelligent routing, load balancing, and fallback mechanisms based on cost, latency, and capability requirements
  • Build model registry with versioning, metadata management, and approval workflows
  • Implement cost optimization and FinOps tracking for model usage and spending
  • Monitor model performance, hallucination rates, latency, and quality metrics in production

RAG & Retrieval Infrastructure

  • Design and build enterprise RAG infrastructure: vector database integration, semantic search, and chunking strategies
  • Implement retrieval evaluation and quality metrics to ensure relevance and accuracy
  • Build indexing pipelines and data ingestion workflows from enterprise data sources
  • Integrate with data governance and lineage tracking systems

Model Context Protocol (MCP) & Integration Gateway

  • Implement MCP gateway for secure, standardized integration with external tools and APIs
  • Build tool catalog and discovery mechanisms for AI applications
  • Establish security and governance controls for tool access and data handling

Prompt Library & Version Control

  • Build organizational prompt library with versioning, tagging, and metadata
  • Implement testing and evaluation frameworks for prompt variants
  • Enable A/B testing and prompt performance analytics
  • Support prompt governance and approval workflows

Deployment Pipelines & DevOps

  • Design sandbox-to-production deployment pipelines with clear promotion gates and approval workflows
  • Implement CI/CD for AI applications: automated testing, integration, and deployment
  • Build monitoring, observability, and alerting for production AI systems
  • Implement canary deployments, gradual rollouts, and rollback mechanisms
  • Establish SLOs, error budgets, and on-call protocols for platform services

Agent-to-Agent (A2A) Workflows

  • Design orchestration framework for multi-step AI workflows with state management
  • Build error handling, retries, and recovery mechanisms for reliable execution
  • Implement workflow monitoring and debugging tools

Data Integration & Gateway Collaboration

  • Partner with Data Products team to design AI-native data access patterns and APIs
  • Implement secure, governed data retrieval for RAG and model training
  • Build metadata and data lineage tracking for compliance and governance

Security & Governance Implementation

  • Implement authentication, authorization, and encryption across platform services
  • Build audit logging, request validation, and rate limiting for all platform APIs
  • Implement input/output validation to prevent prompt injection and data leakage
  • Design model and prompt governance workflows with appropriate approval gates
  • Ensure compliance with firm security policies and regulatory requirements
  • Work with Compliance and Infosec teams on security assessments and incident response

Developer Experience & Enablement

  • Develop SDKs, client libraries, and code samples that make the platform easy to consume
  • Create documentation, tutorials, and best practices guides
  • Support AI Enablement teams with technical guidance and integration assistance
  • Gather feedback from users and iterate on platform based on adoption patterns

Team Leadership & Collaboration

  • Manage and mentor engineering team focused on platform development and operations
  • Collaborate with the Principal AI Platform Engineer on architecture decisions and long-term platform vision
  • Partner with Data Products, AI Enablement, Security, and Compliance teams
  • Lead technical working groups and establish platform standards and best practices

Benefits

  • Comprehensive Medical/Rx, Dental and Vision plans
  • 401(k) program with company match
  • Flexible Spending Accounts (FSA)
  • Health Savings Accounts (HSA) with company contribution
  • Basic and Voluntary Life Insurance
  • Long-Term Disability (LTD) and Short-Term Disability (STD) insurance
  • Employee Assistance Program (EAP)
  • Commuter Benefits plan for parking and transit
  • Access to a world-class medical advisory team
  • A mental health app that includes coaching, therapy, and psychiatry
  • A mindfulness and wellbeing app
  • A financial wellness benefit that includes access to a financial advisor
  • New parent leave
  • Reproductive and adoption assistance
  • Emergency backup care
  • Matching gift program
  • Education sponsorship program

What This Job Offers

  • Job Type: Full-time
  • Career Level: Executive
  • Education Level: Not listed
  • Number of Employees: 501-1,000
