Principal AI/ML Engineer, Semantic Data

Major League Soccer•New York, NY

$235,000 - $260,000•Hybrid

About The Position

Major League Soccer is building advanced AI and data platforms to power fan intelligence, personalization, and data-driven decision-making across the organization. The Principal AI/ML Engineer, Semantic Data will design and build the semantic intelligence layer that enables a consistent understanding of fan data, business concepts, and operational workflows across MLS systems. The role combines semantic data systems with applied LLM engineering to build grounded, production-grade AI capabilities. It is a systems engineering position focused on building and scaling real-world AI infrastructure, including knowledge graphs, retrieval systems, and LLM-powered applications.

Requirements

Master’s degree or higher in computer science, engineering, or a related field, or equivalent experience
8–10+ years of experience in ML engineering, data systems, or applied AI
Strong expertise in Python, SQL, and production software engineering
Deep experience with semantic data modeling, ontologies, and entity resolution
Hands-on experience with embeddings, vector search, and retrieval systems
Experience building and deploying LLM-powered systems including RAG
Experience building production-grade AI systems at scale
Strong understanding of distributed systems and data architecture

Nice To Haves

Experience with knowledge graphs and graph databases
Experience designing semantic layers or feature stores
Experience with open-weight LLMs and model adaptation
Familiarity with on-prem or private GPU deployments
Experience with modern data platforms (AWS, Snowflake, Databricks)
Background in marketing analytics, personalization, or customer data platforms

Responsibilities

Design and implement embedding pipelines across fan data, content, metadata, and behavioral signals
Build metadata and enrichment systems that normalize and structure enterprise data for AI use
Develop knowledge bases and retrieval systems using vector databases and hybrid search architectures
Create context assembly pipelines combining structured data, documents, APIs, and historical outputs
Enable AI systems to operate on unified semantic representations rather than raw data
Architect and manage knowledge graphs representing fan, content, and business entity relationships
Define and maintain a semantic layer standardizing metrics, features, and business concepts
Design ontologies, taxonomies, and entity models for fan behavior and identity
Implement graph-based reasoning and enrichment workflows
Ensure semantic consistency across analytics, ML, and operational systems
Design and build retrieval-augmented generation (RAG) systems grounded in semantic data
Integrate LLMs for reasoning over structured and unstructured data
Develop pipelines translating natural language into structured outputs such as queries and analytical tasks
Build and optimize context pipelines improving LLM grounding and factual accuracy
Evaluate and integrate open-weight models for domain-specific reasoning
Fine-tune or adapt models using parameter-efficient techniques
Support deployment of LLM systems in private or on-prem GPU environments
Optimize inference workflows for latency, cost, and scalability
Enable LLM-driven workflows that reason over semantic data and retrieval systems
Build scalable, production-grade services and APIs for semantic and AI systems
Work with vector and graph databases to support retrieval and reasoning
Integrate structured data, documents, APIs, and model outputs
Partner with data engineering on batch and real-time pipelines
Ensure systems meet performance and reliability requirements
Design evaluation frameworks for retrieval quality and LLM output correctness
Monitor system performance, relevance, and model behavior
Establish guardrails for explainability, traceability, and data attribution
Ensure safe and reliable generation of structured outputs
Mitigate risks related to bias, data leakage, and inconsistencies
Collaborate with product, analytics, and engineering teams on AI use cases
Translate business problems into systems combining semantic data and LLM reasoning
Partner with ML teams to improve model performance through better grounding
Mentor engineers and establish best practices