Founding Machine Learning Engineer

Recruiting From Scratch • San Francisco, CA

55d • Onsite

About The Position

This fast-growing, venture-backed startup is building the core infrastructure layer that enables AI agents to access, understand, and act on real-time internet data. Instead of traditional search workflows designed for humans, the platform provides APIs that allow AI systems to retrieve high-fidelity, structured data directly from source systems. The company has achieved strong early traction—scaling to millions in ARR within its first year—and is already serving enterprise customers. Backed by leading investors including Y Combinator and top-tier venture firms, the team is now focused on pushing the boundaries of applied machine learning to power the next generation of AI-native data systems.

Requirements

3+ years of experience building and shipping production ML systems, particularly in NLP, information retrieval, or entity resolution
Strong hands-on experience with Python and PyTorch
Deep understanding of transformer architectures, including training and fine-tuning encoder models
Experience building retrieval systems, classifiers, or embedding-based systems
Familiarity with representation learning techniques (e.g., contrastive learning, metric learning)
Experience applying LLMs to structured data problems (e.g., extraction, classification, generation)
Strong problem-solving skills with the ability to work on ambiguous, large-scale data challenges
High-ownership mindset with a strong bias toward execution in fast-paced environments

Nice To Haves

Experience with entity resolution or record linkage at scale
Background in multilingual or cross-lingual NLP
Experience building taxonomies, ontologies, or knowledge systems
Familiarity with distributed training on GPU clusters
Experience scaling LLM inference pipelines in production
Research publications or open-source contributions in NLP/IR

Responsibilities

Own the end-to-end development of core ML systems—from research and modeling to production deployment
Design and train models for information retrieval, entity resolution, classification, and structured data extraction
Build systems that transform messy, multilingual web-scale data into structured, queryable intelligence
Develop embedding models, ranking systems, and retrieval pipelines for high-precision search and matching
Apply transformer architectures and modern NLP techniques to real-world data problems
Leverage LLMs for tasks such as extraction, classification, and data enrichment at scale
Continuously evaluate and improve model performance using rigorous experimentation and metrics
Work closely with engineering and product teams to integrate ML systems into production APIs

Benefits

Base salary: $150K – $300K
Equity: 0.10% – 0.50% (founding-level ownership)
Visa sponsorship available
Opportunity to join at an early stage with strong product-market fit and rapid growth
High-ownership role with direct impact on core product and company trajectory
Work alongside experienced founders and top-tier investors
