Embedding MLOps

interpretai.tech • San Francisco, CA

About The Position

We are seeking an experienced Cluster Infrastructure Engineer to design, implement, and maintain our cloud-based vector embedding infrastructure and to support our distributed training platform. In this role, you will be responsible for building a scalable, high-performance system that supports both embedding model training and efficient inference workflows. The ideal candidate combines expertise in machine learning infrastructure with strong system design skills to build robust embedding systems that power our AI applications, including vector search, recommendations, and state-of-the-art inference for foundation models.

Requirements

Bachelor's degree in Computer Science, Engineering, or related technical field (Master's preferred).
2+ years of experience building large-scale, high-performance backend systems.
2+ years of experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code tools.
Strong proficiency in at least one programming language such as Python, Go, C++, or Java.
Experience with containerization and orchestration technologies (Docker, Kubernetes).
Knowledge of distributed systems concepts like sharding, replication, and consensus algorithms.
Demonstrated experience with database systems, search technologies, or AI/ML systems.
Understanding of embedding techniques and their applications for text, images, or other data types.
Experience with memory management, networking, and troubleshooting distributed systems.
Proven ability to solve complex problems independently in fast-moving environments.

Nice To Haves

Experience with vector databases and similarity search systems (FAISS, Pinecone, Milvus, etc.).
Knowledge of ML frameworks (PyTorch, TensorFlow) and model optimization techniques.
Experience with state-of-the-art embedding models from a variety of domains (computer vision, LLMs, etc.).
Understanding of embedding evaluation metrics and quality assessment techniques.
Experience with high-performance computing (HPC) environments.
Background in designing systems for machine learning training and/or inference workloads.
Knowledge of data sharding strategies, parallel processing, and memory optimization techniques.
Experience with real-time streaming data processing frameworks.
Contributions to open-source projects related to embeddings or machine learning infrastructure.

Responsibilities

Design and architect a multi-tenant, cloud-native embedding infrastructure that supports both training and inference workloads.
Build scalable vector search and embedding generation services that sustain high throughput and low latency, and that track indexing time.
Implement fault-tolerant, high-performance systems for serving embedding models at scale.
Develop infrastructure automation using containerization, orchestration, and infrastructure-as-code practices.
Optimize embedding storage, indexing, and retrieval systems for performance and cost efficiency.
Design and implement robust monitoring and observability solutions to ensure system health and performance.
Collaborate with ML engineers and data scientists to understand and support embedding-related workloads.
Create scalable pipelines for generating and updating embeddings from various data sources (text, images, audio).
Implement security best practices and ensure compliance with data protection requirements.
Lead technical discussions regarding vector database architecture and performance optimization.
Support hiring efforts to build out the core infrastructure team.