AI Engineer

OpusClip • Palo Alto, CA

About The Position

🎨 OpusClip is the world's No. 1 AI video agent, built for authenticity on social media. We envision a world where everyone can authentically share their story through video, with no expertise needed. Within just 18 months of our launch, over 10 million creators and businesses have used OpusClip to enhance their social presence. We have raised $50 million in total funding and are fortunate to have some of the most supportive investors, including SoftBank Vision Fund, DCM Ventures, Millennium New Horizons, Fellows Fund, AI Grant, Jason Lemkin (SaaStr), Samsung Next, GTMfund, Alumni Ventures, and many more. Check out our latest coverage by Business Insider featuring our product and funding milestones, and our recognition as one of The Information's 50 Most Promising Startups in 2024.

Headquartered in Palo Alto, we are a team of 100 passionate and experienced AI enthusiasts and video experts, driven by our core values:

  • Be a Champion Team
  • Prioritize Ruthlessly
  • Ship Fast, Quality Follows
  • Obsess Over Customers

Be a part of this exciting journey with us!

Requirements

  • Bachelor's degree or above in Computer Science or a related field.
  • 3+ years of work experience (fresh graduates with strong internship/project experience are welcome).
  • Strong System Design Sense: understanding of distributed systems, API design (REST/gRPC), asynchronous processing (task queues such as Celery/Redis), and database interactions.
  • Solid Engineering Fundamentals:
      • Fluent in Python (C++ or JavaScript is a plus).
      • Ability to write clean, SOLID, and testable code.
      • Proficiency with Docker/containerization and CI/CD workflows.
  • AI/ML Proficiency:
      • Proficient in PyTorch or TensorFlow.
      • Familiarity with model serving frameworks (e.g., vLLM, TGI, Triton) and ONNX.
  • Experience in at least one of the following areas:
      • Video understanding / computer vision.
      • LLM fine-tuning / RAG systems.
      • Backend systems for AI (FastAPI, vector DBs, microservices).
  • Enthusiastic, an excellent communicator, self-motivated, and possessing a strong sense of ownership.

Nice To Haves

  • Full-Stack AI Experience: Experience building an end-to-end product feature—from the prompt engineering layer down to the API deployment and database schema.
  • Inference Optimization: Experience with TensorRT, quantization (AWQ/GPTQ), or FlashAttention to speed up model performance.
  • Vector Database at Scale: Experience managing vector stores (Pinecone, Milvus, Weaviate) in a production environment.
  • Open Source & Community: Experience building APIs/services/open-source tools with ChatGPT/OpenAI APIs.
  • Completed projects or research papers published at top-tier conferences (ACL, CVPR, NeurIPS, etc.).

Responsibilities

  • AI System Architecture: Design and build scalable, low-latency AI inference microservices. You will move beyond simple scripts to architecting robust systems that handle high-volume video processing.
  • Engineering-First Model Deployment: Collaborate with the team to build production pipelines for Video Understanding and LLMs. You will be responsible not just for the model's accuracy, but for its throughput, cost-efficiency, and integration into the core backend.
  • High-Standard "Vibe Coding": We embrace AI-assisted coding (Cursor, Copilot, etc.), but we demand engineering rigor. You are responsible for ensuring all code—whether written by you or generated by AI—is modular, type-safe, thoroughly tested, and maintainable.
  • Performance Optimization: Profile and optimize Python/C++ code and model inference layers (e.g., quantization, batching, caching strategies) to minimize GPU costs and user wait time.
  • R&D to Production: Conduct research on cutting-edge LLMs/multimodal models and rapidly refactor experimental code into stable, production-ready features.