AI Technical Lead

NIO•San Jose, CA

$192,100 - $249,600

About The Position

NIO is a pioneer and a leading company in the premium smart electric vehicle market. Founded in November 2014, NIO’s mission is to shape a joyful lifestyle. NIO aims to build a community starting with smart electric vehicles to share joy and grow together with users. NIO designs, develops, jointly manufactures and sells premium smart electric vehicles, driving innovations in next-generation technologies in autonomous driving, digital technologies, electric powertrains and batteries. NIO differentiates itself through its continuous technological breakthroughs and innovations, such as its industry-leading battery swapping technologies, Battery as a Service, or BaaS, as well as its proprietary autonomous driving technologies and Autonomous Driving as a Service, or ADaaS. NIO’s product portfolio consists of the ES8, a six-seater smart electric flagship SUV, the ES7 (or the EL7), a mid-large five-seater smart electric SUV, the ES6, a five-seater all-round smart electric SUV, the EC7, a five-seater smart electric flagship coupe SUV, the EC6, a five-seater smart electric coupe SUV, the ET7, a smart electric flagship sedan, and the ET5, a mid-size smart electric sedan.

Roles and Responsibilities

Architect the Hybrid AI Vision: Lead the architectural design and strategic vision for hybrid inference systems, dynamically distributing Large Language Model (LLM) and Vision-Language Model (VLM) workloads across edge computing environments and cloud infrastructure.
Team Leadership & Innovation: Lead, mentor, and inspire a team of specialized engineers working across distributed systems orchestration, inference optimization, and AI compiler engineering. While you are not expected to be a hands-on master of every domain, you will drive the overarching technical roadmap, foster a culture of cutting-edge innovation, and guide domain experts in navigating complex system tradeoffs.
Design Dynamic Orchestration & Resilience: Oversee the architecture of high-availability orchestration engines that intelligently route inference tasks. Guide the team in developing cascading inference mechanisms, dynamic model fallback strategies, and robust telemetry to ensure continuous, steady-state inference under varying connectivity constraints.

Requirements

Ph.D. in Computer Science, Computer Engineering, Artificial Intelligence, or a related field with 8+ years of relevant industry experience (or Master’s degree with 12+ years), including proven experience leading technical teams or driving complex architectural roadmaps.
Demonstrated capability to lead full-stack AI systems engineering.
Deep, hands-on mastery of one or two of the following core domains, coupled with the systemic breadth required to effectively lead engineers working across the others:
Distributed Systems & Hybrid Inference: Designing, scaling, and deploying production-grade distributed ML systems, and balancing cloud infrastructure with edge constraints using modern routing paradigms such as cascading inference architectures and semantic routing.
Proven experience optimizing state-of-the-art LLM/VLM inference pipelines.
Deep understanding of model compression (e.g., PTQ, QAT, AWQ, FP8/INT4), hardware-aware compute optimizations (e.g., FlashAttention), and advanced memory management (e.g., PagedAttention, KV cache compression/eviction).
C++ and production-grade Python proficiency.
Deep understanding of edge/cloud model-serving frameworks (e.g., vLLM, TensorRT-LLM, ExecuTorch, MLC-LLM) and AI compilers (e.g., MLIR, Apache TVM, Triton) for compute graph optimization and custom kernel development.

Nice To Haves

Deep understanding of privacy-preserving AI techniques (federated learning, differential privacy, secure enclaves) essential for processing sensitive data across edge and cloud environments.
Publications in relevant AI, ML, or systems conferences (e.g., NeurIPS, ICML, MLSys), or active contributions to open-source ML infrastructure projects (e.g., vLLM, ONNX Runtime, Apache TVM, llama.cpp).

Benefits

Anthem Blue Cross, HSA, and Kaiser HMO medical plans with $0 for Employee Only Coverage.
Dental (including orthodontic coverage) and vision plan. Both provide options with a $0 paycheck contribution covering you and your eligible dependents.
Company Paid HSA (Health Savings Account) Contribution when enrolled in the High Deductible Anthem Blue Cross medical plan
Healthcare and Dependent Care Flexible Spending Accounts (FSA)
401(k) with Brokerage Link option
Company paid Basic Life, AD&D, short-term and long-term disability insurance
Employee Assistance Program
Sick and Vacation time
13 Paid Holidays a year
Paid Parental Leave for first 8 weeks at full pay (eligible after 90 days of employment with NIO)
Paid Disability Leave for first 6 weeks at full pay (eligible after 90 days of employment with NIO)
Voluntary Life and AD&D options for you, your spouse/domestic partner and dependent child(ren)
Pet insurance
Commuter benefits
Cell phone credit
Free lunch and snacks
Onsite gym
Employee discounts and perks program