Senior Engineering Leader - AI Infrastructure and Inferencing

Gruve · Redwood City, CA
$240,000 - $250,000 · Onsite

About The Position

We're seeking an exceptional Senior Engineering Leader to build and lead a high-performing engineering team focused on the design and development of a distributed, multi-tenant AI inference SaaS platform. Platform development responsibilities include, but are not limited to, software design, development, and testing across multiple domains such as inference engines (AI/ML, program and compiler analysis), core platform services, and observability.

This role sits at the intersection of systems engineering, AI/ML operations, and product development, requiring both deep technical expertise and proven leadership capabilities. As a leader at Gruve, you'll drive the technical vision and execution of critical infrastructure that enables our AI capabilities at scale. You'll work closely with cross-functional partners, including research scientists, product managers, and other engineering leaders, to deliver robust, performant systems that power our AI products.

This position is based in the United States and reports to the SVP of Inferencing and Infrastructure Management.

Requirements

  • 10-15+ years of software engineering experience with at least 5+ years in engineering leadership roles managing teams of 5+ engineers
  • Proven track record of building and scaling high-performing engineering teams in high-growth technology companies
  • Deep expertise in systems programming languages (C++, Go, Rust, or similar) and architecture design
  • Strong background in AI model design, optimization, or adjacent systems-level programming (LLVM, MLIR, XLA, or similar frameworks)
  • Hands-on experience with AI/ML model development, training, and inference systems
  • Experience with model fine-tuning techniques and deployment optimization (quantization, pruning, etc.)
  • Demonstrated ability to design and build production-grade APIs and distributed systems
  • Strong understanding of spec-driven development processes and engineering best practices
  • Excellent communication skills with ability to influence across all levels of the organization
  • Demonstrated ability to work effectively with teams across multiple time zones
  • Bachelor's degree in Computer Science, Engineering, or related technical field (or equivalent practical experience)

Nice To Haves

  • Master's or PhD in Computer Science, Machine Learning, or related field
  • Experience at leading AI/ML companies or research labs (OpenAI, Google DeepMind, Meta AI, Anthropic, etc.)
  • Direct experience with modern ML frameworks (PyTorch, JAX, TensorFlow) and their compilation stacks
  • Background in GPU programming (CUDA, Triton) and hardware acceleration for ML workloads
  • Experience with transformer architectures and large language model (LLM) inference optimization
  • Track record of shipping production ML systems serving millions of requests per day
  • Contributions to open-source compiler or ML infrastructure projects
  • Experience with cloud infrastructure (AWS, GCP, Azure) and containerization/orchestration (Kubernetes, Docker)
  • Previous experience scaling teams from <10 to 20+ engineers

Responsibilities

  • Team Leadership & Development: Build, mentor, and scale a world-class engineering team of 10-15+ engineers.
  • Foster a culture of technical excellence, collaboration, and continuous learning.
  • Conduct performance reviews, career development planning, and succession planning.
  • Technical Strategy & Architecture: Define and execute the technical roadmap for AI inference infrastructure, AI toolchains, and AI software development.
  • Make critical architectural decisions that balance performance, scalability, maintainability, and cost.
  • Compiler Design & Optimization: Lead the development of AI inference systems and optimizations for AI workloads, including graph optimization, kernel fusion, and hardware-specific code generation to maximize inference performance.
  • AI Model Development & Deployment: Oversee the end-to-end lifecycle of AI models from development through production deployment, including model fine-tuning, quantization, distillation, and serving infrastructure.
  • Inference API & Platform Development: Drive the design and implementation of scalable, low-latency inference APIs and platforms that serve models reliably at production scale with strict SLA requirements.
  • Spec-Driven Development: Champion rigorous engineering practices including comprehensive technical specifications, design reviews, and documentation to ensure alignment and quality across complex projects.
  • Cross-Functional Collaboration: Partner effectively with research, product, and business stakeholders to translate requirements into technical solutions and communicate progress, trade-offs, and risks clearly.
  • Delivery & Execution: Own quarterly planning, roadmap prioritization, and on-time delivery of major initiatives.
  • Establish metrics and KPIs to measure team performance and system health.