About The Position

We are assembling an elite team of AI Infrastructure Engineers to build the future of intelligent data systems. We don't just store data; we architect the substrate that powers AI factories, from GPU clusters running training workloads to real-time inference pipelines serving billions of requests. In this creation role, you will design the systems that enable enterprises to deploy AI at unprecedented scale, leveraging NetApp's new AI Data Engine (AIDE) and AFX disaggregated storage architecture. Join us to shape what comes next and solve challenges at the absolute frontier of computer science.

Requirements

  • Mastery of Golang, Python, and C/C++.
  • Intuitive understanding of file systems, advanced data structures, and algorithms.
  • Deep knowledge of AI infrastructure: Kubernetes, operating systems, storage systems, and distributed systems.
  • Ability to build simplified performance models, identify bottlenecks through deep analysis, and design for scalability from first principles.
  • 8+ years of software development experience with a focus on systems, infrastructure, or storage technologies.
  • Deep understanding of Linux kernel development, file systems, and distributed systems.
  • Experience with performance analysis, optimization techniques, and building quantitative models.
  • Familiarity with AI/ML infrastructure concepts: GPU computing, model serving, data pipelines, vector databases.
  • Bold Ideas, Grounded Execution: Dream big but ship with precision.
  • Relentless Curiosity: An insatiable desire to understand how things work and how to make them better.
  • Collaborative Excellence: Elevate everyone around you through mentorship, knowledge sharing, and constructive feedback.
  • AI Fluency: Confidence in using AI tools to accelerate all aspects of your work.

Nice To Haves

  • Experience building or optimizing storage systems for AI/ML workloads.
  • Experience working with high-performance computing (HPC) environments or GPU clusters.
  • Knowledge of network protocols, RDMA, and high-speed interconnects.
  • Experience with agile methodologies and rapid prototyping.

Responsibilities

  • Lead the architecture of next-generation storage systems optimized for AI workloads.
  • Design high-performance data pipelines for massive-scale model training.
  • Implement intelligent caching for KV stores.
  • Optimize data planes for GPU clusters.
  • Investigate novel approaches to scalable AI inferencing systems, semantic data discovery, and data curation systems.
  • Turn proof-of-concepts into production systems that redefine industry standards.
  • Leverage AI coding assistants and generative tools to accelerate workflow, automate repetitive tasks, and focus on solving problems.
  • Own problems end-to-end, from architecture to production operations.
  • Design storage and networking systems that connect structured and unstructured data to LLMs with unprecedented performance, enabling real-time inference and massive-scale training.
  • Build the next generation of ONTAP capabilities, focusing on AI-specific optimizations like vector store integration, semantic search, and automated data curation.
  • Develop systems capable of TB/s throughput and EB-scale data management, supporting the world's most demanding AI factories.
  • Pioneer internal tooling and workflows that use AI to accelerate development, from automated code review to intelligent debugging systems.
  • Partner with hardware engineers, product managers, and researchers to deliver groundbreaking intelligent storage solutions.

Benefits

  • Health Insurance
  • Life Insurance
  • Retirement or Pension Plans
  • Paid Time Off
  • Various Leave Options
  • Performance-Based Incentives
  • Employee Stock Purchase Plan
  • Restricted Stock Units (RSUs)