AI Senior Staff Systems Engineer

Cadence • San Jose, CA

85d • $136,500 - $253,500

About The Position

At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology. We are seeking a highly skilled and experienced AI Systems Engineer to join our team. This is a hands-on, senior individual contributor role that will be pivotal in leading the development, operations, and support of our entire AI infrastructure. You will be responsible for the entire lifecycle of our AI systems, from architecting and building high-performance GPU clusters to deploying and optimizing our most advanced AI models and agentic services.

Requirements

10+ years of experience in a senior technical role, with at least 5 years focused on building and operating high-performance computing or AI infrastructure.
Expert-level knowledge of NVIDIA GPU architecture and technologies like CUDA and cuDNN.
Proven experience with public cloud AI services, specifically managing access, usage, and billing for Azure OpenAI and Google Cloud Platform (GCP) services.
Extensive hands-on experience with Docker: image management, container orchestration, and troubleshooting.
Proficiency in scripting languages such as Python, Bash, or Perl.
Deep expertise in Linux system administration (RHEL preferred), including networking, storage, and performance tuning.
Familiarity with user authentication and integration using systems like LDAP or Active Directory.
Strong problem-solving and communication skills with the ability to work in a multi-platform, cross-functional, and geographically distributed team.

Nice To Haves

Understanding of AI job profiling and tuning (memory, GPU, I/O).
Experience administering LSF clusters in a production or research environment.
Familiarity with other job schedulers such as Slurm.
Experience with LSF Docker integration and job submission using container images.
Experience with macOS/Apple Silicon system administration tasks and troubleshooting.

Responsibilities

Lead the design and implementation of our next-generation AI infrastructure to support our Agentic AI initiatives.
Support and secure the use of public cloud AI services, including Azure OpenAI services and Google Cloud Platform (GCP) services like Gemini.
Take a leadership role in the configuration, installation, and optimization of GPU server clusters.
Architect and deploy a robust and scalable AI tech stack.
Lead the deployment, serving, and optimization of Large Language Models (LLMs).
Architect and build production-grade Agentic AI workflows and services.
Develop and maintain automation scripts using languages like Python, Bash, or Perl.
Act as the final escalation point for the most complex technical issues related to our AI infrastructure.
Develop and implement security best practices for our AI systems and data.

Benefits

Paid vacation and paid holidays
401(k) plan with employer match
Employee stock purchase plan
A variety of medical, dental and vision plan options
Incentive compensation: bonus, equity, and benefits

What This Job Offers

Job Type

Full-time

Career Level

Senior

Industry

Ambulatory Health Care Services

Number of Employees

5,001-10,000 employees
