Software Intern (GPU Optimization)

KLA • Milpitas, CA

16d • $47 - $57

About The Position

KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA focuses more than average on innovation, and we invest 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Group/Division

With over 40 years of semiconductor process control experience, chipmakers around the globe rely on KLA to ensure that their fabs ramp next-generation devices to volume production quickly and cost-effectively. Enabling the movement towards advanced chip design, KLA's Global Products Group (GPG), which is responsible for creating all of KLA’s metrology and inspection products, is looking for the best and the brightest research scientists, software engineers, application development engineers, and senior product technology process engineers.

The Broadband Plasma Division (BBP) provides market-leading patterned wafer optical inspection systems for leading-edge IC manufacturing. Logic, foundry, and memory customers depend on BBP products to detect yield-critical defects for process debug and excursion monitoring at advanced process nodes. BBP flagship products include the 29xx and 39xx series, which leverage Broadband Plasma technology to capture a wide range of defects with ultimate sensitivity at the optical inspection speeds needed for inline defect monitoring.

Job Description/Preferred Qualifications

We are seeking a summer intern to join our advanced wafer inspection algorithm team, with a primary focus on GPU acceleration and CUDA code optimization. The intern will work on profiling, tuning, and redesigning CUDA kernels to maximize throughput and minimize latency in image processing and machine learning workflows used in semiconductor wafer inspection. This role offers hands-on experience in high-performance computing. The intern will contribute to production-grade algorithm deployment, collaborating closely with algorithm engineers to integrate optimized GPU modules into real-time inspection pipelines.

Requirements

Requires less than 1 year of related experience

Nice To Haves

Education: Currently enrolled in a Master's or PhD program in Computer Science, Electrical Engineering, Applied Physics, or a related field.
Technical Skills: Strong proficiency in CUDA, C/C++, and Python.
Deep understanding of GPU architecture, memory hierarchy, and parallel computing principles.
Demonstrated expertise in GPU performance analysis and optimization using advanced profiling tools such as NVIDIA Nsight Compute, Nsight Systems, and CUDA Visual Profiler, with a proven ability to identify and resolve memory bottlenecks and kernel inefficiencies.
Domain Knowledge: Background in image processing, computer vision, or machine learning.
Other: Strong analytical and problem-solving skills.
Ability to work independently and collaboratively in a fast-paced environment.
Excellent communication and presentation skills.

Responsibilities

Design, implement, and optimize CUDA kernels for high-throughput image analysis and defect detection.
Profile and tune GPU workloads using NVIDIA Nsight Compute, Nsight Systems, and other performance tools.
Collaborate with algorithm engineers to integrate GPU/CUDA-optimized modules into inspection pipelines.
Explore advanced optimization techniques such as memory optimization, thread and block configuration optimization, instruction-level optimization, kernel fusion and launch overhead reduction, algorithmic optimization, and hardware-specific tuning.
Analyze performance bottlenecks and propose architectural improvements.
Document technical findings and present results to cross-functional teams.

Benefits

KLA’s total rewards package for employees may also include participation in performance incentive programs and eligibility for additional benefits including but not limited to: medical, dental, vision, life, and other voluntary benefits; 401(k) including company matching; employee stock purchase program (ESPP); student debt assistance; tuition reimbursement program; development and career growth opportunities and programs; financial planning benefits; wellness benefits including an employee assistance program (EAP); paid time off and paid company holidays; and family care and bonding leave.
Interns are eligible for some of the benefits listed.
