Software Engineer

Super Micro Computer, Inc.•San Jose, CA

37d•$153,000 - $167,000•Onsite

About The Position

This role contributes directly to the development of GPU-accelerated software used in internal tooling pipelines and in customer-facing solution deployments. The engineer produces optimized code paths targeting NVIDIA and AMD GPUs, implements kernel-level improvements, and integrates software components with hardware engineering platforms. The position operates in a multi-vendor environment, supporting performance-critical workloads that require precise tuning, profiling, and validation across architectures. This role is based at our headquarters in San Jose, CA.

Requirements

BS in EE/CS/CIS
3-5 years of experience.
Proficiency with CUDA, HIP, GPU compute fundamentals, and parallel programming constructs.
Experience with PyTorch, TensorFlow, or comparable ML frameworks for GPU-backed execution.
Competence with kernel profilers and performance analysis toolchains (Nsight Systems/Compute, rocprof, OmniPerf).
Understanding of GPU memory models, shared-memory usage, warp/wavefront execution, and optimization for latency and throughput.
Strong C/C++ and Python development capability with clean, maintainable coding practices.
Familiarity with Linux development environments, build systems, and driver-level debugging fundamentals.

Nice To Haves

Master's degree in EE/CS/CIS
Exposure to ROCm, Triton, OpenCL, or domain-specific kernel DSLs.
Understanding of GPU virtualization, containerized execution environments, and distributed or multi-GPU communication (NVLink, Infinity Fabric, PCIe topology).
Experience in customer-oriented engineering roles, pre/post-sales technical support, or solution-integration environments.
Knowledge of mixed-precision strategies and architecture-specific tuning for inference or training workloads.
Background working with system-level performance bottlenecks involving memory bandwidth, NUMA layouts, and accelerator interconnects.

Responsibilities

Implement and optimize GPU-accelerated code using CUDA, HIP, and vendor SDKs.
Port workloads and kernels between NVIDIA and AMD GPU platforms with minimal regression.
Develop internal benchmarking suites, diagnostics, and performance tooling for engineering teams.
Collaborate with hardware engineering, solution architects, and customer-facing groups to align software behavior with system-level constraints.
Profile, debug, and validate performance of GPU workloads using Nsight, rocprof, OmniPerf, and related tools.
Maintain clear documentation for kernels, toolchains, and multi-GPU execution paths.
Contribute to continuous integration pipelines for GPU-targeted builds and tests.
