About The Position

We are sharing a specialized part-time consulting opportunity for CUDA and GPU programming professionals experienced in kernel optimization, C++ engineering, profiler-guided performance analysis, GPU hardware utilization, and technical review. The role supports current and upcoming remote projects focused on GPU kernel optimization, performance evaluation, CUDA/HIP review, profiler metric analysis, and C++ and Python workflows, with an emphasis on high-quality project execution. Selected professionals will apply their GPU programming expertise to analyze kernels, identify performance bottlenecks, improve implementation quality, and document optimization decisions across modern hardware environments.

Requirements

  • Strong practical experience with GPU programming and kernel optimization
  • Fluency in core C++ features through C++17
  • Working knowledge of Python and Git
  • Fluency in at least one GPU programming model, such as CUDA, HIP, Slang, HLSL, GLSL, or a related kernel programming language
  • At least 1 year of professional or graduate-level research experience working with GPUs
  • Strong understanding of GPU profiler performance metrics and how to use them to optimize kernels
  • Ability to work independently on technical review and optimization tasks
  • Availability to work at least 20 hours per week, with the exact commitment depending on project scope

Nice To Haves

  • Experience with CUDA, HIP, CUDA C++ Core Libraries, inline PTX assembly, or tensor core-level optimization
  • Experience optimizing kernels for NVIDIA Blackwell hardware or other modern GPU architectures
  • Familiarity with Nsight Compute or comparable GPU profiling tools
  • Prior experience at GPU hardware organizations such as NVIDIA, AMD, or Qualcomm, or in similar technical environments
  • Open-source contributions related to GPU kernel optimization, HPC, compiler tooling, graphics, or performance engineering

Responsibilities

  • Analyze and optimize GPU kernels for performance, efficiency, and hardware utilization
  • Review kernel implementations and identify bottlenecks in memory access, occupancy, throughput, or execution patterns
  • Improve performance outcomes using CUDA, HIP, shader programming, or related GPU programming models
  • Optimize kernels even when limited background context is available for the underlying algorithm
  • Use profiler metrics such as L2 cache hit rate, L2 throughput, occupancy, memory behavior, and related performance signals to guide optimization
  • Evaluate when specific profiler metrics are useful, misleading, or secondary to other optimization factors
  • Document optimization decisions clearly and explain tradeoffs in technical terms
  • Calibrate performance judgments against structured benchmarks, hardware constraints, and project-specific criteria
  • Write, modify, and reason about C++17, Python, and GPU programming code
  • Review code for correctness, performance impact, maintainability, and optimization potential
  • Use Git-based workflows to manage technical materials and project submissions
  • Apply practical GPU programming expertise across CUDA, HIP, Slang, HLSL, GLSL, or related kernel programming environments

Benefits

  • Competitive hourly compensation
  • Remote structure
  • Flexible scheduling
  • Weekly payments via Stripe or Wise
© 2026 Teal Labs, Inc