About The Position

NVIDIA’s Analytics and Data Intelligence (ADI) organization is building the next generation of GPU-accelerated data analytics, data science, and vector search systems, spanning libraries, engines, and end-to-end reference architectures. As an NVIDIAN, you will be immersed in a diverse, supportive environment where everyone is encouraged to do their best work. Come join the team and see how you can make a lasting impact on the world!

We are seeking a Senior Systems Software Engineer focused on performance architecture for GPU-accelerated structured data processing. This is a high-impact individual contributor role for someone passionate about developing coordinated SQL and user-friendly interfaces across diverse CPU and GPU query engines, with an emphasis on improving performance, reliability, and workload optimization. The ideal candidate has deep experience in systems performance, compiler/runtime design, and database or dataframe execution engines. The role will focus on compiler and JIT-based execution techniques for cuDF and related analytics runtimes.

Requirements

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field, or equivalent hands-on experience.
  • 12+ years of validated experience in systems performance engineering or performance-focused architecture.
  • Proven skills in profiling, instrumentation, and optimization for CPU and GPU systems, using techniques such as tracing, performance counters, flame graphs, and kernel-level profiling.
  • Experience with compiler, JIT, code generation, query execution, or runtime optimization techniques.
  • Experience optimizing analytic database engines and/or query runtimes, including vectorized execution, join strategies, and columnar formats like Arrow and Parquet.
  • Proficient in C++ and/or Python, with a strong ability to analyze performance-critical code and implement effective solutions.
  • Experience with cuDF, RAPIDS, CUDA, Numba, LLVM, MLIR, NVRTC, or other JIT/codegen systems.
  • Experience with benchmarking frameworks, performance dashboards, and CI/CD regression gating, along with a proven grasp of modern analytics and machine learning workflows.

Nice To Haves

  • Deep familiarity with NVIDIA GPUs and GPU programming (CUDA), including memory hierarchy, concurrency, and profiling toolchains such as Nsight Systems.
  • Experience with TPC-style benchmarking (TPC-H, TPC-DS, or analogous), ClickBench-like workloads, and building credible, repeatable performance narratives.
  • Prior work on database execution engines, especially operator fusion, query compilation, vectorized execution, or adaptive execution.
  • Demonstrated open-source contributions to performance-critical systems, including libraries, runtimes, databases, and ML or data tooling.

Responsibilities

  • Extend JIT and compiler-based execution support in cuDF and related GPU-accelerated structured data processing systems.
  • Design approaches for lowering expressions, ASTs, or query fragments into optimized GPU execution paths.
  • Investigate kernel fusion strategies across cuDF operations to reduce materialization, memory traffic, launch overhead, and end-to-end query latency.
  • Analyze structured analytics workloads to identify performance bottlenecks in expression evaluation, joins, aggregations, scans, data movement, and memory management.
  • Build benchmarks and regression tests that capture real dataframe and SQL-like workloads, from micro-benchmarks to end-to-end pipelines.
  • Collaborate with cuDF, CUDA, compiler/runtime, and query engine teams to translate workload analysis into implementation plans and architecture decisions.
  • Prototype and evaluate execution strategies inspired by high-performance database engines, including fused operators, code generation, vectorized execution, and adaptive planning.

Benefits

  • equity
  • benefits