About The Position

Coupang Intelligence Cloud (CIC) builds and operates the compute platform powering Coupang's AI/ML and large-scale workloads. We are now extending the platform from serverless container-based workload support to virtualization. We are building a VM offering on top of KVM with a tuned Linux kernel, using hardware- and software-assisted virtualization to get bare-metal-like performance within our VMs. We are also building an OVN/OVS-based SDN that can be offloaded to the DPU. We are looking for a senior technical leader to drive this effort.

As a Senior Staff Engineer, you will own the hypervisor, host kernel, and DPU layers of our multi-tenant VM platform. You will architect the QEMU-KVM stack, use NVIDIA DOCA for virtualization, design the SDN data plane using OVS and OVN, and lead the passthrough strategy for GPUs and InfiniBand, including NVSwitch and Shared NVLink topologies. You will partner closely with engineering leadership in the US, Korea, and China. This is a hands-on individual contributor role with significant technical scope and the opportunity to shape the architecture from the ground up.

Requirements

  • 12+ years of systems software engineering experience, with at least 6 years focused on virtualization, hypervisors, and/or the Linux kernel
  • Deep, hands-on expertise with QEMU-KVM internals — virtio, vhost-user, machine types, CPU topology, NUMA pinning, hugepages, live migration
  • Strong Linux kernel proficiency — KVM, vfio, vhost, namespaces, cgroups, netfilter, eBPF, scheduler, memory management; comfortable reading and patching kernel code
  • Experience building, customizing, or maintaining a production Linux kernel — config tuning, patch management, backports, and (ideally) upstream contributions
  • Production experience with Open vSwitch — OpenFlow pipeline design, datapath performance tuning (DPDK or kernel datapath), conntrack, debugging at scale
  • Strong working knowledge of OVN — logical switches/routers, ACLs, distributed gateway routers, NB/SB databases
  • Solid networking fundamentals: VXLAN, GENEVE, BGP, EVPN, L2/L3 routing, multicast, MTU/MSS handling
  • Strong systems programming in C and/or Go; experience contributing to large open-source codebases
  • Track record of leading complex, cross-team technical initiatives end-to-end

Nice To Haves

  • Upstream contributions to the Linux kernel (KVM, vfio, vhost, networking, scheduler, or mm subsystems)
  • Experience with KubeVirt or other Kubernetes-native virtualization platforms
  • GPU virtualization experience — SR-IOV, vGPU, PCIe passthrough, IOMMU groups, NVSwitch/NVLink topology on NVIDIA H100/H200/B200
  • Production experience operating BGP EVPN fabrics (Arista EOS, Cumulus, or SONiC)
  • Upstream contributions to OVS, OVN, QEMU, libvirt, or DPDK
  • Experience with cloud-init, Cloud Hypervisor, or Firecracker
  • Experience designing for hyperscale environments — thousands of hypervisors, tens of thousands of VMs, multi-region

Responsibilities

  • Own the hypervisor stack end-to-end (QEMU-KVM, libvirt, and host kernel), and design it so that the virtualization logic can be entirely offloaded to the DPU
  • Drive Linux kernel strategy for hypervisor hosts — kernel version selection, custom patches, KVM/vfio/vhost subsystem tuning, scheduler and memory tuning for VM workloads, backporting fixes, and contributing patches upstream
  • Debug and resolve issues across the full virtualization stack — guest, QEMU, KVM, host kernel, and hardware — including performance regressions, livelock, and corner cases that surface only at fleet scale
  • Architect and own the multi-tenant VM and bare-metal platform, including the VM and bare-metal lifecycle
  • Design and implement the SDN data plane using OVS and OVN: OpenFlow pipeline design, VXLAN/Geneve tunneling, distributed routing, and per-tenant network isolation
  • Lead the GPU virtualization strategy: SR-IOV, PCIe passthrough, IOMMU/NUMA topology, and Shared NVLink via NVIDIA Fabric Manager on B200/GB300/RV200.
  • Drive technical decisions across squads: write design docs, lead design review sessions, and partner with networking, storage, and platform teams in the US, Korea, China, and India
  • Set the technical bar for code review, design, and operational excellence; mentor senior and staff engineers
  • Own production reliability for the virtualization platform — define SLOs, drive incident response, and meet a 99%+ availability target as the platform scales

Benefits

  • Patent opportunities, conference talks, and upstream open-source contributions are actively encouraged