Senior Software Engineer, Kubernetes Platform and Fabric Integration

Cornelis Networks, Inc., Austin, TX
Remote

About The Position

Cornelis Networks delivers the world’s highest-performance scale-out networking solutions for AI and HPC datacenters. Our differentiated architecture integrates hardware, software, and system-level technologies to maximize the efficiency of GPU, CPU, and accelerator-based compute clusters at any scale. Our solutions drive breakthroughs in AI and HPC workloads, empowering our customers to push the boundaries of innovation. Backed by top-tier venture capital and strategic investors, we are committed to innovation, performance, and scalability, solving the world’s most demanding computational challenges with our next-generation networking solutions.

We are a fast-growing, forward-thinking team of architects, engineers, and business professionals with a proven track record of building successful products and companies. As a global organization, our team spans multiple U.S. states and six countries, and we continue to expand with exceptional talent in onsite, hybrid, and fully remote roles.

Cornelis Networks is seeking a talented and experienced Senior Software Engineer, Kubernetes Platform and Fabric Integration, to join our team and lead the integration of our advanced fabric management software with modern, cloud-native orchestration platforms. In this role, you will architect and develop the bridge between our existing cluster management tools and the Kubernetes ecosystem. You will design, build, and maintain the Kubernetes operators, controllers, and other components needed to deploy, manage, and scale our high-performance interconnect solutions in containerized environments. This is a critical role that will directly impact our customers' ability to leverage Cornelis Networks' technology in large-scale, modern data centers.

Requirements

  • 5+ years of professional software development experience.
  • Proven experience in designing and developing solutions for Kubernetes, including building custom operators/controllers using tools like the Operator SDK or Kubebuilder.
  • Strong proficiency in Go. Experience with C++ or Python is also valuable.
  • Deep understanding of Kubernetes architecture, including the control plane, networking (CNI), and storage (CSI) interfaces.
  • Hands-on experience with container technologies such as Docker or containerd.
  • Demonstrable experience in integrating existing software platforms or services with Kubernetes.
  • Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related technical field.

Nice To Haves

  • Experience with high-performance computing (HPC) or high-performance networking.
  • Familiarity with performance-sensitive environments and low-latency application requirements.
  • Experience with monitoring and observability stacks like Prometheus, Grafana, and Fluentd.
  • Knowledge of CI/CD principles and experience building deployment pipelines.
  • Contributions to open-source projects in the Kubernetes or cloud-native ecosystem.

Responsibilities

  • Architect and Design: Lead the design of robust, scalable solutions for integrating Cornelis Networks' platform and fabric management software with Kubernetes.
  • Develop Kubernetes Operators: Build and maintain custom Kubernetes Operators and Controllers in Go to manage the lifecycle of our software and hardware components within a cluster.
  • Cloud-Native Integration: Develop solutions that allow for the seamless orchestration of our high-performance fabric services and platform management tools alongside other containerized workloads.
  • Cluster Management: Work on extending Kubernetes for managing specialized hardware, scheduling, and networking requirements unique to HPC and AI workloads.
  • Collaborate: Partner with the core platform, fabric, and hardware teams to ensure a cohesive and performant end-to-end solution.
  • Upstream Contribution: Engage with the open-source community and contribute to relevant projects within the cloud-native ecosystem.
  • Documentation and Best Practices: Author high-quality technical documentation and champion best practices for software development in a cloud-native environment.
  • AI-Assisted Development: Leverage AI-powered tools to accelerate software development workflows, including intelligent code generation, refactoring, and performance optimization.
  • AI-Driven Quality Assurance: Apply AI-driven techniques for automated code review, testing, and quality assurance to improve reliability and reduce development cycles.

Benefits

  • equity
  • cash
  • incentives
  • health and retirement benefits
  • medical, dental, and vision coverage
  • disability and life insurance
  • dependent care flexible spending account
  • accidental injury insurance
  • pet insurance
  • paid holidays
  • 401(k) with company match
  • Open Time Off (OTO)
  • sick time
  • bonding leave
  • pregnancy disability leave