Sr. Solution Engineer - DevOps Software Solution (27728)

Super Micro Computer, Inc. - San Jose, CA
$170,000 - $190,000

About The Position

We are seeking a highly skilled and motivated Senior Solution Engineer to lead efforts in benchmarking, performance tuning, and platform automation for HPC and AI workloads. This role is critical to ensuring our systems meet performance targets for RFQs and support scalable, automated deployment across diverse environments.

Requirements

  • Bachelor's or Master's degree in Computer Science or a related field.
  • Minimum 8 years of professional experience with Python and shell scripting.
  • Proven experience with HPC and AI benchmarking tools (e.g., MLPerf).
  • Strong proficiency in Ansible, Terraform, Docker, and Python.
  • Hands-on experience configuring and troubleshooting Linux operating systems, servers, and network switches.
  • Solid understanding of software-defined storage, networking (DNS, DHCP, PXE), system provisioning, and state-of-the-art datacenter operations.
  • Excellent problem-solving, documentation, and collaboration skills.
  • Ability to work independently and lead technical initiatives.

Responsibilities

  • Execute performance benchmarks for HPC and AI workloads, including MLPerf, across various GPU systems.
  • Analyze and optimize system configurations to meet RFQ performance requirements.
  • Identify bottlenecks and implement tuning strategies for compute, memory, and I/O performance.
  • Scale and maintain large compute clusters for high-demand workloads.
  • Automate deployment and configuration using Ansible, Terraform, and Docker.
  • Develop infrastructure-as-code solutions to support reproducible and scalable environments.
  • Build backend services and middleware to support large-scale distributed deployments.
  • Ensure reliability, modularity, and performance of backend systems.
  • Design and maintain CI/CD pipelines to support agile development and minimize downtime.
  • Collaborate with DevOps teams to streamline software delivery and system updates.
  • Perform hands-on installation, tuning, and troubleshooting of Linux systems, especially Red Hat-based environments.
  • Manage software-defined storage and networking components including DNS, DHCP, PXE, and cluster provisioning.
  • Maintain and configure Proof-of-Concept (PoC) system components.
  • Support the system certification processes required by ISV partners.
  • Deploy and manage containerized workloads using Kubernetes.
  • Integrate container orchestration with benchmarking and automation workflows.
  • Maintain clear and comprehensive technical documentation.
  • Work closely with cross-functional teams including hardware, QA, and product management.

What This Job Offers

Job Type: Full-time

Career Level: Mid Level

Industry: Computer and Electronic Product Manufacturing

Number of Employees: 5,001-10,000 employees
