About The Position

Nscale is seeking an Infrastructure Engineer with a specialization in OpenStack Ironic to join their Infrastructure Engineering team. This role is crucial for designing, implementing, and managing the infrastructure stack that supports Nscale's GPU cloud services. The specialist will focus on OpenStack bare metal provisioning and lifecycle management, emphasizing Ironic and its integrations for large-scale physical infrastructure. Responsibilities include ensuring automated provisioning, hardware onboarding, lifecycle operations, and hardware fault management. The role also involves engaging with the upstream OpenStack community to contribute to and benefit from the development of Ironic and the bare metal ecosystem. The team aims to maintain high levels of availability, scalability, automation, and security, acting as a key escalation point for support and providing subject matter expertise.

Requirements

  • Strong Linux systems administration and troubleshooting experience.
  • Deep hands-on experience deploying, operating, upgrading, and troubleshooting large-scale OpenStack environments.
  • Strong specialist knowledge of OpenStack Ironic and the surrounding provisioning ecosystem.
  • Strong understanding of bare metal provisioning concepts including PXE/iPXE, DHCP, TFTP/HTTP boot, BMC technologies, RAID configuration, firmware management, disk imaging, and node lifecycle states.
  • Strong experience with out-of-band management technologies such as Redfish, IPMI, or vendor management interfaces.
  • Strong experience designing and building automation for physical and virtual infrastructure using tools such as Ansible.
  • Strong Python and Bash skills.
  • Experience troubleshooting complex provisioning and hardware integration issues across server, network, and management layers.
  • Experience operating production infrastructure at scale with a strong focus on reliability, repeatability, and operational safety.
  • Ability to collaborate across infrastructure, support, and architecture teams to solve complex technical problems.

Nice To Haves

  • Experience contributing to or working closely with upstream open-source communities is highly desirable, particularly within OpenStack, Ironic, Metal3, or related infrastructure projects.
  • Ability to evaluate upstream changes, influence technical direction, and translate community developments into practical outcomes for production bare metal platforms.
  • Experience with GPU server platforms, hardware qualification, or large-scale bare metal cloud environments would be highly desirable.
  • Knowledge of Neutron, networking for provisioning, and the integration points between networking and bare metal deployment would be beneficial.

Responsibilities

  • Designing, implementing, and operating scalable and resilient bare metal provisioning platforms with a strong focus on OpenStack Ironic.
  • Owning the lifecycle of physical infrastructure through automated discovery, enrolment, provisioning, cleaning, deprovisioning, and hardware state management.
  • Managing and improving integrations between Ironic and related OpenStack services such as Nova, Neutron, Glance, Keystone, Placement, and supporting automation tooling.
  • Building and maintaining robust provisioning workflows for a wide range of hardware profiles, including GPU-enabled and high-performance server platforms.
  • Driving automation for hardware onboarding, firmware and BIOS configuration, deployment workflows, validation, and recovery using infrastructure-as-code and configuration management tools.
  • Troubleshooting complex issues across provisioning pipelines, PXE/iPXE, BMC interfaces, out-of-band management, image deployment, network boot, and hardware compatibility.
  • Acting as a 3rd/4th line escalation point for advanced bare metal and provisioning incidents, carrying out root cause analysis and implementing long-term fixes.
  • Supporting platform upgrades, lifecycle management, and operational improvements across Ironic and its dependencies.
  • Collaborating closely with network, compute, data centre, and support teams to ensure efficient and reliable delivery of physical infrastructure services.
  • Contributing specialist input to infrastructure roadmap planning, capacity expansion, standard builds, and hardware platform qualification.
  • Supporting pre-sales and solution design efforts by providing expert guidance on bare metal capabilities, operational models, and deployment constraints.
  • Contributing to upstream OpenStack bare metal communities, particularly Ironic and related projects, through bug reports, code contributions, testing, reviews, and design discussions where appropriate.
  • Tracking upstream roadmaps, release changes, and community direction to help shape Nscale's bare metal strategy, upgrade planning, and platform standards.
  • Representing Nscale's operational requirements, hardware use cases, and scaling challenges in upstream discussions to help drive practical improvements for both the business and the wider community.
  • Ensuring provisioning platforms and operational processes adhere to security, compliance, and operational standards.
  • Participating in on-call rotations and incident response activities for critical infrastructure services.

Benefits

  • Medical
  • Dental
  • Vision
  • Flexible paid time off
  • Parental leave
  • Retirement plan participation