Infrastructure Support Engineer

Nscale • Seattle, WA

8d • $100,000 - $140,000 • Hybrid

About The Position

Nscale is seeking an Engineer with strong people, leadership, and technical skills to ensure the efficiency, reliability, and scalability of data center infrastructure. This role involves solving complex, ambiguous problems in a results-driven environment, influencing without authority, and building relationships with senior stakeholders. The ideal candidate grasps technical concepts and learns quickly, has strong analytical skills, and is organized, diligent, curious, and a self-starter.

Nscale is the GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Our Support and Operations team is critical in maintaining service availability, driving service reliability, and providing rapid response to customer tickets. We foster a culture of relentless innovation, ownership, and accountability, where team members take pride in their work and drive it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, contributing to the technology that powers the future.

Requirements

2–4 years of experience in a support, operations, or infrastructure engineering role, ideally within a cloud, data center, or managed services environment.
Growth mindset. Curious, dependable, and collaborative. You seek feedback, ask questions, and invest in learning to progress toward Senior.
Platform and DC fundamentals. Awareness of servers, networks, storage, and virtualization concepts, ideally from a support or operations background.
Linux fundamentals. Comfortable with the CLI, services via systemd, filesystems, permissions, and basic networking tools. Able to troubleshoot common issues and know when to escalate.
Networking basics. Solid grasp of IP addressing, subnets, VLANs, routing at a high level, DNS, and firewalls.
Kubernetes exposure. Understand core concepts like nodes, pods, services, and logs. Can perform basic troubleshooting and follow runbooks.
GPU awareness. Familiar with basic diagnostics such as nvidia-smi.
Observability foundations. Able to use dashboards and alerts to identify symptoms, gather evidence, and follow runbooks.
Scripting and automation basics. Comfortable reading and writing simple Bash or Python snippets and using Git for version control.
Cloud and virtualization basics. Familiarity with common hypervisor or cloud troubleshooting flows.

Nice To Haves

Advanced networking topics like BGP or VXLAN.
Hands-on exposure to cluster-level Kubernetes administration, operators, and storage or networking add-ons.
Deeper GPU/HPC concepts such as RDMA/InfiniBand, performant distributed workload basics, or job schedulers. Awareness of, or experience using, NCCL for performance troubleshooting.
Infrastructure as Code and config management tools like Ansible or Terraform.
GitOps and CI/CD participation. Contributing to pipelines and modernizing scripts using GitHub Actions or similar.
Experience with access and security tooling used at Nscale, such as Teleport or Vault.
Progress toward relevant certifications over time (e.g., Linux, Kubernetes, cloud, or security).

Responsibilities

Join the Support duty rotation and handle day-to-day tickets and alerts, escalating early and appropriately.
Collaborate with Engineering, with guidance, when incidents or changes require it.
Accurately record, update, manage, and resolve tickets in the ticketing system whilst keeping all parties informed of each ticket's progress.
Follow established runbooks to resolve common issues. Propose improvements and contribute incremental fixes with review.
Keep tickets up to date with clear notes, next steps, and customer communications via the agreed channels.
Learn the Platform fundamentals so you can help customers get value from our services, asking for support when deeper expertise is needed.
Participate in monitoring, troubleshooting, and triage. Capture logs and facts to enable efficient handover.
Deliver assigned tasks and project work to agreed quality and timelines. Flag blockers early and seek help when needed.
Share knowledge by documenting steps you’ve validated and by contributing to training materials. Shadow seniors during complex work to build capability.
Take part in incident reviews as a contributor and help track preventative follow-ups in your scope.
Identify opportunities to automate and optimize processes.
Constantly endeavour to learn and upskill.
Collaborate with cross-functional teams for service improvements.
Be the escalation point for onsite operations staff.
Participate in on-call or out-of-hours work when scheduled and after onboarding.
Be available to travel to Nscale or customer locations to assist with deployments, troubleshooting, and operational tasks, and to attend supplier-related training courses.

Benefits

Highly competitive package (base + equity) with reviews every 12 months.
A flexible workplace that trusts Nscalers to deliver, giving you the autonomy to shape your day around life's moments.
Medical, dental, vision, flexible paid time off, parental leave, and retirement plan participation.
