Senior Cloud Native Platform Engineer

Nscale
New York, NY
$200,000 - $225,000
Hybrid

About The Position

We’re hiring a Senior Cloud Native Platform Engineer to build, operate, and improve the cloud-native platform foundations that support AI applications and services at scale. In this hands-on platform engineering role, you’ll work on shared Kubernetes-based platforms, deployment patterns, observability foundations, infrastructure automation, and operational tooling that help internal teams run services safely and efficiently on GPU-backed infrastructure. You’ll partner closely with software engineering, infrastructure, and SRE teams to ensure platform capabilities meet real developer and operational needs. This role is important to the reliability, scalability, and usability of Nscale’s platform. You’ll take ownership of significant platform components, deliver complex technical work independently, and raise the quality of operations and engineering through practical improvements, sound technical judgement, and mentoring.

Requirements

  • Strong hands-on experience operating and improving Kubernetes-based platforms in production.
  • Solid experience with infrastructure automation, CI/CD, configuration management, or GitOps-style workflows.
  • Strong understanding of reliability engineering principles, including observability, incident response, failure analysis, and operational readiness.
  • Experience writing production-quality automation, tooling, or backend code in Go, Python, Bash, or similar languages.
  • Good Linux fundamentals, including processes, filesystems, cgroups, service behaviour, and system debugging.
  • Good networking fundamentals, including TCP/IP, DNS, routing, load balancing, and container or overlay networking concepts.
  • Experience debugging complex production issues across multiple system layers.
  • Ability to work independently on substantial technical problems while collaborating effectively with adjacent teams.
  • Experience mentoring or supporting less experienced engineers through practical technical guidance.

Responsibilities

  • Build and improve shared cloud-native platform capabilities used by internal engineering teams to run AI applications and services.
  • Own significant parts of the platform area, including Kubernetes cluster operations, workload runtime configuration, deployment workflows, observability foundations, or environment automation.
  • Improve the reliability, scalability, and supportability of platform services through practical engineering and operational enhancements.
  • Develop automation, tooling, and configuration that reduce manual effort, improve consistency, and make the platform easier to use and operate.
  • Apply software engineering where it creates leverage, including scripts, services, CI/CD automation, operational tooling, and platform integrations.
  • Improve incident prevention, detection, response, and recovery across the platform areas you support.
  • Build and refine observability for platform services, including metrics, logs, tracing, dashboards, alerts, and other useful operational signals.
  • Strengthen rollout safety, capacity awareness, failure handling, and recovery procedures for production environments.
  • Debug and resolve complex issues spanning Kubernetes, Linux, networking, storage, workload runtime behaviour, and cloud or datacentre infrastructure dependencies.
  • Enhance operational playbooks, runbooks, and engineering practices to reduce toil and increase service resilience.
  • Contribute to design discussions, code reviews, and operational standards within the platform engineering team.
  • Collaborate with software engineering, infrastructure, and SRE teams to deliver platform capabilities that are practical, supportable, and aligned to operational needs.
  • Define sensible defaults, paved roads, and supportable patterns for service deployment and runtime operations.
  • Mentor less experienced engineers in platform engineering fundamentals, operational judgement, and good automation practices.

Benefits

  • Highly competitive US compensation package (base + bonus + equity)
  • Performance reviews every 12 months
  • Dynamic progression plan tailored to your ambitions
  • Flexible workplace
  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Flexible paid time off
  • Parental leave
  • Retirement plan participation