About The Position

Core42, a leader in AI-powered cloud and digital infrastructure, is driving transformative technology solutions globally. Leveraging advanced resources and partnerships, Core42 empowers clients to harness sovereign AI infrastructure, especially in sectors with stringent regulatory needs. With a mission to redefine digital transformation, Core42 combines sovereign capabilities with scalable, high-performance compute infrastructure, positioning itself at the forefront of AI innovation in the Middle East and beyond.

We are building an enterprise-grade GPU compute platform that abstracts large-scale GPU and HPC infrastructure into a secure, multi-tenant, developer-friendly service. The platform runs on top of Kubernetes- and Slurm-backed infrastructure and exposes a unified control plane for provisioning, scheduling, authentication, billing, and observability.

This is a deep platform engineering role at the intersection of distributed systems, Kubernetes internals, and infrastructure automation. You will design, build, and operate core services that directly control how GPU resources are allocated and consumed at scale. You will own production systems end-to-end and work closely with infrastructure and performance engineers operating GPU clusters at the hardware level.

Requirements

  • 4–7 years of software engineering experience in backend, platform, or infrastructure roles.
  • Strong backend engineering experience in Python (FastAPI), Go, or Node.js.
  • Hands-on experience with Kubernetes in production environments.
  • Experience building and operating REST and/or gRPC APIs.
  • Strong understanding of microservices architecture and cloud-native systems.
  • Experience with PostgreSQL schema design, performance, and migrations.
  • Familiarity with authentication/authorization systems (OAuth2, SAML, JWT, RBAC).
  • Experience working on systems that require high reliability and correctness under failure conditions.
  • Ability to operate independently in ambiguous or greenfield environments.

Nice To Haves

  • Experience with GPU infrastructure, HPC environments, or AI/ML platforms.
  • Experience with Kubernetes controllers, operators, Helm, or cluster lifecycle tooling.
  • Exposure to Slurm or hybrid Kubernetes/HPC scheduling systems.
  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry).
  • Experience building developer platforms or internal infrastructure tools.
  • Familiarity with MLOps tooling (Kubeflow, MLflow, PyTorch pipelines).
  • Experience with GitOps workflows (ArgoCD, Flux, etc.).
  • Experience working at cloud providers or infrastructure-heavy SaaS companies.
  • Exposure to distributed scheduling systems or resource orchestration platforms.
  • Experience with high-scale multi-tenant systems.

Responsibilities

  • Design, build, and operate core GPUaaS control plane services.
  • Develop backend APIs and microservices (Python, Go, or Node.js).
  • Integrate deeply with Kubernetes APIs for provisioning, scheduling, and multi-tenancy.
  • Build and maintain authentication, authorization, and identity systems (OAuth2, SSO, RBAC, LDAP).
  • Design and implement usage tracking and billing systems with strong correctness guarantees.
  • Design PostgreSQL schemas optimized for scale, auditing, and reliability.
  • Build CI/CD pipelines and deployment automation for platform services.
  • Collaborate with infrastructure teams to surface GPU and system telemetry.
  • Own systems in production including reliability, failure modes, and performance.

Benefits

  • With a diverse team of 1,100+ employees from 68 nationalities, we foster an inclusive, innovative, and collaborative environment.
  • At Core42, we are grounded in trust, accountability, and high performance.
  • We are united by our values of Grit, Passion, and Impact, which drive resilience, excellence, and meaningful progress across everything we do.
  • Core42 is committed to building a diverse and inclusive workplace.
  • As an equal opportunity employer, Core42 does not discriminate based on race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or any other legally protected status.
  • In compliance with the Americans with Disabilities Act (ADA), we provide reasonable accommodations to qualified individuals with disabilities throughout the application and employment process.
  • If you need assistance or a reasonable accommodation, please contact [email protected], including the role you are applying for and the accommodation required.