Software Engineer - AI Infra Visibility

Clockwork.io, Palo Alto, CA
Posted 6 days ago

About The Position

We are looking for a strong Software Engineer to help design, build, and scale backend systems for AI and GPU cluster observability. In this role, you will work on high-performance distributed systems that power telemetry ingestion, data processing, and APIs for monitoring large-scale GPU clusters and AI workloads.

Requirements

  • 2+ years of industry experience building and operating production software systems.
  • Strong foundation in data structures, algorithms, and software design.
  • Fluency in one or more programming languages: C, C++, Go, Java, or Python.
  • Solid understanding of operating systems fundamentals (threads, scheduling, synchronization; kernel programming is a plus).
  • Experience with databases, including design, development, or scaling.
  • Excellent debugging, problem-solving, and communication skills.

Nice To Haves

  • Knowledge of networking protocols; familiarity with NIC architecture and operation.
  • Understanding of GPU or AI infrastructure (e.g., DCGM, PyTorch).
  • Familiarity with observability systems (metrics, logs, traces); experience with OpenTelemetry, Prometheus, or distributed tracing is a bonus.
  • Experience designing, building, and scaling large distributed systems.
  • Hands-on experience with service-oriented architectures and cloud platforms (AWS, GCP, Azure).
  • Enjoyment of challenging projects.

Responsibilities

  • Design and build scalable backend systems for metric collection, processing, and analysis.
  • Develop robust methods to detect complex infrastructure issues that impact AI workloads.
  • Build large distributed systems running in production environments.
  • Collaborate across teams to deliver reliable, performant, and maintainable systems.

Benefits

  • A friendly and inclusive workplace culture.
  • Competitive compensation.
  • A great benefits package.
  • Catered lunch.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: None listed
  • Number of Employees: 11-50
