Controls and Systems Software Engineer

General Matter
Los Angeles, CA
$100,000 - $200,000

About The Position

We are seeking a highly capable DevOps / Site Reliability Engineer to help build and operate the software systems underpinning uranium enrichment R&D and production infrastructure. This role is foundational to our reliability, safety, and developer velocity. You will be responsible for designing and maintaining observability, alerting, and developer productivity systems, and for ensuring that critical internal and production services are correctly instrumented and monitored. We are only interested in candidates with strong fundamentals, sound judgment, and the ability to operate with rigor in a production environment where failures matter.

Requirements

  • Strong fundamentals in web service development and distributed systems
  • Solid understanding of networking concepts, DNS, TLS/certificate management, and HTTP
  • Experience operating and debugging production systems
  • Familiarity with observability tools (metrics, logging, alerting) and incident response
  • Ability to write clear, maintainable code and automation scripts
  • Demonstrated ownership, attention to detail, and sound technical judgment

Nice To Haves

  • Experience with modern observability stacks (e.g., Prometheus, Grafana, OpenTelemetry, Datadog)
  • Hands-on experience with cloud infrastructure and infrastructure-as-code
  • Exposure to CI/CD pipelines and developer tooling at scale
  • Experience supporting safety-critical or high-reliability systems
  • Strong debugging skills across application, OS, and network boundaries
  • Prior on-call experience in a production environment

Responsibilities

  • Design, implement, and maintain observability and alerting systems across critical services and infrastructure
  • Ensure all production and internal services are properly instrumented with metrics, logs, and traces
  • Own and maintain developer productivity tools, CI/CD systems, and internal platforms
  • Participate in an on-call rotation and respond to production incidents with urgency and discipline
  • Lead incident reviews and drive long-term reliability improvements
  • Automate operational workflows to reduce manual toil and improve system resilience

Benefits

  • Access to medical, vision, and dental coverage
  • Access to a 401(k) retirement plan
  • Long-term incentives in the form of stock options

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 11-50 employees
