Sr. Software Engineer/DevOps

OmegaHires · San Francisco, CA

About The Position

You’ll work closely with our Infrastructure and Platform teams to manage, improve, and scale the systems that power our products. Your focus will be on keeping our infrastructure reliable, observable, and easy to operate, with an emphasis on automation, operational excellence, and cross-functional collaboration. You’ll help build and maintain the foundational infrastructure behind our SaaS applications, including Kubernetes, Terraform-managed cloud resources, and GitHub-based CI/CD pipelines. Incident response is part of the role, but the primary focus is proactive improvement: reducing operational toil, improving visibility into system behavior, and enabling product teams to move fast with confidence.

Requirements

  • Observability: Experience with metrics, logs, and traces using tools such as Grafana, Prometheus/Mimir, OpenSearch, Sentry, or similar.
  • Infrastructure as Code: Proficient with Terraform, Kubernetes, and containerization tools.
  • Programming Skills: 5+ years of experience with Python.
  • Linux Systems: Comfortable working with Linux-based environments and writing shell scripts.
  • Communication: Strong collaboration skills with a focus on asynchronous, written communication.
  • Documentation: Commitment to clear, comprehensive documentation and process standardization.
  • Initiative: Self-starter mindset with a proactive approach to solving operational challenges.
  • Version Control: Skilled in Git/GitHub-based workflows.

Responsibilities

  • Infrastructure Management: Build, manage, and optimize infrastructure using Terraform, GitHub CI/CD, and Kubernetes.
  • Monitoring & Observability: Create visualizations and alerts that provide actionable insights using tools like Grafana, Prometheus/Mimir, OpenSearch, and Sentry.
  • Automation & Reliability: Identify manual or error-prone processes and replace them with automated, repeatable systems.
  • Production Troubleshooting: Diagnose and resolve production issues across application and infrastructure layers.
  • Documentation: Capture knowledge in runbooks, setup guides, and architecture diagrams to support operational maturity.
  • Collaboration: Partner with engineers across teams to drive adoption of DevOps and infrastructure best practices.
  • Scalability Planning: Help scale infrastructure and monitoring systems to meet growing demands.
  • Incident Participation: Participate in an on-call rotation and support incident response processes as needed.