About The Position

We are seeking a motivated and detail-oriented Infrastructure Automation & Observability Engineer to join our core Developer team. In this role, you will be instrumental in automating our infrastructure deployments, maintaining a robust observability posture, and actively leading our Agile ceremonies. This is a fantastic opportunity for an engineer who is passionate about Infrastructure as Code (IaC), telemetry, and modern DevOps practices to work with enterprise-scale technologies and grow their engineering skill set.

Requirements

  • 4+ years of experience in DevOps, Infrastructure Automation, Platform Engineering, or Site Reliability Engineering (SRE).
  • Hands-on experience building and maintaining Grafana dashboards, telemetry visualizations, and observability solutions.
  • Practical experience developing and maintaining Ansible playbooks for infrastructure provisioning, configuration management, and automation.
  • Experience configuring monitoring and alerting using Prometheus and Grafana.
  • Proficiency with Git version control, peer code reviews, and CI/CD workflows.
  • Working knowledge of Python and/or Bash scripting.
  • Solid Linux administration skills (Ubuntu, RHEL, or similar).
  • Experience using Jira in Agile/Scrum environments.

Nice To Haves

  • Experience with Kubernetes and containerized workloads.
  • Familiarity with metrics storage platforms such as Prometheus, Mimir, or Thanos.
  • Basic understanding of GitLab CI or other CI/CD platforms.
  • Knowledge of Infrastructure as Code (IaC) and DevOps best practices.
  • Familiarity with ITIL processes, change management, and enterprise operational workflows.
  • Certified ScrumMaster (CSM), Professional Scrum Master (PSM I), or equivalent Scrum Alliance/Scrum.org certification.

Responsibilities

  • Design, build, and maintain Grafana dashboards and telemetry visualizations to monitor system performance, latency, error rates, saturation, and overall platform health.
  • Configure and maintain observability solutions, including Prometheus monitoring and alerting, ensuring critical service metrics and SRE Golden Signals are effectively tracked.
  • Develop, test, and maintain modular Ansible playbooks to automate infrastructure provisioning, application configuration, patching, and operational workflows.
  • Support enterprise automation through AWX by managing centralized execution, automation templates, RBAC, and reusable infrastructure workflows.
  • Maintain Infrastructure as Code (IaC) repositories using Git, following best practices for version control, peer code reviews, and CI/CD-driven automation.
  • Actively participate in Agile ceremonies, including Sprint Planning, Daily Stand-ups, Backlog Refinement, and Retrospectives, contributing to sprint execution and continuous improvement.
  • Collaborate with Product Owners, Scrum Masters, and engineering teams to translate business requirements into actionable user stories while driving automation, observability, and platform reliability initiatives.

Benefits

  • Health insurance
  • Language courses
  • Relocation program
  • Professional development opportunities
  • Certification programs
  • Mentorship and talent investment programs
  • Internal mobility
  • Internship opportunities