Observability Engineer

Agile Defense, Arlington, VA

About The Position

The Observability Engineer will be responsible for ensuring operational visibility of services running within our AWS environment. This role involves designing, implementing, and maintaining observability solutions, including monitoring, logging, and alerting, to ensure our services are highly available, performant, and reliable.

Requirements

  • Experience with various root-cause analysis methodologies.
  • Proficiency in scripting and automation tools.
  • Solid understanding of cloud platforms, especially AWS.
  • Experience with monitoring and logging tools like Prometheus, Grafana, ELK stack, or Splunk.
  • Strong problem-solving skills and the ability to troubleshoot complex system issues.

Nice To Haves

  • Experience with cloud platforms and microservices.

Responsibilities

  • Design and implement observability solutions to provide end-to-end visibility into the health and performance of services running in AWS.
  • Develop and maintain monitoring, logging, and alerting systems using AWS native services and third-party tools.
  • Collaborate with development, operations, and security teams to define and implement observability best practices.
  • Troubleshoot and resolve issues related to service performance, availability, and reliability.
  • Create and maintain dashboards and reports to provide real-time insights into system health and performance.
  • Conduct root-cause analysis of incidents and implement improvements to prevent recurrence.
  • Automate observability tasks to improve efficiency and reduce manual effort.
  • Stay current with industry trends and emerging technologies related to observability and cloud infrastructure.
  • Provide guidance and training to team members on observability tools and practices.

Benefits

  • Competitive and comprehensive benefits package.