Grafana Observability Architect

Wipro Ltd., Tampa, FL

About The Position

Job Summary: We are seeking a highly skilled and motivated Grafana Observability Architect with experience in the design, implementation, and optimization of observability solutions built on the Grafana ecosystem. The ideal candidate will work closely with platform engineers, SREs, developers, and business stakeholders to ensure end-to-end visibility into system performance, reliability, and user experience across distributed systems.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
  • Experience in DevOps, SRE, or infrastructure automation roles.
  • Hands-on experience with Grafana and dashboard development.
  • Strong proficiency in scripting languages (Python, Bash, Go).
  • Experience with monitoring tools (Grafana Cloud, Prometheus, Loki, Dynatrace, Splunk, etc.).
  • Deep understanding of CI/CD and cloud platforms (AWS and Azure).
  • Expertise in Kubernetes, Docker, and container orchestration.
  • Familiarity with security and compliance in automated environments.
  • Hands-on experience with OpenTelemetry instrumentation and data collection.

Nice To Haves

  • Grafana certification or equivalent experience.
  • Experience with custom Grafana plugins or panel development.
  • Knowledge of business intelligence tools and data visualization principles.
  • Contributions to open-source DevOps or observability projects.
  • Strong communication and stakeholder management skills.
  • Experience with OpenTelemetry Collector configuration and integration.
  • Familiarity with distributed tracing concepts.

Responsibilities

  • Architect and implement observability platforms using Grafana, Tempo, Loki, Mimir, and Prometheus.
  • Design and maintain scalable telemetry pipelines using OpenTelemetry and Grafana Agent.
  • Define and enforce observability standards, SLIs/SLOs, and alerting strategies.
  • Collaborate with application and infrastructure teams to instrument services for metrics, logs, and traces.
  • Develop reusable dashboards and templates for performance monitoring and incident response.
  • Design and implement visually compelling, data-rich Grafana dashboards for observability.
  • Integrate Grafana Cloud with data sources such as Prometheus, Loki, ServiceNow, PagerDuty, Snowflake, and AWS.
  • Integrate telemetry data sources such as Tomcat, Liberty, Ping, Linux, Windows, databases (Oracle, PostgreSQL), and REST APIs.
  • Create alerting mechanisms for SLA breaches, latency spikes, and transaction anomalies.
  • Develop custom panels and alerts to monitor infrastructure, applications, and business metrics.
  • Collaborate with stakeholders to understand their monitoring needs and translate them into KPIs and visualization requirements.
  • Optimize dashboard performance and usability across teams.
  • Implement and manage OpenTelemetry instrumentation across services to collect distributed traces, metrics, and logs.
  • Integrate OpenTelemetry data pipelines with Grafana and other observability platforms.
  • Develop and maintain OpenTelemetry collectors and exporters for various environments.
  • Develop and implement monitoring solutions for applications and infrastructure to ensure high availability and performance.
  • Collaborate with development, operations, and other IT teams to ensure monitoring solutions are integrated and aligned with business needs.
  • Architect, design, and maintain CI/CD pipelines using tools such as Jenkins, Bitbucket, and Nexus.
  • Implement Infrastructure as Code (IaC) using Terraform and Ansible.
  • Automate deployment, scaling, and monitoring of both cloud-native and on-premises environments.
  • Ensure system reliability, scalability, and security through automated processes.
  • Collaborate with development and operations teams to streamline workflows and reduce manual intervention.
  • Act as a technical advisor on automation and observability best practices.
  • Lead initiatives to improve system performance, reliability, and developer productivity.
  • Conduct training sessions and create documentation for internal teams.
  • Stay current with industry trends and emerging technologies in DevOps and observability.
  • Advocate for and guide the adoption of OpenTelemetry standards and practices across engineering teams.
  • Optimize monitoring processes and tools to enhance efficiency and effectiveness.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Professional, Scientific, and Technical Services

Number of Employees

5,001-10,000 employees