Cloud Monitoring/Observability Engineer

Equifax
Alpharetta, GA
115d
Remote

About The Position

Equifax is where you can power your possible. If you want to achieve your true potential, chart new paths, develop new skills, collaborate with bright minds, and make a meaningful impact, we want to hear from you.

We are looking for an experienced professional to join the Global Monitoring and Observability team within the Operational Resilience organization. In this role you will help Equifax SREs and developers ensure the stability of our critical applications and infrastructure, set global standards, and develop related automation and compliance reporting. The ideal candidate will have a strong background in observability and a passion for building robust, automated and proactive monitoring capabilities.

This role requires participation in a two-week 24/7 on-call rotation every 6-8 weeks. Equifax has a hybrid work schedule that allows 2 days of remote work (Monday and Friday), with 3 required onsite days (Tuesday, Wednesday and Thursday) every week. The required onsite days are worked at the Equifax office in Alpharetta, Georgia.

This position does not offer immigration sponsorship (current or future), including F-1 STEM OPT extension support. This position is not open to third-party vendors or C2C.

Requirements

  • BS degree in Computer Science or related technical field.
  • 5-7 years of related experience in Site Reliability Engineering or general monitoring and observability.
  • Strong experience with GCP Cloud Services and architecture.
  • Expertise with observability tools (Datadog, GCP Cloud Monitoring) for full-stack monitoring, including hands-on experience with PagerDuty for incident management.
  • Solid understanding of SRE principles (SLOs, SLIs) and a track record of automating tasks with scripting (Python, Go) and infrastructure-as-code and CI/CD tools (Terraform, Jenkins).
  • Experience with visualization tools (Grafana, Looker Studio).

Nice To Haves

  • Experience with ITIL processes, particularly in incident, problem and change management.
  • Experience working in a regulated industry, such as financial services.
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Cloud certification strongly preferred.

Responsibilities

  • Design and implement monitoring solutions for GCP environments, including Compute Engine, GKE, Cloud Functions and Cloud Storage.
  • Configure and maintain monitoring tools (Datadog, GCP Cloud Monitoring, CloudWatch) for comprehensive application and infrastructure monitoring, including metrics, logs and traces.
  • Establish and improve operational resilience by creating, implementing and maintaining governance processes, and participate in on-call rotations, monitoring and alerting.
  • Develop reporting processes to track adherence to policies across our tooling, such as PagerDuty, Datadog and other cloud monitoring platforms.
  • Enforce Site Reliability Engineering (SRE) principles, focusing on governance of reliability, observability and performance.
  • Automate monitoring tasks, alerting configurations and incident response workflows.
  • Collaborate with development, operations and security teams to improve system reliability and availability.

Benefits

  • Comprehensive compensation and healthcare packages.
  • 401k matching.
  • Paid time off.
  • Organizational growth potential through our online learning platform with guided career tracks.

What This Job Offers

Job Type: Full-time

Career Level: Mid Level

Industry: Administrative and Support Services

Education Level: Bachelor's degree
