Cloud Monitoring Engineer (Remote)

Oxley Enterprises®, Inc.
Stafford, VA
$102,075 - $156,237
Remote

About The Position

Provide the eyes-on-glass excellence a mission-critical Department of Veterans Affairs (VA) platform demands. As a Cloud Monitoring Engineer, you will build, tune, and maintain the observability stack that tracks latency, error rate, saturation, volume, and incident-free availability across 300+ applications. You will also build and maintain the Capabilities and Services Dashboard and support the monitoring infrastructure, ensuring that automated alerting detects production issues before user-reported tickets arrive.

Requirements

  • 5 years of experience in cloud monitoring and observability engineering
  • Excellent experience building and maintaining dashboards displaying latency, error rate, saturation, volume, and incident-free availability in real time (e.g., Dynatrace, Splunk)
  • Excellent knowledge of the four Golden Signals (latency, traffic, errors, and saturation)
  • Excellent ability to individually monitor and track latency, error rate, saturation, volume, and incident-free time
  • Excellent ability to implement dependency tracking within monitoring dashboards including latency, error rates, and transaction volumes
  • Excellent experience configuring automated alerts reflecting meaningful degradation or disruption while minimizing false positives
  • Excellent ability to maintain an accurate, complete, and auditable log of all alerts including alerted system, cause, timestamps, corrective actions, and responsible system
  • Above average experience supporting 24/7 monitoring operations and coordinating with on-call Site Reliability Engineers (SREs) during active incidents
  • Above average knowledge of AWS CloudWatch and integration with third-party observability tools in a GovCloud environment
  • Experience supporting federal government programs and enterprise-scale applications operating in cloud-based or hybrid environments
  • Excellent verbal and written communication skills
  • Active Federal Civilian Public Trust clearance
  • U.S. citizenship, or permanent residency with at least 3 years of residence in the United States

Nice To Haves

  • Dynatrace Associate certification or equivalent observability platform certification

Responsibilities

  • Builds and maintains the Capabilities and Services Dashboard displaying real-time latency, error rate, saturation, volume, and incident-free availability for all capabilities and services
  • Implements dependency tracking within the dashboard including latency, error rates, and transaction volumes for all capability and service dependencies
  • Configures and tunes automated monitoring and alerting mechanisms ensuring personnel are alerted to production issues prior to receipt of user-reported tickets
  • Ensures all capabilities and services are individually monitored and tracked for latency, error rate, saturation, volume, and incident-free time
  • Maintains an accurate, complete, and auditable alert log including alerted system, description, timestamps, corrective actions, and responsible system
  • Continuously tunes alert thresholds to reflect meaningful degradation or disruption while minimizing false positives
  • Supports the Capabilities and Services Monitoring Plan defining alert conditions, thresholds, notification mechanisms, and escalation paths
  • Coordinates with on-call SREs and the Monitoring and Incident Manager during active incidents to provide real-time dashboard data and historical trend analysis
  • Implements additional or revised dashboard metrics

Benefits

  • Medical, dental, vision and prescription drug coverage for you and your family.
  • Life insurance, short-term disability, and long-term disability paid for by the Company.
  • Supplemental coverages including Accident, Critical Illness, and Hospital.
  • Additional life insurance coverage for you and your dependents.
  • 401(k) plan with various options to select based on your retirement goals.