Site Reliability Engineer

CADRE GOVERNMENT SOLUTIONS

About The Position

The Site Reliability Engineer will support a large federal technology modernization effort focused on improving the reliability, visibility, and performance of cloud-native applications and services across a national benefits platform. The role centers on observability, telemetry, monitoring, and performance engineering in a modern serverless environment. You’ll work closely with development, operations, and platform teams to build the standards, tooling, and engineering patterns used across serverless services. The work goes beyond writing Lambda code: you’ll help define how services are instrumented, deployed, monitored, and optimized across the platform.

Requirements

  • 3+ years of experience supporting cloud-native applications, performance engineering, or observability platforms
  • Experience with AWS serverless technologies including Lambda, CloudWatch, API Gateway, and related services
  • Experience with observability and monitoring platforms such as Dynatrace, Splunk, Grafana, or similar tools
  • Familiarity with OpenTelemetry or AWS ADOT instrumentation practices
  • Experience supporting CI/CD pipelines and automated deployment workflows
  • Understanding of distributed systems, logging, tracing, and performance analysis concepts
  • Experience troubleshooting performance issues across cloud environments
  • Familiarity with scripting or development languages such as Python, JavaScript, TypeScript, or Java
  • Understanding of DevOps and Agile software delivery practices
  • Strong communication and collaboration skills
  • Ability to obtain and maintain a Public Trust or other required government clearance

Nice To Haves

  • Federal government experience

Responsibilities

  • Build and maintain observability, telemetry, logging, and monitoring solutions for serverless applications and services
  • Support performance analysis, troubleshooting, and optimization efforts across distributed cloud environments
  • Develop and maintain engineering patterns and standards for AWS Lambda services
  • Implement instrumentation using AWS Distro for OpenTelemetry (ADOT)
  • Support monitoring and alerting capabilities using Dynatrace, Splunk, and related observability tools
  • Work with development and DevOps teams to integrate monitoring and telemetry into CI/CD pipelines
  • Assist with diagnosing system bottlenecks, latency issues, and application performance concerns
  • Support automated testing, deployment, and operational readiness activities
  • Help improve operational visibility, tracing, and logging consistency across environments
  • Participate in Agile delivery activities, release coordination, and operational support efforts

Benefits

  • 401(k) Safe Harbor Plans with Matching & Immediate Vesting
  • Medical, Dental, & Vision Plans
  • Paid Time Off: Holidays, Vacation, Wellness, & Personal Leave Plans
  • Continuing Education & Training Budget
  • Office & Technology Budget
  • Cell Phone Budget
  • Wellness & Healthy Living Budget
  • Awards & Bonuses
  • Profit Sharing Plans