Senior SRE (Onsite) Overland Park, KS

NetSmart • posted 3 months ago

Full-time • Senior

Overland Park, KS

1,001-5,000 employees

Professional, Scientific, and Technical Services

The Senior Site Reliability Engineer (SRE) will serve as a senior technical contributor, responsible for advancing observability and operational maturity across hundreds of application teams. This is not a product deployment or configuration role. The SRE will work directly with application engineers and external infrastructure partners to implement distributed tracing, profiling, structured logging, and metrics collection strategies that support reliability at scale. This role requires strong software engineering fundamentals, deep knowledge of observability tooling, and the ability to work across a wide range of technology stacks and organizational boundaries. The ideal candidate is comfortable with high ambiguity, varied application environments, and time-sensitive incident response involving external stakeholders.

Responsibilities

Partner with application teams to implement observability best practices: distributed tracing, profiling, structured logging, and metrics collection
Support instrumentation and telemetry integrations across legacy and modern architectures
Implement and support enterprise observability platforms, including Grafana, Zabbix, Splunk, and related tooling
Build and maintain centralized dashboards and alerts to improve monitoring quality and reduce operational noise
Collaborate with development teams and vendors to define SLIs, SLOs, and alert thresholds for key services
Participate in on-call rotations and serve as an escalation point during complex incidents involving external partners
Lead and contribute to post-incident reviews with a focus on observability gaps, telemetry accuracy, and long-term remediation
Create and maintain documentation, templates, and onboarding materials for standardized observability integration
Provide mentorship to mid-level engineers and guide application teams through complex observability challenges

Required Qualifications

5+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering roles supporting production systems
Strong software development experience in Python, Go, Java, or C#
Demonstrated success implementing observability solutions in production environments
Hands-on experience with Grafana, Zabbix, Splunk, OpenTelemetry, or comparable tools
Deep understanding of telemetry data structures (logs, metrics, traces) and their use in troubleshooting distributed systems
Experience participating in incident response and remediation
Strong communication skills and ability to work directly with third-party vendors and managed service providers

Preferred Qualifications

Experience supporting observability in mixed technology environments (.NET, Linux, Windows Server, Kubernetes, monoliths and microservices)
Familiarity with CI/CD systems and Git-based workflows
Familiarity with OpenTelemetry Collector and custom instrumentation patterns
Experience onboarding large application portfolios into centralized observability platforms
Understanding of operational SLIs/SLOs and alerting strategies across heterogeneous systems
