Senior Performance and Observability Engineer

The Hartford • Columbus, OH

22d • Hybrid

About The Position

We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.

The Hartford is seeking an experienced and highly motivated Sr. Performance & Observability Engineer to design and implement solutions for complex application and infrastructure observability needs, ensuring production stability and visibility. This role focuses on driving performance and stability improvements for critical business applications, architecture, and integrations to deliver an optimal end-user experience. The engineer will collaborate closely with application development, infrastructure, database, and middleware teams. (Disclaimer: This is not a traditional Performance Testing Lead position.)

This role has a hybrid work schedule, with the expectation of working in an office (Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC) three days a week (Tuesday through Thursday).

Requirements

Bachelor’s or advanced degree in Information Technology, Computer Science Engineering, or related field.
7+ years of experience in performance engineering or related roles.
Expertise in Dynatrace, LoadRunner, and Splunk for performance analysis, alerting, and monitoring solutions.
Strong proficiency in performance testing processes and automation.
Experience with cloud-native technologies, including containerization, microservices, and serverless computing.
Skilled in APM tools, application server instrumentation (Java), Real User Monitoring, and Synthetic Monitoring.
Familiarity with Agile, DevOps, and Site Reliability Engineering practices.
Exceptional analytical, problem-solving, and debugging skills.
Strong communication and collaboration abilities, with experience coaching engineering teams.
Candidate must be authorized to work in the US without company sponsorship.
The company will not support the STEM OPT I-983 Training Plan endorsement for this position.

Nice To Haves

Knowledge of Event Management in ServiceNow (preferred).
Dynatrace Certified Associate
AWS Certified Solutions Architect – Associate
AWS Certified AI Practitioner or higher
Splunk Core Certified Power User
Splunk Core Certified Advanced Power User
Secondary AWS Certification (optional)

Responsibilities

Observability Configuration & Maintenance: Configure and maintain observability capabilities for applications and infrastructure in partnership with SRE and AIOps teams. Assess code, configuration, and infrastructure changes for production readiness.
Performance Strategy & Optimization: Strategize, analyze, and optimize applications for performance, scalability, and availability using DevSecOps principles and modern technologies. Continuously validate load, stability, and reliability standards.
Monitoring & Analytics Implementation: Design, create, and maintain dashboards. Implement APM, log analytics, error analytics, and business analytics solutions using tools such as Dynatrace, Splunk, Akamai, and other observability platforms.
Application Stability & Trend Analysis: Monitor cloud and on-prem application stability trends and proactively identify opportunities to improve performance and availability.
Automation & Resiliency: Automate system scalability and enhance resiliency, performance, and efficiency. Recommend design changes for improved reliability and reduced operational risk.
Incident & Problem Management Support: Collaborate with Incident and Problem Managers to drive blameless root cause analysis (RCA), postmortems, and RCRs, leveraging engineering and tool expertise.
AI-Driven Analytics: Configure and fine-tune AI-based analytics for effective root cause analysis while minimizing false positives and improving detection accuracy.
Smart Monitoring & Alerting: Develop automated monitoring and alerting solutions for software delivery and production environments using tools/frameworks like Dynatrace, ServiceNow, Ansible, Splunk, Akamai, OpenTelemetry, and AWS CloudWatch.
Innovation & Self-Healing Solutions: Contribute to next-generation alerting, problem-solving, and self-healing IT assets, reducing manual toil and improving efficiency in performance testing and engineering processes.
Continuous Improvement: Stay current with emerging technologies and best practices in observability, performance engineering, and automation, applying them to enhance system reliability and operational excellence.