Lead Site Reliability Engineer - Omaha

First National Bank of Omaha | Omaha, NE
3d | Onsite

About The Position

We're seeking a Lead Site Reliability Engineer to guide our team's evolution from traditional monitoring and incident response toward true SRE practices. You'll be a key member of a team focused on monitoring operations and incident management while strategically building capabilities in automation, reliability engineering, and proactive system optimization for our critical banking infrastructure and services.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
  • 7+ years of experience in software development or site reliability
  • Proven track record managing high-availability, mission-critical systems
  • Expert-level experience with monitoring platforms
  • Strong troubleshooting and crisis response capabilities
  • Broad software development capabilities from scripting to application development to infrastructure as code
  • Familiarity with DORA metrics and DevOps measurement frameworks
  • Understanding of network protocols, databases, and system architecture
  • Understanding of deployment pipeline reliability and release engineering practices
  • Strong communication skills for executive reporting and cross-team coordination
  • Unrestricted work authorization, with no future sponsorship required

Responsibilities

  • Lead the design and implementation of advanced monitoring frameworks, establish monitoring best practices, and mentor teammates on leveraging monitoring tools for proactive identification and resolution of issues
  • Partner with application teams to improve observability and incident prevention
  • Architect and oversee system reliability optimization strategies with the objectives of enhancing system capabilities, ensuring superior performance for our customers, and fostering a culture of continuous innovation/improvement
  • Spearhead the growth and evolution of the Site Reliability Engineering practice at FNBO, establishing technical standards and driving adoption of SRE principles
  • Direct critical incident response activities, providing technical leadership during high-severity incidents to ensure rapid resolution and minimal service disruption
  • Identify opportunities to automate manual monitoring and response tasks
  • Champion infrastructure-as-code and automation initiatives where applicable
  • Drive toil reduction initiatives through automation and process improvement
  • Participate in architecture reviews with a focus on operability, monitoring, and resiliency
  • Serve as a key participant in post-incident reviews and drive implementation of corrective actions
  • Analyze system performance trends and capacity utilization, identifying opportunities for improvement
  • Collaborate with security and engineering teams to improve system reliability
  • Support 24/7 monitoring operations for critical banking systems and applications
  • Optimize alerting strategies to reduce noise and improve mean time to detection
  • Maintain and enhance monitoring dashboards and reporting capabilities
  • Provide technical governance for Change Control processes, evaluating complex changes and establishing risk mitigation strategies
  • Foster operational knowledge sharing and documentation of tribal knowledge
  • Begin establishing SLIs/SLOs for critical services in partnership with product teams
  • Introduce error budget concepts and reliability metrics, and facilitate reliability/velocity trade-off discussions

Benefits

  • Medical, Dental, Vision Insurance
  • 401(k) With Matching Contributions
  • Time Off Programs
  • Health Savings Account (HSA)/Dependent Care
  • Employee Banking
  • Growth Opportunities
  • Tuition Assistance
  • Short-Term/Long-Term Disability Insurance