Lead Software Engineer - SRE

Wells Fargo & Company
St. Louis, MO
Hybrid

About The Position

Wells Fargo is seeking a Lead Site Reliability Engineer (SRE) to join the WIMT Platform team. This role is responsible for driving the stability, resiliency, performance, and security of mission-critical platforms that support Wells Fargo Advisors, First Clearing firms, and FINET practices. As a Lead SRE, you will provide hands-on technical leadership across incident management, automation, observability, and reliability engineering, with a strong focus on proactive risk mitigation and continuous improvement. You will help define and enforce reliability standards while partnering closely with Application Development, Product, Business, and Enterprise teams to ensure operational excellence throughout the full service lifecycle. This role is ideal for a highly motivated engineer with deep experience operating large-scale production systems who takes ownership, values accountability, and is passionate about building resilient, enterprise-grade platforms.

Requirements

  • 5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 5+ years of experience leading observability and monitoring tooling (Splunk, AppDynamics, Splunk Observability, Grafana, OpenTelemetry)
  • 5+ years of infrastructure support (Windows and Linux)
  • 5+ years of proven success in toil reduction initiatives
  • 5+ years of cloud application management, especially on OpenShift Container Platform

Nice To Haves

  • 5+ years of experience in SRE, public and private cloud technologies, Java performance tuning, and capacity optimization for mission-critical applications
  • Working knowledge of multiple programming languages and frameworks (e.g., Java, JavaScript, Ruby, Python, Angular, Node.js) and data formats such as JSON
  • Hands-on experience with cloud and platform technologies such as AWS, PCF, PKS, Kubernetes, OpenShift, Linux, Azure, Windows, and VMware
  • Strong verbal, written, and interpersonal communication skills for effective collaboration across teams
  • Ability to engage with and influence stakeholders at various organizational levels
  • Expert-level experience with monitoring tools such as Prometheus, Grafana, AppDynamics, Glassbox, and Splunk
  • Advanced experience in one or more scripting languages (e.g., Python, shell scripting)
  • Strong knowledge of Kubernetes and OpenShift Container Platform (OCP), with solid troubleshooting skills
  • Strong grasp of Java performance concepts (heap, garbage collection) and the key monitoring metrics for Java applications
  • Ability to identify manual tasks in processes and automate them to reduce toil

Responsibilities

  • Design and implement scalability, reliability, and observability strategies for cloud and on-premises environments
  • Define SLIs (Service Level Indicators), SLOs (Service Level Objectives), and Error Budgets to improve system reliability
  • Provide vision, direction, and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership
  • Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
  • Drive adoption of non-functional requirements (NFRs), best practices, and quality and compliance standards across observability and performance engineering
  • Ensure high availability and performance of production systems through proactive monitoring and incident response
  • Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
  • Lead projects, teams, or serve as a peer mentor

Benefits

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees

© 2024 Teal Labs, Inc