Lead Infrastructure Engineer (SRE)

Wells Fargo & Company•Chandler, AZ

Hybrid

About The Position

Wells Fargo is seeking a Lead Infrastructure Engineer in Technology as part of COO Tech. The team will drive technology transformation and the adoption of SRE-aligned enterprise capabilities and products, launch new tooling enablement, automate away complex issues, and integrate with the latest technology. Site Reliability Engineers draw on their experience as software and systems engineers to ensure that applications onboarded to SRE are available, have full-stack observability, are continuously tested, and are integrated with CI/CD. They introduce continuous improvement through code and automation, provide operational insight through analytics, and work with application teams to ensure the products and services we provide are always on.

Requirements

5+ years of Technology Infrastructure Engineering and Solutions experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
3+ years of experience using Observability Tools
2+ years of application production support experience
2+ years of experience with Confluence or Jira

Nice To Haves

Experience with Site Reliability Engineering (SRE)
2+ years of experience with database logging and monitoring concepts
2+ years of experience with application performance, monitoring, and optimization using BlazeMeter, JMeter, Splunk, and AppDynamics
2+ years of experience with scripting languages such as Bash, PowerShell, Python, Shell, VBScript, or JavaScript
Experience with and understanding of AIOps and related tools such as Moogsoft or BigPanda
Experience with one or more automation tools such as Ansible
Experience with Container technologies: Kubernetes, Docker, PKS

Responsibilities

Help drive Site Reliability Engineering capabilities at Wells Fargo Collection Services, igniting the practice, principles, and culture and leading by example
Assist in training skilled engineers by growing the practice within Collection Services and partnering with peer platform embedded SRE teams
Leverage enterprise capabilities, tools, and innovation to improve availability in a complex ecosystem by evolving observability, monitoring, logging, synthetic monitoring, and chaos engineering
Evolve our environment by introducing self-healing and autonomic capabilities that solve complex operational and systemic issues with precision, including building and training models and automating cognitive processes to improve the availability of the products we provide to customers
Automate key SRE metrics and IT Service Operations processes, including customer impact, % availability of critical business flows, SLO/SLI adherence, and error budget; automate the incident process for IT Service Operations through data, integrating with unified communications and alerting/notification systems
Share support responsibilities for critical applications and customer journeys onboarded to SRE, including remediating issues through Agile, conducting blameless postmortems and root cause analysis, and introducing continuous improvement that solves problems once and for all, with the goal of no repeats
