Lead Site Reliability Engineer - Network

JPMorgan Chase & Co., Columbus, OH

About The Position

Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Network Product, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. You take the lead in conducting resiliency design reviews, break down complex problems into digestible work for other engineers, act as a technical lead for medium- to large-sized products, and provide advice and mentoring to other engineers.

Requirements

  • Advanced proficiency in network reliability engineering, including Permit to Operate, FMEA (failure mode and effects analysis), and operational readiness processes.
  • Experience leading technologists to manage and solve complex network issues at a firmwide level.
  • Ability to influence team culture by championing innovation and change for success.
  • Proficiency in SD-WAN, cloud platforms (AWS, Azure, etc.), and major network technologies (Palo Alto, Juniper, F5, Broadcom, Arista, Cisco, etc.).
  • Proficiency in observability and monitoring tools such as Grafana, SevOne, Prometheus, Kibana, ThousandEyes, and Splunk.
  • Demonstrated proficiency in troubleshooting and supporting complex networking environments, including Tier-3 operational support for major incidents.
  • Experience with continuous integration and delivery tools (e.g., Jenkins, GitLab, Terraform, etc.).
  • Formal training or certification in network engineering concepts and 5+ years of applied experience.
  • 10+ years of experience leading technologists to manage and solve complex technical items within your domain of expertise.
  • Experience in scalable networking design, including high availability, redundancy, failover, and load balancing.
  • Experience troubleshooting networking protocols such as TCP/IP, HTTPS, and BGP.
  • Experience in customer-facing migration, including service discovery, assessment, planning, execution, and operations.

Nice To Haves

  • CCIE
  • Load-balancing
  • SD-WAN
  • Observability tools
  • eBPF
  • Cloud certs

Responsibilities

  • Demonstrates expertise in network reliability principles, including Permit to Operate, FMEA, and operational readiness, balancing new features, efficiency, and stability.
  • Collaborates closely with network engineering teams (Datacenter, Firewall, Proxies, DMZ, Load Balancing, etc.) and Lines of Business to ensure alignment and optimal outcomes.
  • Drives the adoption of network reliability best practices and robust observability across the organization, empirically demonstrating improvements through stability and reliability metrics.
  • Acts as the bridge between Engineering, Operations, DevOps, and customers to build and maintain resilient, scalable, and secure network services.
  • Provides Tier-3 network support for major incidents, ensuring rapid resolution and root cause analysis.
  • Fosters a culture of continual improvement, soliciting real-time feedback to enhance the customer and user experience.
  • Ensures knowledge sharing and collaboration across teams, avoiding duplication of work and promoting innovation.
  • Conducts blameless, data-driven post-mortems and regular team debriefs to enable learning from both successes and failures.
  • Documents and shares knowledge, innovations, and best practices via internal forums, communities of practice, and industry conferences.
  • Works with internal specialists, product, and engineering teams to package approaches, best practices, and lessons learned into thought leadership, methodologies, and published assets.
  • Interacts with business, partners, and customer technical stakeholders to manage project scope, priorities, deliverables, risks and issues, and timelines for successful client outcomes.
  • Demonstrates and champions site reliability culture and practices and exerts technical influence.
  • Leads initiatives to improve the reliability and stability of your team’s applications and platforms, using data-driven analytics to improve service levels.
  • Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers.
  • Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise.
  • Acts as the main point of contact during major incidents for your infrastructure and demonstrates the skills to identify and solve issues quickly to avoid financial losses.
  • Documents and shares knowledge within your organization via internal forums and communities of practice.

Benefits

  • We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location.
  • Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions.
  • We also offer a range of benefits and programs to meet employee needs, based on eligibility.
  • These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
