About The Position

Senior Manager Software Engineering, Global Payment Network (SRE)

Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you'll be part of a big group of makers, breakers, doers and disruptors who solve real problems and meet real customer needs.

We are seeking a Senior Manager Software Engineering (SRE) who is passionate about finding and fixing inefficiencies to solve our reliability and performance issues. You will lead a team of SRE engineers who focus on availability, latency, performance, efficiency, change and problem management, monitoring, emergency response, and capacity planning of our services. Your projects will deliver enhanced infrastructure, development, and deployment automation at Capital One.

What You’ll Do:

  • Serve as an SRE manager with hands-on experience in the technical domain, effectively leading a team of high-performing engineers.
  • Lead a team responsible for providing SRE services to many Card and Bank application teams.
  • Collaborate with digital product managers and help drive reliability into robust cloud-based solutions.
  • Use programming languages like Python, Go, shell scripting (Unix/Linux), and Java with container orchestration services including Docker and Kubernetes, and a variety of AWS tools and services.

Requirements

  • Bachelor’s Degree
  • At least 6 years of experience in software engineering (Internship experience does not apply)
  • At least 1 year of experience with cloud computing (AWS, Microsoft Azure, Google Cloud)
  • At least 4 years of people management experience

Nice To Haves

  • Experience in analyzing applications for resiliency, ensuring appropriate SRE activities are performed throughout the development lifecycle.
  • Experience in running production, including incident management activities: ability to lead major incident bridges, drive root cause analysis, and coordinate cross-functional teams to quickly restore service.
  • Involvement in 24x7 production support
  • Ability to own uptime and performance SLAs for large-scale distributed systems
  • Hands-on experience leading problem management activities, including driving post-incident reviews and identifying solutions that will prevent issues from recurring.
  • Experience with monitoring and observability tools used in high-availability production environments.
  • Ability to identify and lead initiatives to automate operational activities.
  • An understanding of distributed systems, cloud architecture and modern application development.
  • Experience in one or more general purpose programming languages: Python, Go, shell scripting (Unix/Linux), Java
  • Working knowledge of container technology (OpenShift, Kubernetes), hybrid cloud, and AWS

Responsibilities

  • Lead a team of SRE engineers who focus on availability, latency, performance, efficiency, change and problem management, monitoring, emergency response, and capacity planning of our services.
  • Lead a team responsible for providing SRE services to many Card and Bank application teams.
  • Collaborate with digital product managers and help drive reliability into robust cloud-based solutions.
  • Use programming languages like Python, Go, shell scripting (Unix/Linux), and Java with container orchestration services including Docker and Kubernetes, and a variety of AWS tools and services.
  • Lead major incident bridges, drive root cause analysis, and coordinate cross-functional teams to quickly restore service.
  • Lead problem management activities, including driving post-incident reviews and identifying solutions that will prevent issues from recurring.
  • Identify and lead initiatives to automate operational activities.