Senior Site Reliability Engineer

InterContinental Recruiting • San Jose, CA

About The Position

If you are passionate about new technologies, have a strong technical background, and are looking for an engaging environment where you can continuously expand your knowledge, you are the right fit for this role. PayPal SRE is looking for a quality-driven software engineer who is ready to constantly challenge themselves by working on a wide variety of DevOps technologies. The Embedded Site Reliability Engineering (eSRE) Team is a select group of PayPal technologists always striving to do the right thing for all our internal and external customers. The team provides a unique opportunity to learn and to impact not only the SRE organization directly but also PayPal product teams and initiatives, often working side by side with our product teams to launch new features with a focus on scalability and reliability.

Requirements

BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.

Nice To Haves

Experience with algorithms, data structures, complexity analysis and software design.
You have a passion for software development and technology.
Deep understanding and working knowledge of networking principles, internet fundamentals, Operating Systems and application stacks.
You are comfortable using the Linux command line.
Familiarity with configuration management and automation (e.g. Puppet, Ansible).
Using the "Scientific Method" to dig into problems is something you love to do.
In addition to well-developed skills in industry-standard software development languages (e.g. Java, JavaScript), you should be capable in at least one scripting language (e.g. Python).
Familiarity with infrastructure/cloud technologies (GCP, AWS, Azure) preferred.
Demonstrated expertise in DevOps concepts and the SRE lifestyle.
Experience with GitOps and OaC preferred.
Demonstrable knowledge of Terraform, Jenkins, and Artifactory is a strong plus.
Strong debugging and analytical skills.
You thrive in a dynamic, changing environment, working with large-scale applications across the stack.
Finally, you must be determined to have fun, otherwise it's just a job.

Responsibilities

Work directly with Product Development teams on features, operations and reliability engineering, to improve the outcomes that our customers deserve. Your contributions will make a difference in the production code that serves 240+ million users and 17+ million merchants.
Work independently and within a team to triage and remediate production system and application incidents while practicing balanced incident responses.
Enable our customers by serving as a first-responder for our systems and applications. You will lead constructive retrospective sessions to help us enhance the whole lifecycle of services—from inception and design, through deployment, operation and refinement.
Develop and improve production monitoring and management capabilities using existing platforms and tools.
Work with the Technical Duty Officer and other support teams internally and externally in our Command Center to take on problems, escalate, and resolve critical site incidents.
Be the primary enabler in reducing Failed Customer Interactions to raise our availability for all of our users.
