Senior Site Reliability Engineer

Mastercard • O'Fallon, MO

About The Position

The ProCOM team is looking for a Site Reliability Engineer (SRE) who can help us solve problems, build our CI/CD pipeline, and lead Mastercard in DevOps automation and best practices. This role engages in and improves the whole lifecycle of services, from inception and design through deployment, operation, and refinement. It involves analyzing ITSM activities of the platform and providing feedback to development teams on operational gaps or resiliency concerns.

The role supports services before they go live through system design consulting, capacity planning, and launch reviews, and maintains services once they are live by measuring and monitoring availability, latency, and overall system health. It also scales systems sustainably through automation and evolves systems by pushing for changes that improve reliability and velocity. The SRE will support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating. This includes practicing sustainable incident response and blameless postmortems, and taking a holistic approach to problem solving during production events to optimize mean time to recover. The role involves working with a global team across multiple geographies and time zones, sharing knowledge, and mentoring junior team members.

For team members supporting the DevOps pipeline, responsibilities include designing, implementing, and enhancing deployment automation based on Chef; using Jenkins to orchestrate builds and link to other tools; supporting deployments of code into multiple lower environments with an emphasis on automation; and designing and implementing a Git-based code management strategy.

Requirements

BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
Experience with algorithms, data structures, scripting, pipeline management, and software design.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Ability to help debug and optimize code and automate routine tasks.
Experience handling difficult situations and making decisions with a sense of urgency.
Interest in designing, analyzing and troubleshooting large-scale distributed systems.
Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is required.

Nice To Haves

Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl or Ruby.
We need team members with an appetite for change and pushing the boundaries of what can be done with automation.
For work on our DevOps team, experience with industry-standard CI/CD tools such as Git/Bitbucket, Jenkins, Maven, Artifactory, and Chef.
Proven experience writing Chef recipes and cookbooks, as well as designing and implementing an overall Chef-based release and deployment process.
Experience with automation for branch management, code promotions, and version management is a plus.

Responsibilities

Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation and refinement.
Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
Practice sustainable incident response and blameless postmortems.
Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
Work with a global team spread across tech hubs in multiple geographies and time zones.
Share knowledge and mentor junior resources.
Design, implement, and enhance our deployment automation based on Chef.
Use Jenkins to orchestrate builds as well as link to Sonar, Chef, Maven, Artifactory, etc. to build out the CI/CD pipeline.
Support deployments of code into multiple lower environments.
Design and implement a Git-based code management strategy that supports deployments to multiple environments in parallel.

Benefits

insurance (including medical, prescription drug, dental, vision, disability, life insurance)
flexible spending account and health savings account
16 weeks of new parent leave
up to 20 days of bereavement leave
80 hours of Paid Sick and Safe Time
25 days of vacation time
5 personal days
10 annual paid U.S. observed holidays
401k with a best-in-class company match
deferred compensation for eligible roles
fitness reimbursement or on-site fitness facilities
eligibility for tuition reimbursement