The ProCOM team is looking for a Site Reliability Engineer (SRE) who can help us solve problems, build our CI/CD pipeline, and lead Mastercard in DevOps automation and best practices. The role engages in and improves the whole lifecycle of services, from inception and design through deployment, operation, and refinement. It involves analyzing the platform's ITSM activities and giving development teams feedback on operational gaps or resiliency concerns.
The SRE supports services before they go live through system design consulting, capacity planning, and launch reviews, and maintains them once they are live by measuring and monitoring availability, latency, and overall system health. The SRE also scales systems sustainably through automation, evolves them by pushing for changes that improve reliability and velocity, and supports the application CI/CD pipeline that promotes software into higher environments through validation and operational gating.
The role includes practicing sustainable incident response and blameless postmortems, and taking a holistic approach to problem solving during production events to optimize mean time to recover. The SRE works with a global team across multiple geographies and time zones, shares knowledge, and mentors junior team members.
For team members supporting the DevOps pipeline, responsibilities include designing, implementing, and enhancing deployment automation based on Chef; using Jenkins to orchestrate builds and link to other tools; supporting code deployments into multiple lower environments with an emphasis on automation; and designing and implementing a Git-based code management strategy.
Job Type
Full-time
Career Level
Senior