Standard Chartered · Posted 3 months ago
Full-time
Chennai, IN
1,001-5,000 employees
Credit Intermediation and Related Activities

The position focuses on enhancing application service and infrastructure resilience through self-healing and automated failovers, targeting 99.99% uptime for customers. The role involves assisting with planned disruptions of production infrastructure, so that teams are held accountable for building resilient systems, and influencing design and development teams to consider failure scenarios. Responsibilities also include identifying opportunities to eliminate manual activities through automation, enhancing scalability via capacity management, and monitoring service availability and performance. The role also requires participating in post-mortem reviews and optimizing monitoring so that critical user service journeys are traceable end to end.
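
For a sense of scale, the 99.99% uptime target can be converted into a downtime allowance. The sketch below is illustrative only: the function name and the 30-day and 365-day windows are assumptions chosen for the example, not figures from the posting.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the minutes of downtime allowed over a period at a given availability.

    availability_pct: target availability as a percentage, e.g. 99.99
    period_days: length of the measurement window in days (assumed window)
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    if period_days < 0:
        raise ValueError("period_days must be non-negative")
    total_minutes = period_days * 24 * 60
    # The allowed downtime is the fraction of time NOT covered by the target.
    return total_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for days in (30, 365):
        budget = downtime_budget_minutes(99.99, days)
        print(f"{days}-day window at 99.99%: about {budget:.2f} minutes of downtime")
```

Under these assumptions, a 99.99% target leaves roughly 4.32 minutes of downtime in a 30-day window and about 52.56 minutes in a 365-day year, which is why self-healing and automated failover feature so heavily in the role.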

  • Enhance application service and infrastructure resilience through self-healing and automated failovers.
  • Assist in running planned random disruptions of production infrastructure.
  • Build resilience into applications to handle underlying system failures gracefully.
  • Identify opportunities to eliminate manual and repeatable activities via tooling and automation.
  • Reduce the number of repeat incidents by fixing underlying root causes.
  • Enhance application and infrastructure scalability through capacity management.
  • Continuously monitor capacity for discrepancies or spikes.
  • Design, code, and implement fixes to improve service availability.
  • Participate in blameless post-mortem reviews to identify opportunities for improvement.
  • Monitor SLIs/SLOs in partnership with Product Teams.
  • Optimize monitoring to reduce false positive alerts.
  • Deepen monitoring capabilities leveraging logs, metrics, and traces.
  • Ensure all critical user service journeys are traceable end to end.
  • Ensure Production Solutions are fit for purpose and address identified gaps.
  • Lead by example and build appropriate culture and values within the team.
  • Identify key issues in business areas and implement controls to mitigate risks.
  • Ensure understanding of the risk and control environment within Technology Services.
  • Engage actively with audit issues in the support environment.
  • Assess the effectiveness of governance and oversight arrangements.
  • Display exemplary conduct and adhere to the Group's Values and Code of Conduct.
  • Lead the Production Engineering team to achieve outcomes set out in the Bank's Conduct Principles.
  • Relevant degree in Computer Science/Technology.
  • Evidence of continuous professional development in an IT role.
  • SRE Certification.
  • Experience with monitoring tools.
  • Experience with automation scripts.
  • Familiarity with microservices architecture.
  • Experience with Kubernetes.
  • Experience with AWS.
  • Competitive salary and benefits to support mental, physical, financial, and social wellbeing.
  • Core bank funding for retirement savings, medical and life insurance.
  • Flexible and voluntary benefits available in some locations.
  • Time-off including annual leave, parental/maternity leave, sabbatical, and volunteering leave.
  • Flexible working options based around home and office locations.
  • Proactive wellbeing support through a digital wellbeing platform.
  • Continuous learning culture with opportunities to reskill and upskill.
  • Inclusive and values-driven organization that embraces diversity.