About The Position

Leverage specialized security governance and risk expertise to identify and address complex security risks, recommending best practices and determining new approaches that have an impact on broader security operations, while aligning strategies with business priorities. Partner across teams and key stakeholders to drive security risk and governance initiatives, leading and designing solutions for complex projects and programs to strengthen overall security posture. Apply advanced analytical skills and sound judgment to assess and mitigate security risks, considering diverse perspectives and innovative solutions. Directly contribute to improvements within the security domain and occasionally beyond, ensuring decisions lead to meaningful enhancements in risk mitigation strategies and overall security practices.

Leverage relationships across teams, both within and outside of security, to influence initiatives and integrate feedback into security governance processes and risk management practices. Develop and articulate clear plans and priorities for the team, guiding them to achieve security risk and governance objectives while fostering a collaborative and high-performance environment. Lead by example, providing mentorship and support to ensure the team successfully executes on initiatives and goals.

Provide independent second-line oversight and effective challenge across infrastructure reliability domains: change/release management, configuration management, capacity planning, performance optimization, and operational resilience. Review and challenge first-line infrastructure reliability practices, including change success rates, risk-based change validation procedures, configuration drift metrics, capacity forecasting models, and high availability architecture decisions. Serve as a recognized infrastructure reliability and resilience expert, independently addressing complex system stability challenges and performance bottlenecks, and providing strategic direction on infrastructure resilience strategies across distributed and cloud-native architectures.

Validate KRIs/KPIs, including failed-change rates, RTO/RPO attainment, mean time to recovery (MTTR), system availability metrics, and configuration compliance; synthesize monthly/quarterly trends and themes. Lead targeted deep-dive reviews of high-severity incident patterns, root cause analysis validation, and systemic infrastructure reliability issues; document clear risk statements, opinions, and recommendations. Assess the effectiveness of change management practices, including risk rating methodologies and appropriate validation requirements for different change types (standard, normal, emergency, and high-risk changes). Validate issue remediation plans, post-incident improvement actions, and risk acceptances; escalate where residual reliability risk exceeds appetite and track closure to completion.

Prepare committee-ready reporting and dashboards; brief senior technology, security, and risk leaders on infrastructure resilience posture, emerging reliability risks, and systemic operational themes. Contribute to annual risk assessment, maturity assessments, and policy/standard maintenance for change management, configuration management, and infrastructure resilience domains. Partner with first-line infrastructure, architecture, DevOps, and SRE teams while preserving independence; provide consultative guidance that enables prudent, risk-informed infrastructure decisions.

Requirements

  • 5+ years of relevant experience and a Bachelor's degree, or any equivalent combination of education and experience.
  • 5+ years in infrastructure engineering, site reliability, or IT operations.
  • 4+ years directly focused on infrastructure reliability, performance management, or operational resilience.
  • Advanced knowledge of change and release management frameworks, including ITIL change management, risk-based change assessment, CI/CD best practices, progressive deployment strategies, and automated release validation methodologies.
  • Demonstrated experience with configuration management tools and practices, infrastructure-as-code principles, configuration drift detection, and automated compliance validation.
  • Understanding of risk-based change management principles, including how changes are categorized by risk level and the corresponding validation and approval requirements for each tier.
  • Deep understanding of high availability architectures, failover mechanisms, disaster recovery patterns, and resilience engineering principles for large-scale distributed systems.
  • Expertise in infrastructure observability platforms, monitoring frameworks, capacity planning tools, and performance analytics solutions; experience with root cause analysis methodologies and incident trend analysis.
  • Strong knowledge of infrastructure performance metrics including RTO/RPO objectives, service level objectives (SLOs), error budgets, and reliability scoring frameworks.
  • Strong work ethic with proven ability to learn quickly, prioritize work, and manage complex deliverables to completion under established deadlines.
  • Superb consultative, adjudicative, investigative, and influencing skills, including business acumen, stakeholder empathy, and conflict resolution, as well as general comfort working in a dynamic, global, fluid, and matrixed working environment.
  • Exceptional verbal and written communication and analysis skills, including experience developing high-quality written analysis, strategy, or standards documents.
  • Unquestionable professional and ethical integrity, ideally demonstrated through experience with projects of a sensitive, privileged, or confidential nature.
  • Ability to approach and understand problems from a statistical or quantitative perspective and draw meaningful, accurate conclusions, as well as scrutinize models and inferences for misleading or overlooked considerations.
  • Degree in a relevant discipline, such as computer science, engineering, information systems, or related technical field.

Nice To Haves

  • Experience with cloud infrastructure reliability patterns, container orchestration platforms, and modern infrastructure automation tools.

Responsibilities

  • Drive security risk and governance initiatives.
  • Lead complex projects and programs to strengthen overall security posture.
  • Assess and mitigate security risks.
  • Improve risk mitigation strategies and overall security practices.
  • Influence initiatives and integrate feedback into security governance processes and risk management practices.
  • Develop and articulate clear plans and priorities for the team.
  • Provide mentorship and support to the team.
  • Provide independent second-line oversight and effective challenge across infrastructure reliability domains.
  • Review and challenge first-line infrastructure reliability practices.
  • Address complex system stability challenges and performance bottlenecks.
  • Provide strategic direction on infrastructure resilience strategies.
  • Validate KRIs/KPIs and synthesize monthly/quarterly trends and themes.
  • Lead targeted deep-dive reviews of high-severity incident patterns.
  • Assess the effectiveness of change management practices.
  • Validate issue remediation plans, post-incident improvement actions, and risk acceptances.
  • Prepare committee-ready reporting and dashboards.
  • Contribute to annual risk assessment, maturity assessments, and policy/standard maintenance.
  • Partner with first-line infrastructure, architecture, DevOps, and SRE teams.

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Number of Employees

5,001-10,000 employees
