Senior SRE

Renishaw
Alpharetta, GA

About The Position

About the Business: LexisNexis Risk Solutions is the essential partner in the assessment of risk. Within our Business Services vertical, we offer a multitude of solutions focused on helping businesses of all sizes drive higher revenue growth, maximize operational efficiencies, and improve customer experience. Our solutions help our customers solve difficult problems in the areas of Anti-Money Laundering/Counter-Terrorist Financing, Identity Authentication & Verification, Fraud and Credit Risk mitigation, and Customer Data Management. You can learn more about LexisNexis Risk at https://risk.lexisnexis.com.

About This Role: This role directly shapes the reliability and usability of a core internal platform. Your work will reduce operational burden across the organization, enable partner teams to move faster with confidence, and improve the long-term health of our Kubernetes ecosystem. If you enjoy solving hard reliability problems, simplifying complex systems, and helping others succeed on a shared platform, this role is a strong fit.

Requirements

  • Strong hands-on experience operating Kubernetes in production, ideally Azure Kubernetes Service (AKS)
  • Practical experience across core SRE practices such as monitoring, alerting, incident response, capacity planning, and automation
  • Solid understanding of distributed systems behavior, failure modes, and dependency management
  • Experience automating infrastructure and operations using tools such as Terraform, Helm, and GitHub Actions
  • Proficiency with at least one programming or scripting language used for automation and tooling, such as Python or Bash
  • Experience designing systems that favor reliability, simplicity, and clear ownership over ad hoc fixes
  • Comfort participating in on-call rotations and leading or supporting incidents in a calm, structured way
  • Ability to influence without authority and work effectively with multiple partner teams
  • A mindset oriented toward root cause analysis, long-term fixes, and continuous improvement

Nice To Haves

  • Familiarity with service meshes, ingress patterns, and zero trust networking concepts
  • Experience with cloud cost optimization in Kubernetes environments
  • Prior exposure to internal platform or enablement teams

Responsibilities

  • Own reliability and resilience outcomes for an internal AKS fleet used by multiple partner teams
  • Design, implement, and improve Kubernetes platform capabilities such as cluster lifecycle management, workload isolation, autoscaling, and safe multi-tenancy
  • Lead and execute toil reduction initiatives through automation, self service workflows, and strong platform defaults
  • Build and evolve observability across metrics, logs, and traces, with a focus on distributed system dependencies and actionable signals
  • Improve incident response by automating detection, recovery, and mitigation to protect service levels
  • Participate in an on-call rotation, act as an incident responder, and support others during high-impact events
  • Contribute to SRE processes such as incident reviews, error budgets, and reliability planning, drawing on practical experience
  • Provide informal mentorship and technical guidance to junior SREs and engineers on partner teams
  • Collaborate with security, networking, and application teams to align platform standards and reduce cross-team friction
  • Continuously identify opportunities to simplify architecture, reduce operational overhead, and optimize cloud cost

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees

© 2024 Teal Labs, Inc