Senior Site Reliability Engineer

TriNet, Atlanta, GA
$87,600 - $175,300

About The Position

This position supports TriNet's mission-critical platforms by identifying and driving improvements in infrastructure and system reliability, performance, high availability, observability, and overall platform stability. It does so by applying foundational SRE principles such as operations as code, removing toil, and failing fast through proactive monitoring.

Requirements

  • Typically 5+ years of experience in Site Reliability Engineering, infrastructure management, or a related field.
  • Typically 3+ years of experience in public cloud (AWS, Azure, etc.) and container technologies.
  • Typically 3+ years of experience in Java, Python, or other major programming languages.
  • Bachelor's Degree or equivalent experience preferred.

Nice To Haves

  • Hands-on experience with Ansible or Terraform and building services in AWS.
  • Experience working with IaC tools such as Terraform and Ansible, and managing Kubernetes services, including Helm.
  • Good knowledge of REST APIs, OAuth, OpenID Connect (OIDC), and SAML, with proven experience in implementing secure authentication and authorization mechanisms.
  • Hands-on experience with container technologies such as Docker and Kubernetes.
  • Knowledge of network protocols such as IPv4/IPv6, TCP/IP, FTP, SMTP, UDP, SSL, and HTTP/HTTPS.
  • Practical understanding of messaging technologies such as ActiveMQ and RabbitMQ.
  • Ability to leverage monitoring and log analytics tools such as Prometheus, Grafana, Splunk, and AppDynamics.
  • Ability to architect applications and solutions that are highly available, scalable, and highly fault tolerant.
  • Ability to stay cool-headed and focused on problem resolution while troubleshooting production issues on incident bridges.

Responsibilities

  • Debugs and optimizes code written by others and automates routine tasks to improve operational efficiency.
  • Incorporates observability as part of day-to-day operations.
  • Guides reliability practices through activities including architecture reviews, code reviews, capacity and scaling planning, and security vulnerability remediation.
  • Evaluates software versions across the tech stack and implements upgrades to remediate vulnerabilities, improve security posture, and take advantage of the latest enhancements and features.
  • Conducts, coordinates, and oversees post-incident Root Cause Analysis / Reviews.
  • Participates in on-call rotation for the services owned by the team, effectively triaging and resolving production and development issues.
  • Performs code-level debugging of issues escalated to the team.
  • Creates and updates runbooks and scripts for Tier I/Tier II Operations teams.
  • Performs other duties as assigned.
  • Complies with all policies and standards.

Benefits

  • Medical, dental, and vision plans.
  • Life and disability insurance.
  • 401(k) savings plan.
  • Employee stock purchase plan.
  • Eleven (11) company-observed holidays.
  • PTO and a comprehensive leave program.

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Industry: Administrative and Support Services
  • Education Level: Bachelor's degree
  • Number of Employees: 1,001-5,000 employees
