Site Reliability Engineer (SRE)

Hyundai Capital, Plano, TX

About The Position

The Site Reliability Engineer (SRE) plays a critical role in implementing monitoring solutions and ensuring the reliability, availability, and performance of enterprise infrastructure and application services. The SRE collaborates with cross-functional teams to ensure systems are robust, automated, and capable of supporting rapid growth and innovation.

Requirements

  • Minimum of 5 to 7 years of progressive experience in Site Reliability Engineering, DevOps, or Systems Engineering roles.
  • Experience with configuration management and automation tools (e.g., Terraform, Ansible, Chef, Puppet).
  • Bachelor’s degree in Computer Science, Information Systems, Engineering, or related technical field.
  • Hands-on experience with observability, monitoring, logging, and alerting tools (e.g., Dynatrace, SolarWinds, Prometheus, Grafana, Loki, Fluentd/Fluent Bit, OpenTelemetry).
  • Deep understanding of networking, security, and system architecture principles.
  • Strong programming/scripting skills in languages such as Bash or Python.
  • Demonstrated experience with Enterprise Monitoring and Observability Tools (Commercial or Open Source).
  • Proficiency with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes).
  • Exposure to legacy infrastructure and application stacks.
  • Exposure to modern infrastructure and application environments (Docker, Kubernetes, OpenShift, or other containerized platforms), either on-premises or in the cloud (AWS, Azure, or GCP).
  • Proven experience leading L1/L2/L3 operational and engineering teams in a 24×7 environment.
  • Proven skills in ITIL processes (Change, Incident, Problem, Release).
  • Proven track record in incident management and support.
  • Proven track record of driving RCA, MTTR reduction, and operational excellence.
  • Familiarity with CI/CD tools (e.g., Jenkins, Bitbucket, Git, Nexus, Artifactory) and with automated deployment pipelines used to build and maintain TIBCO deployments.
  • Proficiency with the Microsoft Office suite, including PowerPoint.
  • Excellent problem-solving skills.
  • Excellent verbal and written communication, including presentation skills.
  • Excellent interpersonal skills, with the ability to collaborate successfully with cross-functional departments and manage outsourced vendors.
  • Ability to work independently and collaboratively in a fast-paced environment.
  • Strong orientation toward results, coupled with a reputation for integrity, creativity, and good judgment.
  • Ability to challenge existing practices when appropriate.

Responsibilities

  • Reliability & Availability: Ensure all enterprise systems are highly available, resilient, and consistent in performance by designing and implementing monitoring solutions across data centers and cloud environments.
  • Observability and Monitoring: Work with the Observability Architect to build and optimize monitoring, alerting and observability. Maximize the use of Observability and Monitoring tools to improve and ensure continued performance, resiliency and availability of enterprise systems.
  • Incident Management & Response: Lead high-severity incident response efforts. Troubleshoot and debug complex infrastructure and application issues across servers, storage, databases, networks, and cloud services. Perform root cause analysis and blameless incident reviews, and implement corrective and preventive measures to avoid recurrence and reduce MTTR.
  • Performance Optimization: Conduct periodic performance optimization reviews for infrastructure and applications to ensure continued service efficiency and scalability of systems. Analyze system performance and identify bottlenecks at all layers (compute, storage, network, application). Optimize configurations and implement tuning improvements to maximize throughput and minimize latency. Work with architecture teams to design scalable solutions that gracefully handle growth and peak loads.
  • Capacity Planning: Forecast infrastructure needs and ensure the environment can handle current and future workloads.
  • Automation: Develop and maintain automation tools for monitoring and operations, reducing manual intervention and increasing efficiency.

Benefits

  • Medical, dental, and vision plans with no-cost and low-cost options
  • Annual employer HSA contribution
  • 401(k) matching and immediate vesting
  • Vehicle purchase and lease discounts, plus monthly vehicle allowances by job level:
      • Associate / Sr. Associate: $350
      • Manager / Sr. Manager: $600
      • Director: $800
      • Executive Director: $900
      • VP or Above: $1,000
  • 100% employer-paid life and disability insurance
  • No-cost health and wellbeing programs, including a gym benefit
  • Six weeks of paid parental leave
  • Paid Volunteer Time Off, plus a company donation to a charity of your choice