Site Reliability Engineer (SRE)

Vizient, Chicago, IL

About The Position

When you’re the best, we’re the best. We foster an environment where employees feel engaged, satisfied, and able to contribute their unique skills and talents while living and working as their authentic selves. We provide extensive opportunities for personal and professional development, building both employee competence and organizational capability to fuel exceptional performance through an inclusive environment, now and in the future.

Summary: In this role, you will design, automate, and maintain infrastructure to ensure the reliability, scalability, and performance of services across production environments. You will partner with engineering teams to improve system resilience, enhance observability, strengthen Continuous Integration/Continuous Deployment (CI/CD) practices, and advance site reliability engineering capabilities that support healthcare clients and business operations.

Requirements

  • Relevant degree in Computer Science or related field preferred.
  • 2 or more years of relevant experience required.
  • Familiarity with SLO/SLI and Error Budget frameworks required.
  • Experience with observability tools including Dynatrace, Prometheus, Grafana, Datadog, Splunk, or ELK required.
  • Experience with CI/CD engineering practices and Software Development Life Cycle (SDLC) frameworks required.
  • Proficiency in one or more programming or scripting languages such as PowerShell, Python, Go, or Bash required.
  • Experience with configuration management tools like Ansible, Chef, or Puppet required.
  • Hands-on experience with cloud platforms including Azure, AWS, or GCP required.
  • Understanding of distributed systems, networking, and reliability principles, and experience maintaining production environments.
  • You must be authorized to work in the United States without sponsorship.

Nice To Haves

  • Knowledge of Kubernetes, Docker, and orchestration workflows preferred.

Responsibilities

  • Collaborate with engineering and operations teams to design systems that minimize downtime and improve service reliability.
  • Participate in incident response and conduct root-cause analysis to implement long-term remediation.
  • Manage performance tuning, capacity planning, and scaling strategies for production services.
  • Promote adoption of Service Level Objective (SLO), Service Level Indicator (SLI), and Error Budget frameworks to improve reliability practices.
  • Enhance CI/CD pipelines to enable secure, automated, and efficient deployments.
  • Implement and optimize monitoring, logging, and alerting solutions.
  • Build and maintain scalable infrastructure using infrastructure-as-code tools such as Pulumi, ARM, Terraform, or CloudFormation.
  • Develop automation for deployments, configuration management, and system operations.
  • Participate in on-call rotations and troubleshoot complex production incidents.

Benefits

  • Vizient has a comprehensive benefits plan!
  • Please view our benefits here: http://www.vizientinc.com/about-us/careers
  • Equal Opportunity Employer: Females/Minorities/Veterans/Individuals with Disabilities
  • The Company is committed to equal employment opportunity to all employees and applicants without regard to race, religion, color, gender identity, ethnicity, age, national origin, sexual orientation, disability status, veteran status or any other category protected by applicable law.
  • Working at Vizient means making a difference in today’s dynamic health care industry, every day.
  • Our mission is to connect health care organizations and providers with the knowledge, solutions and expertise that enable them to accelerate their clinical and operational performance.