Site Reliability Engineer DWS Ohio

Kyndryl • Columbus, OH

Onsite

About The Position

As a Site Reliability Engineer (SRE) at Kyndryl, you'll be at the forefront of ensuring reliability, resiliency, and innovation in our information systems and ecosystems. Your role goes beyond traditional engineering, as you'll have the opportunity to analyze business needs, tackle complex problems, and provide strategic advice and designs. You'll be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems.

We're looking for a true visionary who can think strategically and help shape the future of our services. Your expertise in building trusted relationships with customers and partnering with them for success will be instrumental in driving our growth. As an SRE, you'll have the unique opportunity to work on end-to-end services, spanning customer sites and platforms. Collaboration and proactivity are key as you work alongside a talented team of professionals, eager to make a difference.

You'll embrace an entrepreneurial mindset, taking ownership of your responsibilities and constantly seeking innovative solutions. With an unwavering focus on quality, robustness, and security, you'll be a driving force in implementing cutting-edge tools that enhance our operations, improve reliability, and gather valuable feedback on our platforms. Your ability to identify and mitigate common operational issues will play a crucial role in delivering seamless experiences to our customers.

Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential – offering a wide range of professional and personal growth opportunities that you won’t find anywhere else.

Requirements

Must be located in Ohio; the role is client-facing
10+ years of experience in operational management, including incident management and escalations
Experience with design and implementation of application monitoring to ensure reliability and performance meet or exceed business goals
Experience implementing strategies to cap operations load and to handle overflow using appropriate tooling and metrics; defining service level indicators and objectives in collaboration with stakeholders, business, development, DevSecOps and Operations teams
Solution and design experience in an enterprise environment: Windows Server, Linux server (RHEL preferred), UNIX (AIX, Solaris)
Experience with Windows Server, storage, and hyperscaler cloud platforms (AWS, Azure, Google Cloud Platform), as well as OpenShift
Experience working with data formats and scripting languages: JSON, YAML, Bash, and/or PowerShell

Nice To Haves

BS degree in Computer Science, Engineering, or another highly technical or scientific discipline
Expertise with Ansible, Terraform, and Python
Experience with distributed technologies as well as dynamic resource management frameworks such as Kubernetes
Expertise in leveraging open-source tooling such as Prometheus, Grafana, or Loki

Responsibilities

Ensuring reliability, resiliency, and innovation in information systems and ecosystems.
Analyzing business needs, tackling complex problems, and providing strategic advice and designs.
Involvement in all stages of the software lifecycle: building, testing, deploying changes, and maintaining robust systems.
Building trusted relationships with customers and partnering with them for success.
Working on end-to-end services, spanning customer sites and platforms.
Collaborating with a talented team of professionals.
Taking ownership of responsibilities and seeking innovative solutions.
Implementing cutting-edge tools to enhance operations, improve reliability, and gather feedback on platforms.
Identifying and mitigating common operational issues to deliver seamless customer experiences.

Benefits

Medical and dental coverage
Disability
Retirement benefits
Paid leave
Paid time off
Discretionary annual bonus program (based on performance)
Opportunities for professional and personal growth
Hands-on experience
Learning opportunities
Chance to certify in all four major platforms
Access to employee learning programs with certifications from Microsoft, Google, Amazon, Skillsoft, and many more
Company-wide volunteering and giving platform