Site Reliability Engineer DWS Ohio

Kyndryl•Columbus, OH

Onsite

About The Position

At Kyndryl, we run and reimagine the mission-critical technology systems that drive advantage for the world’s leading businesses. We are at the heart of progress, with proven expertise and a continuous flow of AI-powered insight enabling smarter decisions, faster innovation, and a lasting competitive edge. For our people, the Kyndryls, that means doing purposeful work that powers human progress. Join us and experience a flexible, supportive environment where your well-being is prioritized and your potential can thrive.

Join us as a Site Reliability Engineer (SRE) and embark on an exciting journey of ensuring reliability, resiliency, and innovation in our information systems and ecosystems. As an SRE at Kyndryl, you'll be at the forefront of driving continuous improvement and delivering exceptional service to our customers. Your role goes beyond traditional engineering, as you'll have the opportunity to analyze business needs, tackle complex problems, and provide strategic advice and designs. You'll be involved in every stage of the software lifecycle, from building and testing to deploying changes and maintaining robust systems.

We're looking for a true visionary who can think strategically and help shape the future of our services. Your expertise in building trusted relationships with customers and partnering with them for success will be instrumental in driving our growth. As an SRE, you'll have the unique opportunity to work on end-to-end services, spanning customer sites and platforms. Collaboration and proactivity are key as you work alongside a talented team of professionals who are eager to make a difference.

You'll embrace an entrepreneurial mindset, taking ownership of your responsibilities and constantly seeking innovative solutions. With an unwavering focus on quality, robustness, and security, you'll be a driving force in implementing cutting-edge tools that enhance our operations, improve reliability, and gather valuable feedback on our platforms. Your ability to identify and mitigate common operational issues will play a crucial role in delivering seamless experiences to our customers.

If you're passionate about pushing the boundaries of technology, thrive in a collaborative environment, and are motivated by the opportunity to shape the future of reliability engineering, then we want to hear from you. Join our team and be part of a dynamic and forward-thinking organization that values innovation and excellence in everything we do.

Requirements

Must be located in Ohio; this is a client-facing role
10+ years of experience in operational management, including incident management and escalations
Experience with the design and implementation of application monitoring to ensure reliability and performance meet or exceed business goals
Experience implementing strategies to cap operations load and to handle overflow using appropriate tooling and metrics; defining service level indicators and objectives in collaboration with stakeholders, business, development, DevSecOps and Operations teams
Solution and design experience in an enterprise environment: Windows Server, Linux server (RHEL preferred), UNIX (AIX, Solaris), storage, and hyperscaler cloud platforms (AWS, Azure, Google Cloud Platform), as well as OpenShift
Experience working with data formats (JSON, YAML) and scripting languages (Bash and/or PowerShell)

Nice To Haves

BS degree in Computer Science, Engineering, or other highly technical, scientific discipline
Expertise with Ansible, Terraform, and Python
Experience with distributed technologies as well as dynamic resource management frameworks such as Kubernetes
Expertise in leveraging open-source tooling such as Prometheus, Grafana, or Loki

Responsibilities

Ensuring reliability, resiliency, and innovation in information systems and ecosystems.
Driving continuous improvement and delivering exceptional service to customers.
Analyzing business needs, tackling complex problems, and providing strategic advice and designs.
Participating in all stages of the software lifecycle: building, testing, deploying changes, and maintaining robust systems.
Building trusted relationships with customers and partnering with them for success.
Working on end-to-end services, spanning customer sites and platforms.
Implementing cutting-edge tools that enhance operations, improve reliability, and gather feedback on platforms.
Identifying and mitigating common operational issues to deliver seamless customer experiences.