Site Reliability Engineer

Qlik
4d | $110,000 - $140,000 | Remote

About The Position

As a Site Reliability Engineer at Qlik, you’ll sit at the heart of our cloud ecosystem, helping power the reliability, security, and scalability of Qlik and Talend Cloud services used around the world. This is your opportunity to work on systems operating at serious scale — supporting millions of transactions across a global cloud environment — while shaping how reliability engineering is done across the business. You won’t just “keep the lights on.” You’ll design, improve, automate, and elevate how modern cloud platforms perform. If you’re motivated by complex distributed systems, Kubernetes at scale, and solving meaningful engineering challenges, this is where you’ll thrive. This is a role for engineers who love depth, autonomy, and impact.

Requirements

  • Cloud engineering skill across AWS and/or Azure, including hands-on experience supporting production systems running on Kubernetes at scale.
  • Infrastructure as Code and microservices experience, using tools such as Terraform, Crossplane or Ansible, with a strong understanding of operating distributed systems in live environments.
  • Automation and engineering mindset, with proficiency in Python, Go or Bash, plus experience building and improving CI/CD pipelines and autoscaling strategies.
  • Observability and incident management depth, including Prometheus, Grafana, OpenTelemetry, distributed tracing, and SIEM tooling — with the ability to turn insights into reliability improvements.
  • Security and networking knowledge, including secret management (e.g., Vault, AWS SSM) and familiarity with infrastructure security and compliance best practices.
  • Cloud-native tooling experience, including Helm (managing and creating charts) and exposure to modern database and ecosystem technologies such as MongoDB.
  • Strong analytical thinking, with the ability to troubleshoot complex issues across infrastructure, networking, and application layers.
  • Curiosity and collaboration at your core: a passion for learning and for sharing ideas and insight, plus comfort with an on-call support rotation (prior on-call experience is welcome).

Responsibilities

  • Increase reliability and availability by implementing resilient infrastructure patterns and performance optimizations.
  • Reduce incidents and recovery time through better observability, automation, and proactive engineering.
  • Strengthen scalability by designing infrastructure that adapts seamlessly to growth.
  • Improve cloud efficiency by driving optimization best practices across AWS and Azure environments.
  • Resolve complex system challenges across infrastructure, networking, applications, and distributed systems.
  • Participate in on-call duties, including first-line incident response, to maintain the availability and performance of our cloud infrastructure, and provide regular updates on project status and activities.
  • Elevate engineering standards by mentoring peers and embedding reliability-first thinking into development workflows.

Benefits

  • Genuine career progression pathways and mentoring programs.
  • Culture of innovation, technology, collaboration, and openness.
  • Flexible, diverse, and international work environment.
  • Giving back is a huge part of our culture. Alongside an extra “change the world” day plus another for personal development, we also highly encourage participation in our Corporate Responsibility Employee Programs.
  • Comprehensive benefits, including but not limited to medical, dental, and vision coverage, life and AD&D, short- and long-term disability coverage, paid time off, paid parental / maternity leave, participation in a 401(k) program that includes company match, and many additional voluntary benefits.