About The Position

As a Site Reliability Engineer at Kustomer, you will build systems and abstractions used by teams across the company. You will join a team of experienced engineers and have ample opportunities to continue learning and growing. This role sits on our Foundation team, where your responsibilities will include maintaining cloud infrastructure security, planning capacity, staying ahead of software lifecycles, optimizing CI/CD processes, improving developer productivity, and managing on-call best practices. We believe in ownership and are looking for people driven to continuously ship new, impactful features and capabilities for our users.

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • 8+ years of experience building and managing large-scale, highly available, distributed web applications
  • A working understanding of a high-level programming language such as Go, Python, JavaScript, or Bash
  • Strong AWS experience managing infrastructure in a secure, highly available, automated fashion (VPC, ELB, Containers, Auto Scaling)
  • Strong background in Linux/Unix, networking, HTTP/2, DNS, REST, etc.
  • Experience with managing large databases and Lucene-based search systems such as Elasticsearch
  • Experience with infrastructure as code and managing Terraform configurations in a sustainable and scalable way
  • Experience with observability tools (ELK/Prometheus/Coralogix/distributed tracing)

Nice To Haves

  • You have GitHub activity showing thoughtful, relevant contributions
  • You have a working knowledge of writing code and scripts in more than one language
  • You have experience developing internal tools for others
  • You have experience creating SLAs, SLOs, and SLIs

Responsibilities

  • Analyze, design, develop, maintain and improve infrastructure to expand its automation capabilities
  • Automate the deployment of testing, staging, and production environments
  • Improve the efficiency of development testing
  • Measure, report and drive improvements on scalability, performance, and availability
  • Support the cloud developer environments and their iterative improvement
  • Lead, plan, and execute large-scale system migrations
  • Participate in cross-team initiatives to drive engineering best practices
  • Conduct code, architecture, and infrastructure reviews across the platform
  • Provide education and support to the engineering team in systems architecture design
  • Stay involved in initiatives around on-call rotations, application performance monitoring, and continuous integration and delivery pipelines
  • Lead various scalability initiatives across the platform and infrastructure
  • Implement and enforce change management best practices
  • Collaborate with the InfoSec team to drive compliance, observability and automation for the security of our platform
  • Work closely with the Security team to optimize infrastructure in order to satisfy compliance requirements
  • Manage secrets and automated key rotations
  • Manage security vulnerabilities and upgrade schedules for EOL (End of Life) software
  • Manage CDN, firewall rules, and other tools to mitigate attacks and threats

Benefits

  • Competitive salaries and stock options
  • 100% healthcare coverage in the U.S.
  • 401(k)
  • Wi-Fi and mobile reimbursement
  • Generous vacation policy
  • Pension and supplemental health insurance in the UK