Staff Site Reliability Engineer, Cloud

Kentik
$165,000 - $200,000 | Remote

About The Position

Kentik is the network intelligence platform for modern infrastructure teams. Unlike traditional monitoring and observability tools, we demystify complex network operations, enabling organizations to deliver applications and innovation at scale. Built by network experts to make critical insight accessible to every engineer, Kentik is the real-time source of truth that understands every network in context, from data center to cloud to the internet. This single platform unifies and correlates cloud, device, flow, and synthetic data to turn telemetry into action. Market leaders like Akamai, Booking.com, Dropbox, and Zoom rely on Kentik to run, manage, and optimize their networks.

Our platform ingests trillions of records and serves hundreds of thousands of queries for our users each day. You will gain experience building a production-quality, high-performance server-and-client SaaS application that handles uniquely high volumes of data. We have built a team of world-class engineers, network experts, and technology thought leaders in a culture that has been remote-friendly from day one. While prior experience in a remote environment is not required, we highly value strong collaboration and communication skills, as well as a high level of independence and autonomy.

Kentik is looking for a Staff-level Site Reliability Engineer (Cloud) to join our Product Engineering team and help build and maintain our Synthetics and Cloud product lines. These products have multiple applications deployed in various cloud providers all over the world. We manage these cloud applications using observability tooling, automated build processes, and adherence to configuration-as-code best practices. We’re looking for an experienced engineer who will work with engineering teams across the company to help grow our hardware and software infrastructure. We operate a well-organized, well-instrumented platform, and offer enormous opportunities for employee growth.

Requirements

  • 8+ years of experience in cloud-based systems administration, IT, and/or SRE-related projects
  • Expertise in public cloud environments such as AWS, GCP, Azure, or OCI.
  • Strong command of containerization and orchestration using Docker and Kubernetes.
  • Solid programming and automation skills using Bash, Python, or Go.
  • Proficiency with Infrastructure as Code (IaC) and configuration management platforms such as Terraform, Ansible, and Puppet.
  • Proficiency in Linux administration and command-line tools (e.g., SSH, grep, awk).
  • Detailed understanding of major internet protocols (TCP/IP, DNS, HTTP, TLS)
  • Network administration experience: concepts such as routing, firewalls (iptables), and peering should sound familiar
  • A passion for documenting code, processes, and infrastructure in runbooks and wikis
  • Experience with metrics and monitoring solutions such as Grafana, Prometheus, Telegraf, and OpenTelemetry
  • Experience creating and managing tickets with third-party vendors and owning cloud vendor partner relationships

Nice To Haves

  • Familiarity with Kubernetes automation tools, specifically managing complex deployments with Helm and Helmfile.
  • Knowledge of scaling Kubernetes workloads and compute infrastructure
  • Experience optimizing CI/CD build and deploy pipelines using GitHub Actions and Jenkins.
  • Exposure to PagerDuty integrations
  • Knowledge of SRE, DevOps and GitOps practices and principles

Responsibilities

  • Make sure our real-time, scalable infrastructure is set up for growth and working efficiently. Our infrastructure runs on our own hardware across multiple locations, as well as on all major cloud vendors
  • Work on tools and processes to better monitor our platform and ensure its stability through our rapid growth
  • Deep-dive into diverse topics, from firewalls and IP routing to database replication strategies and build process automation
  • Collaborate with engineering and infrastructure teams on finding solutions from an operational perspective
  • Assist with expanding our cloud deployments across the major cloud providers
  • Contribute code, code reviews, tools, and patches to all kinds of existing code
  • Write design documents or collaborate on colleagues’ docs to introduce new features or changes into our infrastructure
  • Provide valuable feedback on team goals, projects, and processes. We believe in continuously improving our team

Benefits

  • The company pays 100% of premiums for health, vision, and dental coverage for you and your dependents
  • An annual Health Reimbursement Account (HRA) of $3,000 for an individual or $4,500 for a family
  • Paid family & medical leave
  • Open PTO, a quarterly Wellness Day, and a minimum of 10 paid holidays
  • 401(k) retirement account
  • Home office reimbursement
  • Stock options