About The Position

We are looking for a dedicated and experienced Kubernetes Engineer to join our OpenShift/Kubernetes Platform team. In this role, you will be at the forefront of our cloud-native strategy, collaborating with a talented team of engineers responsible for the reliability, scalability, and performance of our container platforms. Your contributions will be crucial in building, maintaining, and optimizing our OpenShift clusters hosted on Google Cloud Platform (GCP) and Microsoft Azure, ensuring a stable and efficient environment for our development teams. This highly technical individual contributor position is perfect for someone with a strong background in distributed systems and cloud infrastructure who is passionate about building robust, scalable, and efficient container platforms.

Requirements

  • Proven experience (typically 5+ years) as a Kubernetes Engineer, Site Reliability Engineer (SRE), or DevOps Engineer with a strong focus on container orchestration.
  • Deep expertise with Kubernetes and OpenShift platforms, including cluster operations, networking, storage, and security.
  • Strong hands-on experience with public cloud platforms, specifically Google Cloud Platform (GCP) and Microsoft Azure.
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform, Ansible, or similar.
  • Experience with scripting and programming languages (e.g., Python, Go, Bash) for automation and tool development.
  • Solid understanding of distributed systems, microservices architectures, networking principles, and security best practices.
  • Experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK stack, Datadog).
  • Excellent problem-solving, debugging, and analytical skills.
  • Strong communication and collaboration skills, with the ability to work effectively within a team environment.

Responsibilities

  • Platform Engineering & Operations: Contribute to the architecture, engineering, and day-to-day operations of our OpenShift and Kubernetes clusters on GCP and Azure, focusing on high availability, security, and scalability.
  • Service Reliability: Participate in defining, monitoring, and meeting Service Level Agreements (SLAs), Service Level Objectives (SLOs), and mean-time-to-X (MTTx) targets such as Mean Time to Resolution (MTTR), so that the platform consistently meets the needs of our internal customers.
  • Automation and Efficiency: Develop and implement automation solutions to streamline operations, reduce manual tasks, and improve overall efficiency. This includes contributing to and enforcing best practices for Infrastructure as Code (IaC) using tools like Terraform or Ansible.
  • Technical Collaboration: Collaborate with cross-functional teams to shape the platform's technical direction and roadmap, aligning its capabilities with business needs and user requirements.
  • Incident Response: Participate in troubleshooting and resolving complex technical issues, and contribute to post-mortem analyses to prevent future occurrences.
  • Tooling & Development: Design, develop, and maintain tools and services to enhance the reliability, observability, and manageability of our Kubernetes environment.
© 2024 Teal Labs, Inc