Production-grade Apache Kafka operations experience: managing, maintaining and upgrading Kafka clusters in production environments with a focus on high availability, disaster recovery, fail-over and overall reliability
Kafka ecosystem tooling experience: Kafka Connect, Schema Registry
Proficiency in installing and configuring monitoring systems using Grafana (building dashboards), Prometheus, JMX metrics and Splunk
Automation and orchestration experience: Terraform, Ansible, Helm, Kubernetes (EKS/AKS/GKE) or equivalent
Scripting and tooling experience: Python or Bash for automation and runbooks
Strong Linux system administration experience, including troubleshooting, automation and scripting for efficient infrastructure management
Knowledge of networking concepts across on-prem VMs and cloud environments, ensuring seamless integration and communication between services
Strong understanding of topic management and security best practices for streaming platforms: TLS, ACLs, RBAC, encryption at rest/in transit
Experience participating in 24x7 on-call rotations, JVM tuning, GC analysis, network and disk I/O diagnostics and documenting incidents/postmortems
Experience in TCP/IP, routing, switching and firewall configurations relevant to Kafka operations
Job Type: Full-time
Career Level: Mid Level
Industry: Professional, Scientific, and Technical Services
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees