Senior Infrastructure Kafka Engineer

Technologent
Phoenix, AZ
Hybrid

About The Position

The Opportunity: We are seeking a Senior Infrastructure - Kafka Engineer to join a high-performing data engineering team supporting large-scale, event-driven data platforms. This role is ideal for a seasoned engineer with deep experience in Apache Kafka/Confluent Kafka, messaging platforms, SQL/NoSQL databases, and cloud infrastructure who can lead engineering, operations, and automation efforts across complex enterprise environments.

This is a 6-month contract-to-hire opportunity supporting a hybrid work model in Phoenix, AZ. The ideal candidate is a hands-on infrastructure engineer with strong experience designing resilient Kafka environments, building real-time data pipelines, and supporting production systems in fast-paced enterprise settings.

Role: Senior Infrastructure - Kafka Engineer
Experience: 7+ Years
Work Location: Phoenix, AZ (Hybrid - 4 days onsite / 1 day remote)
Project Duration: 6-Month Contract-to-Hire

Requirements

  • 7+ years of experience in infrastructure engineering with a strong focus on:
      • Kafka administration across on-prem and cloud environments
      • Kafka ecosystem components including brokers, topics, consumer groups, replication, and failover
      • Messaging systems such as MQ
      • SQL and NoSQL database integration
  • Proven experience designing, deploying, and scaling Kafka clusters and connector infrastructure in production and DR environments.
  • Hands-on experience building real-time data pipelines using Kafka producers and streaming consumers such as Spark Streaming.
  • Strong proficiency with at least one major cloud platform: AWS, GCP, or Azure.
  • Experience with event-driven architectures, containerization, and DevOps practices.
  • Experience with observability and monitoring tools such as Splunk, Datadog, and Grafana.
  • Solid understanding of networking, Linux/Windows operating systems, and core diagnostic tools.
  • Proficiency with source control tools such as SVN and Git.
  • Scripting and programming experience with tools such as PowerShell, Bash, Python, or Perl.
  • Demonstrated ability to analyze complex issues, make sound decisions with limited information, and drive issues through resolution.
  • Strong communication, customer service, and collaboration skills with the ability to work effectively across cross-functional technical teams.

Nice To Haves

  • Experience with additional enterprise monitoring and infrastructure support tools.
  • Experience working in highly regulated enterprise environments.
  • Prior exposure to large-scale data engineering or integration platforms.

Responsibilities

  • Administer, configure, and troubleshoot Kafka clusters across on-prem and cloud environments, including broker and cluster configuration, partitioning, and performance tuning.
  • Design and implement scalable, highly available Kafka infrastructure, including disaster recovery and multi-environment strategies.
  • Integrate Kafka with upstream and downstream systems using Kafka Connect and related connectors, including MQ, MongoDB, Oracle, SQL Server, PostgreSQL, and MySQL.
  • Build and support real-time data pipelines using Kafka producers and streaming consumers such as Spark Streaming and Kafka Streams.
  • Automate infrastructure provisioning and configuration across environments using Terraform and modern DevOps practices.
  • Deploy and manage Kafka components and clients in production and disaster recovery environments, ensuring resilience and recoverability.
  • Lead a small team of engineers and technicians in monitoring, diagnosis, and remediation of infrastructure issues.
  • Implement and maintain comprehensive monitoring, logging, and alerting using tools such as Splunk, Datadog, and Grafana.
  • Perform proactive health checks and capacity planning to identify and resolve issues before they impact service.
  • Serve as a primary point of contact for daily operations, major incidents, and escalations related to Kafka and associated infrastructure.
  • Develop, maintain, and continuously improve runbooks and playbooks for incident response, maintenance, and recurring operational tasks.
  • Analyze support trends and incident patterns to reduce downtime and drive root-cause resolution.
  • Ensure infrastructure and platform changes comply with internal standards, security policies, and applicable regulatory requirements.
  • Partner with security, networking, application, and data engineering teams to design and operate secure, compliant, event-driven architectures.
  • Contribute to standards, best practices, and technical documentation for Kafka, messaging, and integration patterns.
  • Participate in agile ceremonies and help influence technical direction for streaming and integration platforms.