Lead Kafka Platform Engineer

Wells Fargo Bank
Irving, TX
Hybrid

About The Position

Seeking a highly skilled Lead Platform Engineer with deep expertise in Confluent Kafka and OpenShift (OCP) to design, build, and operate scalable, resilient, and secure event streaming platforms. The ideal candidate will bring strong experience in distributed systems, automation, and enterprise platform engineering, with a proven ability to drive architecture decisions, optimize performance, and enable large-scale application adoption through standardized, self-service solutions.

Requirements

  • 5+ years of experience in distributed systems, platform engineering, or infrastructure engineering, with a strong focus on event streaming platforms
  • Hands-on experience with Apache Kafka / Confluent Platform, including brokers, KRaft/ZooKeeper, Schema Registry, Kafka Connect, and related ecosystem components
  • Strong experience working with container platforms such as OpenShift (OCP) or Kubernetes, including operators, namespaces, networking, and storage integration
  • Proven experience designing and operating highly available, multi-node or multi-data-center distributed systems, including replication, fault tolerance, and disaster recovery strategies
  • Experience with automation and Infrastructure-as-Code, including CI/CD pipelines, configuration management tools, and declarative deployment models
  • Solid understanding of networking, DNS, load balancing, and secure service exposure in containerized environments
  • Strong knowledge of security practices, including TLS/mTLS, authentication, authorization (RBAC), and certificate lifecycle management in enterprise environments
  • Experience with observability and monitoring tools, including metrics, logging, alerting, and performance tuning of distributed platforms
  • Proven ability to troubleshoot complex platform issues, perform root cause analysis, and implement long-term solutions
  • Strong understanding of capacity planning, performance optimization, and resource utilization for large-scale platforms
  • Experience working in regulated enterprise environments, with knowledge of risk management, controls, and compliance requirements
  • Excellent collaboration and communication skills, with the ability to work across engineering teams, infrastructure teams, and external vendors

Nice To Haves

  • Experience deploying and managing operator-based Kafka platforms in Kubernetes/OpenShift environments
  • Familiarity with multi-region or neighborhood-aligned Kafka architectures and platform isolation strategies
  • Experience with streaming integrations and connectors, including database CDC and event-driven architectures
  • Knowledge of GitOps tools, CI/CD platforms, or similar automation frameworks
  • Exposure to large-scale platform modernization or migration programs, especially transitioning from legacy messaging systems to Kafka

Responsibilities

  • Lead complex initiatives to design and deliver Confluent Kafka platforms on OpenShift (OCP), enabling scalable, resilient, and secure event streaming solutions for enterprise applications.
  • Design, build, deploy, and maintain Kafka infrastructure on OCP using operator-based frameworks, supporting components such as brokers, KRaft controllers, Schema Registry, Kafka Connect, Flink, and Control Center.
  • Drive continuous improvement and modernization efforts, including platform upgrades, automation, and performance optimization across Kafka and OCP environments.
  • Evaluate and integrate Kafka ecosystem tools and OCP-native capabilities, ensuring alignment with enterprise architecture standards and target-state platform design.
  • Develop and maintain automation frameworks using CI/CD pipelines and Infrastructure-as-Code to standardize Kafka cluster provisioning, configuration, and lifecycle management.
  • Architect and implement high availability and disaster recovery solutions, including cross–data center deployments, replication strategies, and cluster linking for multi-region resilience.
  • Define and enforce platform governance standards, including naming conventions, topic management, schema governance, security policies, and data isolation strategies.
  • Analyze and resolve high-impact incidents, performing root cause analysis and implementing corrective actions to improve reliability and prevent recurrence.
  • Make key technical decisions on Kafka architecture and OCP deployment models, including cluster topology, storage integration, networking, and workload placement.
  • Establish and manage operational risk and control processes, ensuring compliance with enterprise security, regulatory, and audit requirements.
  • Optimize platform performance and cost through capacity planning, resource utilization tuning, and workload distribution strategies.
  • Collaborate with application teams to support onboarding, migration, and adoption of Kafka, enabling self-service capabilities and best practices.
  • Partner with internal platform teams (OCP, networking, security) and external vendors to drive platform delivery, resolve issues, and influence roadmap priorities.

Benefits

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement