System Engineer-Platform/Kafka/Messaging

O'Reilly Auto Parts
Headquarters, KY
Onsite

About The Position

The Systems Engineer Platform – Messaging Platform will play a key role in designing, implementing, and maintaining enterprise messaging systems that support both real-time and asynchronous communication between distributed applications and services. This role focuses on ensuring robust, scalable, secure, and cost-efficient messaging solutions across hybrid cloud and on-premises environments. The engineer will work closely with architects, developers, infrastructure, and DevOps teams to standardize messaging platforms and promote modern integration patterns that align with the organization’s digital transformation goals. This is an on-site position located in Springfield, MO.

Requirements

  • 3+ years with enterprise messaging platforms (Kafka, Pub/Sub), including deep knowledge of Kafka architecture (brokers, partitions, replication, schema management, streams, and connectors).
  • Strong experience designing high-throughput, low-latency event-driven systems with governance, topic management, and access control.
  • Hands-on experience with GCP messaging services (Pub/Sub, Eventarc) and hybrid architectures, including secure connectivity, IAM, and multi-region designs.
  • Proficiency in Infrastructure as Code and automation using Terraform, Kubernetes/Helm, and CI/CD tools, with GitOps-based configuration management.
  • Solid background in security and compliance, including encryption (TLS/mTLS), authentication (SASL, IAM), and regulatory standards (PCI, SOC2, HIPAA).
  • Experience implementing observability with Prometheus, Grafana, ELK, and GCP tools, including monitoring lag, throughput, and distributed tracing.
  • Strong performance tuning and reliability practices, including scaling, backpressure handling, DLQs, retries, and cross-region replication.
  • Focus on developer enablement through reusable tools, self-service platforms, schema governance, and event-driven best practices.
  • Ability to evaluate and prototype messaging technologies (e.g., Pulsar, RabbitMQ), lead POCs, and drive adoption of modern streaming and data integration patterns.

Nice To Haves

  • Experience running Kafka on Kubernetes (Strimzi/Confluent Operator), including scaling, configuration, and multi-tenant architectures.
  • Strong foundation in event-driven architecture, microservices communication, and asynchronous messaging patterns.
  • Hands-on experience with real-time processing tools (Kafka Streams, Flink, Spark) and Kafka Connect integrations (Debezium, JDBC, Elasticsearch).
  • Proficiency in schema management (Avro, Protobuf) and schema registries (Confluent, Apicurio).
  • Familiarity with Docker, Kubernetes (GKE), and service mesh technologies (Istio, Linkerd) for secure, cloud-native deployments.
  • Experience with cloud messaging and analytics tools (GCP Pub/Sub, Dataflow) and migrating from legacy systems (e.g., IBM MQ, RabbitMQ).
  • Strong knowledge of security and access control (ACLs, SASL/SCRAM, Kerberos, IAM) and message delivery semantics (at-least-once, exactly-once).
  • Experience with observability (Prometheus, Grafana, GCP Monitoring), CI/CD automation for Kafka resources, and data governance tools (Dataplex, Collibra).
  • Industry experience, certifications, and ability to lead design discussions, mentor teams, and work within agile environments.

Responsibilities

  • Design and operate scalable messaging platforms (Kafka, Pub/Sub, legacy MQ) supporting high-throughput, low-latency event streaming.
  • Manage topics, schemas (Avro/JSON/Protobuf), connectors, and stream processing (Kafka Streams/ksqlDB).
  • Enforce governance for naming, retention, access control, and multi-tenancy, while ensuring reliable delivery (retries, DLQs, idempotency, exactly-once).
  • Automate messaging infrastructure using Terraform, Helm, and Kubernetes (e.g., Strimzi), with Git-based configuration management.
  • Build CI/CD pipelines for deployments, upgrades, and topic provisioning, including drift detection and self-healing workflows.
  • Implement secure messaging with IAM integration (GCP, LDAP, Kerberos), TLS/mTLS, and ACLs.
  • Align with compliance standards (PCI, HIPAA) and Zero Trust principles, including data classification, logging, and anomaly detection.
  • Establish observability using Prometheus, Grafana, ELK, or GCP tools.
  • Monitor SLAs/SLOs, consumer lag, broker health, and replication.
  • Lead incident response, RCA, and disaster recovery planning.
  • Enable teams with reusable connectors, SDKs, and event-driven patterns.
  • Provide guidance on schema evolution, idempotency, and domain-driven event design, supported by self-service tools and documentation.
  • Drive migration from legacy systems to modern streaming platforms, evaluating new technologies (e.g., Pulsar, Redpanda).
  • Lead migration strategies, dual-write patterns, and multi-region replication.
  • Collaborate with cross-functional teams on architecture and governance, contribute to event-driven and data mesh strategies, and mentor engineers while promoting best practices.

Benefits

  • Competitive Wages & Paid Time Off
  • Stock Purchase Plan & 401k with Employer Contributions Starting Day One
  • Medical, Dental, & Vision Insurance with Optional Flexible Spending Account (FSA)
  • Team Member Health/Wellbeing Programs
  • Tuition and Educational Assistance Programs
  • Opportunities for Career Growth