Senior System Engineer - Platform/Messaging

O'Reilly Auto Parts - Headquarters, KY

About The Position

The Senior Systems Engineer – Messaging Platform will play a key role in designing, implementing, and maintaining enterprise messaging systems that support both real-time and asynchronous communication between distributed applications and services. This role focuses on delivering robust, scalable, secure, and cost-efficient messaging solutions across hybrid cloud and on-premises environments. The engineer will work closely with architects, developers, infrastructure, and DevOps teams to standardize messaging platforms and promote modern integration patterns that align with the organization's digital transformation goals. This is an on-site position located in Springfield, MO.

Requirements

  • 5+ years of hands-on experience with enterprise-grade messaging platforms, including Apache Kafka (Confluent/OSS), Google Cloud Pub/Sub, or equivalent.
  • Deep expertise in Kafka architecture including broker management, partitioning, replication, log retention, and consumer group coordination.
  • Proficiency in schema evolution and contract validation using Confluent Schema Registry, Apicurio, or similar tools.
  • Demonstrated experience with Kafka Connect, Kafka Streams, and stream processing applications for real-time data movement.
  • Strong knowledge of topic hierarchy management, naming conventions, access control (ACLs), and governance policies.
  • Experience designing and deploying messaging platforms for mission-critical, high-throughput, low-latency event-driven systems.
  • Experience deploying and managing messaging workloads on GCP using Google Cloud Pub/Sub, Eventarc, or custom solutions.
  • Familiarity with hybrid architectures, including secure messaging between on-prem and cloud workloads via VPN/Interconnect.
  • Hands-on implementation of GCP IAM, VPC Service Controls, and secure endpoint management for messaging components.
  • Understanding of regional and multi-region messaging designs for high availability and disaster recovery.
  • Skilled in automating infrastructure provisioning and configuration using Terraform, Helm, and Kubernetes (e.g., Strimzi operators).
  • CI/CD experience with GitHub Actions, Jenkins, or Google Cloud Build for promoting messaging configurations and application artifacts.
  • Strong understanding of data protection practices including TLS/mTLS encryption, SASL/SCRAM, and token-based authentication (a producer configuration sketch follows this list).
  • Integration with enterprise IAM providers (e.g., GCP IAM, LDAP, SSO, RBAC) and audit logging tools.
  • Familiarity with industry regulations such as PCI-DSS, SOC2, HIPAA, and practices such as data classification and lineage.
  • Experience building reusable onboarding assets, SDKs, and best practice guides for application teams.
  • Experience developing shared connectors, bridges (e.g., API-to-Kafka), and schema-first design pipelines.
  • Exposure to alternative messaging systems like Apache Pulsar, Redpanda, RabbitMQ, and NATS.
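
For illustration only, the sketch below shows the kind of producer hardening several of these requirements describe: acknowledgment from all in-sync replicas, idempotent retries, and SASL/SCRAM authentication over TLS. It uses the standard Apache Kafka Java client; the broker address, topic name, and credentials are placeholders, not details from this posting.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SecureIdempotentProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker endpoint; a real cluster address would come from config management.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker.example.com:9093");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Durability: require acknowledgment from all in-sync replicas and enable
        // idempotence so broker-side retries cannot duplicate or reorder records.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

        // Encryption in transit plus SASL/SCRAM authentication; the
        // username and password here are placeholders.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"svc-orders\" password=\"<secret>\";");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed record: the key drives partition assignment, preserving per-key ordering.
            producer.send(new ProducerRecord<>("orders.v1", "order-123", "{\"status\":\"PLACED\"}"));
        }
    }
}
```

Note that with enable.idempotence=true the client also requires acks=all and bounds in-flight requests, so retried sends stay ordered and deduplicated.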

Nice To Haves

  • Experience deploying and managing Apache Kafka on Kubernetes (e.g., using Strimzi or Confluent Operator), including custom broker configurations and scaling strategies.
  • Familiarity with containerization using Docker and orchestration using GKE or other Kubernetes-based platforms.
  • Exposure to multi-tenant Kafka architectures with namespace isolation, quota enforcement, and security boundaries.
  • Strong understanding of event-driven architecture (EDA), microservices communication patterns, and asynchronous messaging design.
  • Hands-on experience with real-time data processing frameworks like Apache Flink, Kafka Streams, or Spark Structured Streaming.
  • Experience configuring and tuning Kafka Connect connectors (e.g., Debezium, JDBC, Elasticsearch) for data pipeline integration.
  • Proficiency in schema evolution practices using Avro, Protobuf, and tools like Confluent Schema Registry or Apicurio.
  • Familiarity with service mesh technologies (e.g., Istio, Linkerd) and secure east-west traffic routing within cloud-native messaging environments.
  • Knowledge of enterprise messaging migration strategies from legacy platforms such as IBM MQ or RabbitMQ to cloud-native solutions.
  • Experience using Google Cloud Pub/Sub and Dataflow for scalable event ingestion, transformation, and streaming analytics.
  • Knowledge of access control and authentication mechanisms such as Kafka ACLs, SASL/SCRAM, Kerberos, or GCP IAM.
  • Awareness of message delivery semantics (at-least-once, exactly-once) and how to implement idempotency in distributed systems (see the transactional sketch following this list).
  • Hands-on exposure to observability stacks such as Prometheus/Grafana, Confluent Control Center, and GCP Monitoring for tracking message flow, latency, and system health.
  • Experience building and maintaining automated CI/CD pipelines for Kafka topic provisioning, ACL management, and connector deployment.
  • Familiarity with metadata and governance platforms such as Dataplex, Google Data Catalog, or Collibra in the context of messaging assets.
  • Industry experience in domains such as retail, supply chain, or financial services with large-scale streaming use cases.
  • Certifications related to Apache Kafka (e.g., Confluent Certified Developer/Administrator) or Google Cloud (e.g., Professional Cloud Developer).
  • Understanding of compliance standards (e.g., PCI-DSS, ISO 27001) as they relate to secure messaging.
  • Comfortable working in agile delivery models with tools like JIRA, Confluence, and participating in sprints and architecture reviews.
  • Ability to lead design discussions, provide technical mentoring, and promote platform adoption across enterprise teams.
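
As a concrete reading of the delivery-semantics point above, the following consume-transform-produce sketch uses Kafka's transactions API so the output write and the offset commit succeed or fail together, which is the usual route to exactly-once processing within Kafka. Topic names are hypothetical, and the consumer and producer are assumed to be configured as noted in the comments.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class ExactlyOnceRelay {
    // Assumes: consumer configured with enable.auto.commit=false and
    // isolation.level=read_committed; producer configured with a transactional.id.
    static void relay(KafkaConsumer<String, String> consumer,
                      KafkaProducer<String, String> producer) {
        producer.initTransactions();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;

            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> rec : records) {
                    // The output write and the offset commit belong to one transaction:
                    // after a crash, both are replayed or both are discarded.
                    producer.send(new ProducerRecord<>("orders.enriched.v1", rec.key(), rec.value()));
                    offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                                new OffsetAndMetadata(rec.offset() + 1));
                }
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (RuntimeException e) {
                // Abort so downstream read_committed consumers never see partial output.
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```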

Responsibilities

  • Design, implement, and support scalable messaging platforms including Apache Kafka (Confluent/OSS), Google Cloud Pub/Sub, and legacy systems such as IBM MQ.
  • Build high-throughput, low-latency event streaming pipelines that support mission-critical workloads, leveraging Kafka brokers, topics, partitions, and consumer groups.
  • Define and enforce schema governance using tools like Confluent Schema Registry or Apicurio; enforce consistent serialization formats (e.g., Avro, JSON, Protobuf).
  • Standardize topic taxonomy and hierarchy across business domains, enforce naming conventions, and implement lifecycle management practices for topics and subscriptions.
  • Manage Kafka Connect connectors (source/sink), ksqlDB queries, and stream processing topologies built with Kafka Streams.
  • Define and implement platform-wide policies for message retention, compaction, ACLs, multi-tenancy isolation, and access control.
  • Ensure reliable message delivery through producer retries, dead-letter queues, idempotency handling, and exactly-once semantics where applicable (a dead-letter-queue sketch follows this list).
  • Automate deployment and configuration of messaging infrastructure using Terraform, Helm, Ansible, and Kubernetes operators (e.g., Strimzi for Kafka).
  • Maintain Git-based configuration-as-code repositories to drive consistency and auditability across environments.
  • Develop CI/CD pipelines to support promotion of configuration artifacts, rolling upgrades of messaging clusters, and dynamic provisioning of topics and consumer policies.
  • Implement proactive drift detection, self-healing scripts, and platform bootstrapping workflows.
  • Integrate messaging platforms with enterprise IAM (e.g., GCP IAM, LDAP, Kerberos, or RBAC for Confluent/Kafka).
  • Implement encryption in transit and at rest using TLS, mTLS, SASL/SCRAM, and Kafka-level ACLs.
  • Implement observability dashboards and alerts using Prometheus, Grafana, Confluent Control Center, ELK Stack, or Google Cloud Operations Suite.
  • Establish performance baselines and configure SLA/SLO-based monitoring for producers, consumers, brokers, and ZooKeeper.
  • Partner with development teams to onboard producer and consumer applications through reusable connectors, API-to-event bridges, and SDKs.
  • Provide technical guidance and documentation on best practices for schema evolution, idempotent messaging, and replayable data streams.
  • Lead efforts to decommission legacy messaging platforms (e.g., MQSeries, RabbitMQ) and consolidate onto modern event streaming technologies.
  • Evaluate emerging messaging technologies (e.g., Pulsar, Redpanda) for specific workloads or cost optimization opportunities.
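
To ground the reliable-delivery responsibility above, here is a minimal dead-letter-queue sketch: records that fail processing are forwarded to a side topic with the failure reason attached as a header, so one poison message cannot block its partition. The topic name and the process() logic are placeholders.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.internals.RecordHeader;

import java.nio.charset.StandardCharsets;
import java.time.Duration;

public class DeadLetterConsumer {
    static void run(KafkaConsumer<String, String> consumer,
                    KafkaProducer<String, String> dlqProducer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> rec : records) {
                try {
                    process(rec); // hypothetical business logic
                } catch (Exception e) {
                    // Forward the poison record to a dead-letter topic with the
                    // failure reason as a header, then keep consuming so one bad
                    // record cannot stall the rest of the partition.
                    ProducerRecord<String, String> dead =
                            new ProducerRecord<>("orders.v1.dlq", rec.key(), rec.value());
                    dead.headers().add(new RecordHeader("dlq.reason",
                            String.valueOf(e.getMessage()).getBytes(StandardCharsets.UTF_8)));
                    dlqProducer.send(dead);
                }
            }
            // Commit only after the batch is handled (processed or dead-lettered).
            consumer.commitSync();
        }
    }

    static void process(ConsumerRecord<String, String> rec) { /* placeholder */ }
}
```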

Benefits

  • Competitive Wages & Paid Time Off
  • Stock Purchase Plan & 401k with Employer Contributions Starting Day One
  • Medical, Dental, & Vision Insurance with Optional Flexible Spending Account (FSA)
  • Team Member Health/Wellbeing Programs
  • Tuition & Educational Assistance Programs
  • Opportunities for Career Growth