Wells Fargo Bank • posted 9 days ago
Full-time • Mid Level
Irving, TX
5,001-10,000 employees

About this role: Wells Fargo is seeking a deeply technical Principal Engineer with elite-level expertise in both IBM MQ and Apache Kafka. This is a hands-on-keyboard role for a subject-matter expert who will be the ultimate technical authority for our enterprise messaging and data streaming backbone. You will be responsible for architecting, building, securing, and optimizing our most critical data-in-motion platforms to support high-volume, low-latency financial applications. This is a role for a master engineer who solves the most complex distributed-systems challenges.

In this role, you will:

Architecture & Engineering:
  • Architect, build, and optimize enterprise-grade IBM MQ and Apache Kafka infrastructure from the ground up.
  • Design and implement resilient, high-availability (HA) and disaster recovery (DR) topologies, including MQ Multi-Instance Queue Managers/Clusters and Kafka cluster replication (e.g., MirrorMaker2).
  • Engineer solutions for diverse messaging patterns: request/reply, pub/sub, transactional, and event streaming.
  • Define and enforce enterprise standards for MQ queue/channel definitions, Kafka topic naming conventions, partitioning strategies, and data schemas (using Avro/Protobuf and Schema Registry).
  • Serve as the technical design authority for all projects integrating with MQ or Kafka.

Implementation & Administration:
  • Perform expert-level installation, configuration, and tuning of IBM MQ (Queue Managers, Channels, Listeners) and Kafka (Brokers, ZooKeeper/KRaft, Connect).
  • Implement advanced security controls: TLS/SSL for both platforms, Channel Authentication (CHLAUTH) and the Object Authority Manager (OAM) in MQ, and SASL/SCRAM with ACLs in Kafka.
  • Develop and maintain a robust automation framework (using Ansible, Python, Terraform) for provisioning, configuration management, and operational tasks for both MQ and Kafka.
  • Manage and optimize the Kafka Connect ecosystem, deploying and monitoring connectors for data integration.
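Queue-definition standards and the automation framework described above often meet in one place: MQSC generated from a reviewed specification instead of typed by hand. The sketch below assumes a hypothetical `QueueSpec` record, a hypothetical `<NAME>.BACKOUT` companion-queue convention, and hypothetical defaults; none of these are Wells Fargo standards. The keywords it emits (`DESCR`, `MAXDEPTH`, `DEFPSIST`, `BOTHRESH`, `BOQNAME`, `REPLACE`) are ordinary `DEFINE QLOCAL` parameters.

```python
import re
from dataclasses import dataclass

# Hypothetical naming standard: two or more upper-case, dot-separated segments,
# e.g. PAYMENTS.ORDERS.IN. Substitute your own enterprise rule.
QUEUE_NAME_RE = re.compile(r"^[A-Z][A-Z0-9]*(\.[A-Z][A-Z0-9]*)+$")
MAX_NAME_LEN = 48   # IBM MQ queue names are limited to 48 characters
MAX_DESCR_LEN = 64  # DESCR is limited to 64 characters


@dataclass(frozen=True)
class QueueSpec:
    """Hypothetical input record describing one application queue."""
    name: str
    description: str
    max_depth: int = 5000
    backout_threshold: int = 3

    @property
    def backout_queue(self) -> str:
        # Hypothetical convention: every queue gets a <name>.BACKOUT companion.
        return f"{self.name}.BACKOUT"


def validate(spec: QueueSpec) -> None:
    """Raise ValueError if the spec breaks the hypothetical standard."""
    if not QUEUE_NAME_RE.match(spec.name):
        raise ValueError(f"non-standard queue name: {spec.name!r}")
    if len(spec.backout_queue) > MAX_NAME_LEN:
        raise ValueError(f"backout queue name too long: {spec.backout_queue!r}")
    if len(spec.description) > MAX_DESCR_LEN or "'" in spec.description:
        # Quotes would need doubling inside MQSC strings; rejected for simplicity.
        raise ValueError("description too long or contains a quote")


def render_mqsc(spec: QueueSpec) -> str:
    """Return MQSC text; REPLACE lets the script be re-run when the queue already exists."""
    validate(spec)
    backout = (f"DEFINE QLOCAL('{spec.backout_queue}') "
               f"DESCR('Backout queue for {spec.name}') REPLACE")
    main = (f"DEFINE QLOCAL('{spec.name}') DESCR('{spec.description}') "
            f"MAXDEPTH({spec.max_depth}) DEFPSIST(YES) "
            f"BOTHRESH({spec.backout_threshold}) BOQNAME('{spec.backout_queue}') REPLACE")
    return "\n".join([backout, main])


if __name__ == "__main__":
    print(render_mqsc(QueueSpec("PAYMENTS.ORDERS.IN", "Inbound payment orders")))
```

The output can be piped into `runmqsc <queue manager name>`, for example from an Ansible task. Note that the queue manager only stores `BOTHRESH` and `BOQNAME`; it is typically the application (or the IBM MQ classes for JMS on its behalf) that checks a message's backout count and moves a poison message aside.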

Performance & Troubleshooting:
  • Lead performance tuning efforts to maximize throughput and minimize latency for both MQ and Kafka, focusing on buffer tuning, batching, compression, and log management.
  • Conduct deep-dive root cause analysis (RCA) for production incidents, analyzing First Failure Data Capture (FDC) files and error logs in MQ, and broker/consumer logs and metrics in Kafka.
  • Utilize advanced debugging tools (e.g., tcpdump, Wireshark, JVM profilers) to diagnose complex network, application, and platform issues.
  • Proactively monitor platform health, consumer lag, message throughput, and system resource utilization using tools like Prometheus, Grafana, and enterprise monitoring suites.

Developer & Application Support:
  • Act as a senior consultant to application development teams on best practices for using MQI, JMS, and Kafka Producer/Consumer APIs.
  • Troubleshoot critical integration issues, including poison messages, stuck consumers, message ordering conflicts, and idempotent producer problems.
  • Champion the adoption of modern practices like event-driven architecture and stream processing (using Kafka Streams or ksqlDB).
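Consumer lag, one of the health signals listed above, is per partition the distance between the log-end offset and the consumer group's committed offset. A minimal sketch, assuming the offsets have already been collected (for example from the `LOG-END-OFFSET` and `CURRENT-OFFSET` columns of `kafka-consumer-groups.sh --describe`, or from an admin client); the topic name, offsets, and alert threshold are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition)


def partition_lag(end_offsets: Dict[TopicPartition, int],
                  committed: Dict[TopicPartition, Optional[int]]) -> Dict[TopicPartition, int]:
    """Lag per partition = log-end offset minus committed offset.

    A partition with no committed offset is counted as lagging by its whole
    end offset. That is a simplification: where the group would actually start
    depends on auto.offset.reset and the partition's log-start offset.
    """
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        pos = committed.get(tp)
        # max() guards against an end offset sampled before the commit was read.
        lag[tp] = end if pos is None else max(end - pos, 0)
    return lag


def lagging(lag: Dict[TopicPartition, int], threshold: int) -> List[TopicPartition]:
    """Partitions whose lag exceeds a (hypothetical) alert threshold."""
    return sorted(tp for tp, value in lag.items() if value > threshold)


if __name__ == "__main__":
    end = {("payments.orders", 0): 1200, ("payments.orders", 1): 980, ("payments.orders", 2): 500}
    done = {("payments.orders", 0): 1200, ("payments.orders", 1): 400}
    lag = partition_lag(end, done)
    print(lag, "total:", sum(lag.values()), "alert:", lagging(lag, 100))
```

Alerting on the trend (lag growing across several samples) rather than a single reading tends to separate a stuck consumer from a short burst of traffic.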

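Partitioning strategy and the message-ordering conflicts mentioned above are linked: Kafka preserves order only within a partition, and the Java producer's default partitioner sends records with the same non-null key to the same partition by hashing the key bytes with murmur2. The following is a Python port of that mapping, useful for reasoning about key distribution; it is a sketch, not a client. Other clients may hash differently (librdkafka-based clients, for instance, default to a CRC32-based partitioner), so check the client you actually use. The account keys are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit murmur2 as in Kafka's Java Utils.murmur2; returns the unsigned bit pattern."""
    length = len(data)
    m, r = 0x5BD1E995, 24
    h = (0x9747B28C ^ length) & MASK32
    for i in range(length // 4):
        i4 = i * 4
        k = data[i4] | (data[i4 + 1] << 8) | (data[i4 + 2] << 16) | (data[i4 + 3] << 24)
        k = (k * m) & MASK32
        k ^= k >> r  # values are kept unsigned, so >> behaves like Java's >>>
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k
    # Mix in the 0-3 trailing bytes (the Java code uses a fall-through switch).
    tail, base = length % 4, length & ~3
    if tail == 3:
        h ^= data[base + 2] << 16
    if tail >= 2:
        h ^= data[base + 1] << 8
    if tail >= 1:
        h ^= data[base]
        h = (h * m) & MASK32
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Equivalent of toPositive(murmur2(key)) % numPartitions for keyed records."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    for key in (b"ACCT-1001", b"ACCT-1002", b"ACCT-1001"):
        # A repeated key always lands on the same partition, so its records stay ordered.
        print(key.decode(), "->", partition_for_key(key, 12))
```

Because the hash is taken modulo the partition count, adding partitions to an existing topic remaps keys, which is a common source of ordering surprises for keyed consumers.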
Qualifications:
  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
  • Deep, hands-on engineering experience with both IBM MQ and Apache Kafka in a large-scale enterprise environment.
  • IBM MQ Expertise: Mastery of MQSC commands, MQ Explorer, and architectural patterns (Clustering, Multi-Instance). Deep knowledge of MQ security (OAM, CHLAUTH) and log management.
  • Kafka Expertise: Mastery of the Kafka ecosystem (Brokers, ZooKeeper/KRaft, Connect, Schema Registry). Proven experience with Kafka security (SASL, ACLs, mTLS) and performance tuning.
  • Automation Proficiency: Strong scripting and automation skills using Ansible, Python, Shell, or Terraform are essential.
  • Integration Knowledge: Expert-level understanding of JMS, MQI, and Kafka client APIs.
  • Troubleshooting: Elite-level debugging skills with the ability to analyze everything from network packets to application code and system logs.
  • Operating Systems & Networking: Solid expertise in Linux/UNIX and a strong understanding of TCP/IP, firewalls, and load balancers as they relate to distributed messaging systems.
  • High-Volume Environments: Experience in financial services or another industry with high-throughput, low-latency, and zero-data-loss requirements is a major plus.
  • A Bachelor's degree in Computer Science/Engineering or equivalent real-world experience.
  • A builder's mentality with a passion for automation and infrastructure-as-code.
  • An obsession with performance and reliability.
  • The ability to remain calm and methodical while troubleshooting high-pressure production outages.
  • A natural collaborator who enjoys mentoring developers and other engineers.
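SASL/SCRAM, listed above among the Kafka security controls, lets a client prove it knows a password without sending the password; the broker keeps only a salt, an iteration count, and derived keys. Kafka supports the SCRAM-SHA-256 and SCRAM-SHA-512 mechanisms. The sketch below reproduces only the RFC 5802 key derivation and client-proof check with the standard library; SASLprep normalization, nonce exchange, message framing, and the server-signature step are omitted, and the password, salt, iteration count, and auth message are hypothetical placeholders.

```python
import hashlib
import hmac

HASH = "sha256"  # SCRAM-SHA-256; "sha512" would correspond to SCRAM-SHA-512


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, HASH).digest()


def _client_key(password: str, salt: bytes, iterations: int) -> bytes:
    # Hi() in RFC 5802 is PBKDF2 with HMAC as the PRF; ClientKey = HMAC(SaltedPassword, "Client Key").
    salted = hashlib.pbkdf2_hmac(HASH, password.encode("utf-8"), salt, iterations)
    return _hmac(salted, b"Client Key")


def stored_key(password: str, salt: bytes, iterations: int) -> bytes:
    """StoredKey = H(ClientKey): part of what the server keeps instead of the password."""
    return hashlib.new(HASH, _client_key(password, salt, iterations)).digest()


def client_proof(password: str, salt: bytes, iterations: int, auth_message: bytes) -> bytes:
    """ClientProof = ClientKey XOR HMAC(StoredKey, AuthMessage)."""
    client_key = _client_key(password, salt, iterations)
    signature = _hmac(hashlib.new(HASH, client_key).digest(), auth_message)
    return bytes(a ^ b for a, b in zip(client_key, signature))


def server_verifies(stored: bytes, auth_message: bytes, proof: bytes) -> bool:
    """Recover ClientKey from the proof and compare its hash with StoredKey."""
    signature = _hmac(stored, auth_message)
    recovered = bytes(a ^ b for a, b in zip(proof, signature))
    return hmac.compare_digest(hashlib.new(HASH, recovered).digest(), stored)


if __name__ == "__main__":
    salt, iterations = b"hypothetical-salt", 4096
    auth = b"placeholder-auth-message"  # real SCRAM concatenates the exchanged messages
    stored = stored_key("correct horse", salt, iterations)
    print(server_verifies(stored, auth, client_proof("correct horse", salt, iterations, auth)))
```

Because the proof depends on a per-session auth message, a captured proof is not reusable on its own, which is why SCRAM is usually paired with TLS rather than replacing it.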
© 2024 Teal Labs, Inc