Software Engineering Manager 1 – Streaming & Cloud Platform Reliability

Hewlett Packard Enterprise
Cupertino, CA
Hybrid

About The Position

We’re looking for a hands‑on Software Engineering Manager to lead a small team (2–4 developers) focused on improving the reliability of Mist’s cloud platform by driving concrete postmortem action items from our incident management process. This team owns follow‑ups from production incidents—especially those involving our streaming data pipelines (Kafka / Flink / Storm) and core APIs. You’ll work closely with senior engineers to turn incident learnings into durable engineering improvements. This is a hybrid role requiring on‑site collaboration multiple days per week in Cupertino, California. Due to the requirements of this position, this role requires a US Citizen or Green Card holder.

Requirements

  • 7+ years total professional software engineering experience.
  • 2+ years in a team lead role (mentoring, performance feedback, prioritization), while remaining hands‑on technically.
  • 5+ years building backend or distributed systems in Python, Go, or Java, with enough proficiency in at least one of these languages to lead design reviews and contribute production code.
  • 3+ years designing, implementing, and operating distributed, event‑driven systems using Kafka and at least one of Flink, Storm, or a comparable streaming framework.
  • 3+ years building and operating RESTful APIs (service design, auth, rate limiting, client IP handling, versioning).
  • 3+ years working with cloud‑native infrastructure: Kubernetes, containerized microservices, CI/CD pipelines.
  • 3+ years with production datastores such as Redis, Postgres, Cassandra/DataStax, S3/GCS, or similar distributed storage systems.
  • 2+ years directly involved in production incident response: on‑call participation, postmortems, and driving remediation work through to completion.
  • Proven ability to debug latency, throughput, data correctness, and availability issues in streaming pipelines and/or APIs.
  • Experience adding or improving metrics, logging, tracing, and alerts for production services.

Nice To Haves

  • 2+ years working with big‑data / analytics or ETL systems (e.g., Apache Spark, Airflow, Snowflake, or similar).
  • Experience with webhook or event‑delivery systems (idempotency, retries, ordering, DLQs).
  • Exposure to multi‑region / DR design: cross‑cloud migrations, DNS and certificate management, environment‑driven configuration.
  • Familiarity with DevOps practices, CI/CD automation, and service ownership.
  • Experience with observability stacks such as Prometheus, Grafana, Kibana/Elasticsearch.

Responsibilities

  • Own and drive post‑incident follow‑ups from our Incident Management process, turning incident reports into design and implementation work.
  • Lead, mentor, and grow a 2–4 person engineering team, while contributing hands‑on code in production services.
  • Design, implement, and harden streaming topologies using Kafka, Storm, and/or Flink (e.g., stats, telemetry, alerts, pcaps).
  • Improve reliability of core APIs (REST API, WebSocket, Webhooks, etc.), including auth, rate limiting, and DR‑sensitive flows.
  • Enhance observability and runbooks: add metrics/alerts, define SLOs, and codify playbooks for recurring incident patterns.
  • Collaborate with SRE, Platform, and Data teams on DR, multi‑region, and multi‑cloud behavior (AWS, GCP, DR regions).
  • Ensure robust testing and deployment practices (unit/integration tests, regression tests for past incidents, safe rollout/rollback).

Benefits

  • Health & Wellbeing: We strive to provide our team members and their loved ones with a comprehensive suite of benefits that supports their physical, financial and emotional wellbeing.
  • Personal & Professional Development: We also invest in your career because the better you are, the better we all are. We have specific programs catered to helping you reach any career goals you have, whether you want to become a knowledge expert in your field or apply your skills to another division.
  • Unconditional Inclusion: We are unconditionally inclusive in the way we work and celebrate individual uniqueness. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good.


What This Job Offers

Job Type: Full-time
Career Level: Manager
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees
