Sr. Software Engineer, Cloud Infrastructure

Paramount | Burbank, CA
$124,000 - $186,000 | Hybrid

About The Position

The Applied Intelligence Data Engineering team is seeking a Senior Software Engineer – Cloud Infrastructure. This is a hybrid role that blends deep software engineering with hands-on ownership of cloud infrastructure. It is purpose-built for engineers who are equally comfortable writing production Java services and designing multi-cloud platform architecture. In this role, you will own the full lifecycle of cloud-native systems. You will build high-performance streaming applications, architect the infrastructure they run on, and ensure both are production-grade, observable, and secure. You will bridge application and platform engineering, reducing the gap between what software teams need and what the cloud platform delivers. This role requires deep expertise in distributed systems, cloud-native architecture, Kubernetes, and software engineering best practices. You should have a track record of operating at both the application and infrastructure layers.

Requirements

  • 7+ years of experience in software engineering and cloud infrastructure, with at least 3 years in each area.
  • Demonstrated expertise in optimizing data streaming applications on cloud infrastructure.
  • Proven track record building and operating production-grade real-time data platforms.
  • Experience mentoring engineers and collaborating across teams.
  • Bachelor's degree in Computer Science, Engineering, or a related field; advanced degree preferred.
  • Deep expertise in distributed systems, cloud-native architecture, Kubernetes, and software engineering best practices.
  • Track record of operating at both the application and infrastructure layers.
  • Deep expertise in optimizing data streaming applications for throughput, latency, and cost-efficiency across cloud environments.
  • Proficient in tuning Kafka producers, consumers, and brokers — including batch sizing, compression, partition strategies, and consumer lag management — within cloud-hosted deployments.
  • Experience leveraging cloud-native managed services (e.g., Kafka, GCP Pub/Sub, BigQuery Streaming, OCI Streaming) to complement or extend Kafka-based pipelines.
  • Skilled at right-sizing and dynamically allocating cloud compute resources (VMs, node pools, spot/preemptible instances) to match streaming workload profiles and reduce infrastructure spend.
  • Familiarity with cloud-native autoscaling patterns for streaming consumers, including KEDA, HPA, and custom metrics-based scaling in Kubernetes.
  • Ability to identify and address performance bottlenecks at the intersection of application code and cloud resource limits.
  • Experience benchmarking and profiling streaming pipelines from start to finish.
  • Ability to turn benchmarking and profiling findings into infrastructure and code improvements.

Nice To Haves

  • Advanced degree preferred.

Responsibilities

  • Own end-to-end performance of data streaming applications running on cloud infrastructure — from Kafka topic configuration through consumer processing and downstream delivery.
  • Profile and tune streaming pipelines to maximize throughput and minimize latency, leveraging cloud-native compute, storage, and networking resources.
  • Identify and address bottlenecks at the intersection of application code and cloud resource constraints, including CPU throttling, network saturation, I/O limits, and memory constraints.
  • Design and implement cloud resource utilization strategies, including spot/preemptible instances, managed streaming services, and dynamic node pool scaling, to balance performance with cost efficiency.
  • Benchmark streaming pipelines end-to-end and translate findings into actionable infrastructure and code improvements.
  • Collaborate with Data and AI/ML engineering teams to ensure streaming pipelines are optimally provisioned for real-time feature engineering, inference, and analytics workloads.
  • Develop high-throughput, low-latency streaming applications using Java and Kafka.
  • Design event-driven microservices that process, enrich, and route real-time data at scale.
  • Implement reactive, non-blocking architectures to support high concurrency and resilience.
  • Develop reusable streaming frameworks, libraries, and platform capabilities to improve engineering velocity and standardization.
  • Architect, implement, and optimize multi-cloud infrastructure across GCP and OCI to support large-scale data and streaming workloads.
  • Design and implement advanced networking architectures, including VPC peering, VPNs, load balancers, and cross-region failover strategies.
  • Build and maintain Terraform-based infrastructure-as-code frameworks to standardize deployments and enable developer self-service.
  • Define autoscaling, deployment, failover, and resource optimization strategies for high-volume production systems.
  • Contribute to platform-wide architecture decisions related to scalability, resiliency, high availability, and disaster recovery.
  • Deploy and manage containerized microservices within Kubernetes environments (GKE, OKE) across cloud platforms.
  • Implement container orchestration best practices, service-mesh configurations, and rolling-deployment strategies.
  • Partner with platform engineering teams to improve developer tooling, deployment automation, and runtime reliability.
  • Ensure production-grade reliability, observability, and operational maturity across streaming platforms and infrastructure.
  • Implement comprehensive observability using Prometheus, Grafana, centralized logging, distributed tracing, and health monitoring.
  • Optimize systems for throughput, latency, resiliency, resource efficiency, and cloud cost governance.
  • Build automated testing strategies for streaming and infrastructure workflows, including unit, integration, contract, chaos, and performance testing.
  • Lead incident response, root-cause analysis, and postmortems to improve uptime and reduce operational risk.
  • Partner with Data Engineering teams to integrate streaming architectures with batch processing, data lakes, and analytical platforms.
  • Collaborate with Software Engineering, Product Management, and API teams to enable real-time services and data-driven applications.
  • Work closely with AI/ML engineering teams to support real-time feature engineering, inference pipelines, and operational AI workloads.
  • Clearly communicate technical tradeoffs, scalability considerations, and operational risks to engineering stakeholders.
  • Lead architectural discussions, design reviews, and technical deep dives across distributed systems and cloud infrastructure.
  • Drive engineering standards across code quality, documentation, observability, security, and platform maintainability.
  • Influence long-term technical strategy and modernization initiatives for real-time data infrastructure and cloud platforms.

Benefits

  • medical
  • dental
  • vision
  • 401(k) plan
  • life insurance coverage
  • disability benefits
  • tuition assistance program
  • PTO
  • bonus eligible