Site Reliability Engineer II

PROS, Houston, TX

About The Position

The Site Reliability Engineer II optimizes service performance, actively participates in reliability improvements, and conducts in-depth SLO and capacity analysis. This position exists to enhance system reliability and scalability while contributing to automation and self-service tool development.

Requirements

  • 5+ years of experience in enterprise networking, including hands‑on work with routing, switching, firewalls, load balancers, and VPN technologies.
  • Strong understanding of cloud networking architectures, including VPC/VNet design, peering, private link, and hybrid connectivity models.
  • Experience with network security technologies, such as security groups, NACLs, firewall policies, WAF, IDS/IPS, and micro‑segmentation.
  • Proficiency in Layer 2 and Layer 3 networking and core network services, including BGP, OSPF, EIGRP, DNS, DHCP, NAT, and IP addressing/subnetting.
  • Hands‑on experience with load balancers and ingress technologies, including F5, NGINX, Azure Application Gateway, ALB/NLB, or equivalent.
  • Strong troubleshooting skills using packet analyzers, flow logs, and network monitoring platforms.
  • Skilled in analyzing performance trends and identifying optimization opportunities.
  • Skilled in analyzing trends to inform service improvements.
  • Able to collaborate with teams to align SLOs with user expectations.
  • Able to develop moderately complex automation tools.
  • Skilled in analyzing capacity data to inform scaling decisions.

Nice To Haves

  • Bachelor’s Degree in Computer Science, Information Technology, or a related field
  • Practical experience with FortiGate firewalls and F5 appliances is highly desirable
  • Understand core AI concepts and apply them ethically to enhance productivity, insights, and decision-making.
  • Craft effective prompts to optimize the quality and relevance of AI-generated outputs.
  • Explore and apply agentic AI systems, using or managing autonomous agents to streamline workflows and automate tasks.
  • Leverage AI tools to boost efficiency, creativity, and innovation in daily work.
  • Stay curious and adaptable, continuously experimenting with AI-driven solutions to elevate team performance and customer impact.

Responsibilities

  • Monitor service performance, assist in troubleshooting production issues, and learn system architecture.
  • Monitor service reliability, participate in resolving basic issues, and learn disaster recovery testing procedures.
  • Understand SLO concepts, monitor and analyze SLO patterns, and assist in implementing SLO visualization and alerting.
  • Perform basic capacity analysis, identify trends in system capacity, and participate in capacity planning.
  • Deploy and maintain existing automation tools, create simple scripts, and troubleshoot automation scripts.
  • Collaborate with teams to improve monitoring coverage.
  • Participate in structured reliability testing and analysis.
  • Evaluate system components for resilience.
  • Contribute to reliability-focused design discussions.
  • Build internal self-service capabilities.
  • Evaluate automation opportunities for operational efficiency.
  • Recommend improvements for resource utilization.
  • Ensure scalability is considered in feature development.
  • Follow predefined procedures to deploy PROS products and third-party applications to cloud environments.
  • Contribute to release management documentation.
  • Gain understanding of application architecture and interaction between system components.

Benefits

  • Flexible ways of working
  • Continuous learning