About The Position

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Principal Site Reliability Engineer - Network in the United States. This role focuses on ensuring the reliability, performance, and security of the network infrastructure behind cloud-based SaaS products. The Principal Site Reliability Engineer will design, implement, and maintain resilient network architectures while automating operational processes to reduce manual work. The position involves collaborating closely with software engineering teams, IT, and cross-functional stakeholders to troubleshoot network issues, implement monitoring and alerting, and ensure systems meet their SLAs and SLOs. This role requires a hands-on, proactive approach to problem-solving, continuous learning, and staying current with network technologies. The ideal candidate thrives in a fast-paced environment, takes ownership of high-impact projects, and contributes to a culture of operational excellence. You will also serve as a senior escalation point for complex network issues, providing guidance and mentorship to other SREs.

Requirements

  • 5+ years of experience designing and managing cloud-based (Azure) SaaS network infrastructures.
  • Deep expertise in networking protocols (IP, TCP/IP, ICMP, DNS, DHCP, ARP, SSL/TLS) and network traffic analysis tools (e.g., Wireshark).
  • Strong experience with firewall engineering (e.g., Palo Alto), SD-WAN (e.g., Silver Peak), and load balancing.
  • Proficiency in scripting and automation using Python, Bash, or PowerShell.
  • Experience with Infrastructure as Code (Terraform or similar), containerization, and managing Kubernetes clusters (AKS/EKS).
  • Strong problem-solving skills, attention to detail, and ability to operate independently in a fast-paced environment.
  • BS in Computer Science or equivalent work experience.
  • Proven ability to collaborate across teams and mentor junior engineers while maintaining operational excellence.

Responsibilities

  • Champion and implement SRE best practices to ensure reliable, secure, and high-performing network infrastructure.
  • Design, configure, and maintain redundant and fault-tolerant networks, including firewalls, load balancers, DNS, routing tables, and SD-WAN systems.
  • Develop and maintain network diagrams, monitoring systems, and alerting frameworks to prevent client-impacting issues.
  • Identify and remediate network issues using diagnostic tools, traces, and root cause analysis; coordinate with vendors for defect resolution.
  • Automate operational processes and runbooks to minimize manual intervention and improve system reliability.
  • Collaborate with IT and software engineering teams to integrate SaaS product networks into broader enterprise infrastructure.
  • Participate in on-call duties, lead incident triage, and drive post-incident reviews and continuous improvement initiatives.

Benefits

  • Competitive salary and performance-based incentives.
  • Flexible remote work environment.
  • Comprehensive health, dental, and vision insurance.
  • 401(k) plan with company match.
  • Paid time off and parental leave.
  • Professional development opportunities, including certifications and technical training.
  • Collaborative, inclusive, and values-driven work culture.