Network Engineer with SRE

Donato Technologies, Phoenix, AZ (Onsite)

About The Position

We are seeking a Network SRE to ensure the reliability, scalability, and performance of cloud and hybrid network platforms. This role applies SRE principles to networking, shifting from manual network operations to automated, observable, and resilient network services. The ideal candidate is a network engineer who thinks like a software engineer and an SRE.

Requirements

  • Strong networking fundamentals (TCP/IP, DNS, BGP, routing)
  • AWS networking expertise
  • SRE concepts & practices
  • Network observability & monitoring
  • Infrastructure as Code
  • Production incident handling experience

Responsibilities

  • Define SLIs, SLOs, and error budgets for network services.
  • Design networks for high availability, fault tolerance, low latency, and predictable performance.
  • Improve network reliability while reducing operational toil.
  • Architect and operate AWS networking: VPCs, subnets, route tables, Transit Gateway, NAT gateways, internet gateways (IGW), PrivateLink, and VPC endpoints.
  • Design hybrid connectivity using VPN and AWS Direct Connect.
  • Support multi-account and multi-region architectures.
  • Build deep network observability using VPC Flow Logs, CloudWatch, Datadog, and Prometheus/Grafana.
  • Analyze packet loss, latency, and throughput.
  • Implement proactive alerting based on SLOs.
  • Correlate network signals with application performance.
  • Automate network provisioning and changes using Terraform or CloudFormation.
  • Implement CI/CD for network changes.
  • Reduce manual configuration and human error.
  • Version-control network definitions.
  • Lead network-related incident response.
  • Perform deep root-cause analysis of packet drops, routing issues, DNS failures, and load balancer degradation.
  • Participate in on-call rotation and post-incident reviews.
  • Drive permanent fixes rather than workarounds.
  • Design and enforce network segmentation, Zero Trust principles, and firewall rules (security groups, network ACLs).
  • Implement secure ingress/egress patterns.
  • Support DDoS protection (AWS Shield, WAF).
  • Work with Security teams on audits and remediation.
  • Conduct traffic modeling and capacity forecasting.
  • Tune load balancers (ALB, NLB).
  • Optimize routing and failover strategies.
  • Validate resilience through failure testing.
  • Partner with Cloud Platform teams, application SREs, and Security and Infrastructure teams.
  • Enable application teams with network best practices.
  • Produce architecture diagrams, runbooks, and SOPs.
  • Influence platform design decisions.