DNS Engineer - SRE

Optimum
Town of Oyster Bay, NY
Hybrid

About The Position

We are Optimum, a leader in the fast-paced world of connectivity, and we're on the hunt for enthusiastic professionals to join our team! We understand that connectivity isn't just a luxury anymore; it's a necessity that empowers lives, fuels businesses, and drives innovation. A career at Optimum means you'll be enabling progress and enhancing lives by providing reliable, high-speed connectivity solutions that keep the world connected. We owe our success to our amazing product, our commitment to our people, and the connections we make in every community. If you are resourceful, collaborative, team-oriented, and passionate about delivering consistent excellence, Optimum is the company for you! We are Optimum!

Job Summary

The Role

DNS Engineer - SRE is a high-impact role responsible for the architecture, scalability, and reliability of the mission-critical DNS infrastructure powering our ISP and core network services. This position is designed for an engineer who views infrastructure through the lens of Site Reliability Engineering (SRE), prioritizing automation, observability, and self-healing systems over manual intervention. You will combine deep IP networking and DNS expertise with modern security protocols to ensure our platforms remain resilient against evolving threats and perform at the highest level for millions of users.

The Impact

This is a collaborative and influential role. Beyond core engineering, you will serve as a technical authority, leading cross-functional initiatives with Product, Security, and Service Assurance teams. Your goal is to deliver a carrier-grade DNS ecosystem that balances cutting-edge privacy standards (DoH/DoT) with the uncompromising availability required by Tier-1 network operations.

Requirements

  • Education: Bachelor’s degree in Computer Science, Telecommunications, or a related field (or equivalent practical experience in networking and security)
  • Experience: 5+ years in a networking or systems engineering role, with a focus on SRE principles (automation, reliability, and monitoring) in production environments
  • DNS Fundamentals: Hands-on experience configuring and maintaining at least two of the following: BIND, Unbound, PowerDNS, AWS Route 53, or Azure DNS
  • Networking Protocols: Functional understanding of TCP/IP (IPv4/v6) and DNS-specific protocols including DNSSEC and encrypted transport (DoH/DoT)
  • Systems & Automation: Strong Linux/Unix administration skills and proficiency in at least one scripting language (Python, Bash, or Go) for task automation
  • Observability: Experience using Grafana and OpenTelemetry (or similar tools) to monitor service health and performance

Nice To Haves

  • DNS Systems: Hands-on experience managing BIND, Unbound, or PowerDNS in high-traffic environments, alongside cloud-native solutions (AWS Route 53, Azure DNS, Google Cloud DNS)
  • Protocol Expertise: Mastery of DNS-specific protocols including DNSSEC, DoT, and DoH, with a firm grasp of underlying transport layers (UDP/TCP) and dual-stack (IPv4/IPv6) networking
  • Observability: Experience building dashboards and alerts using Prometheus, ELK, or OpenTelemetry to monitor DNS query latency and error rates
  • Automation: Proven ability to manage "DNS as Code" using Terraform or Ansible and to write scripts (Python/Go) that automate routine zone updates
  • Scale & Security: Background in Tier-1/Tier-2 service provider environments with a focus on service resilience, Anycast distribution, and DDoS protection

Responsibilities

  • Architectural Ownership: Lead the design and evolution of global DNS architectures, ensuring high availability through Anycast routing, multi-provider redundancy, and automated failover mechanisms.
  • Strategic Vendor Relations: Act as the primary technical authority in engagements with DNS and infrastructure vendors, driving roadmaps that align with our long-term reliability and security goals.
  • Lifecycle & Capacity Management: Oversee the full lifecycle of DNS platforms—including automated software deployments, hardware refreshes, and proactive capacity planning—to stay ahead of traffic growth.
  • Standardization & Policy: Define, enforce, and continuously refine organization-wide standards for DNS record management, security protocols (DNSSEC), and traffic steering policies to minimize user latency.
  • Reliability Engineering: Convert "Strategic Design" into "Operational Reality" by defining Service Level Objectives (SLOs) and Error Budgets for all core name services.
  • Protocol Management: Manage the nuances of UDP/TCP port 53, recursion vs. iteration, and complex record types (A, AAAA, CNAME, MX, TXT, SRV).
  • Security & Mitigation: Implement and manage DNSSEC to prevent cache poisoning; act as a subject matter expert in mitigating DDoS and DNS amplification attacks.
  • Automation (Eliminating Toil): Replace manual updates and "pool" management with automated workflows using Python, Go, Ansible, or Terraform.
  • Performance Tuning: Perform Linux kernel tuning for high-performance network throughput and conduct deep-dive log analysis on systems like BIND, Unbound, or PowerDNS.
  • Observability: Utilize Prometheus, Grafana, and dnstap to monitor query rates and latency, providing actionable insights into error codes (NXDOMAIN, SERVFAIL).