About The Position

At Freddie Mac, our mission of Making Home Possible is what motivates us, and it’s at the core of everything we do. Since our charter in 1970, we have made home possible for more than 90 million families across the country. Join an organization where your work contributes to a greater purpose.

Position Overview

Join our Network Observability team as we drive the transformation toward a high-performance, automated, and proactive network infrastructure. This role is pivotal in advancing our multi-tiered observability strategy, integrating best-in-class tools (NetBrain, ThousandEyes, GigaMon, Extrahop, and the ELK stack) and ensuring seamless visibility across all business locations, cloud edge points, and interconnects. You will collaborate with engineering, operations, and third-party partners to deliver actionable insights, rapid incident response, and continuous service assurance. A key responsibility will be enabling and maturing autonomous operations for incident response, leveraging automation and AI-driven analytics to minimize manual intervention and accelerate resolution.

Our Impact

We enable the organization to achieve operational transparency, reduce incident response times, and ensure compliance with industry standards. By consolidating and modernizing our observability toolset, we empower teams to focus on innovation and proactive network health management, supporting business agility and future technology needs. Our commitment to autonomous operations ensures that network incidents are detected, diagnosed, and remediated with minimal human intervention, driving operational excellence.

Your Impact

  • Lead the deployment and integration of advanced observability platforms (NetBrain, ThousandEyes, etc.) to close visibility gaps and automate diagnostics
  • Architect and implement autonomous incident response workflows, including automated root-cause analysis, guided remediation, and AI-driven predictive analytics
  • Collaborate with cross-functional teams to design and implement unified dashboards, custom reports, and compliance reviews
  • Drive the consolidation and retirement of legacy tools, streamlining operations and reducing complexity
  • Support third-party assessments, benchmarking, and remediation efforts to ensure continuous improvement and alignment with strategic goals
  • Champion the adoption of predictive analytics, real-time telemetry, and automated incident response workflows

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field
  • 10+ years of related work experience
  • 5+ years of experience in network engineering, observability, or IT operations
  • Hands-on expertise with network monitoring and analytics tools: NetBrain, ThousandEyes, RiverBed Suite, Extrahop, GigaMon, and the ELK stack
  • Strong understanding of network protocols, SNMPv3, streaming telemetry, and cloud/SaaS environments
  • Experience with automation, incident response, and compliance reporting frameworks
  • Demonstrated experience designing or operating autonomous operations for incident response, including automation playbooks and AI/ML-driven analytics
  • Excellent communication and collaboration skills; ability to work with technical and non-technical stakeholders

Nice To Haves

  • Proactive problem-solving and a passion for continuous improvement.
  • Ability to synthesize complex data into actionable insights for diverse audiences.
  • Adaptability in a fast-evolving technology landscape; willingness to learn and champion new tools.
  • Strong organizational skills to manage tool consolidation, third-party assessments, and cross-team initiatives.
  • Commitment to operational excellence, compliance, and delivering measurable outcomes (e.g., reduced MTTR/MTTD, improved visibility coverage).
  • Vision and drive to advance autonomous operations for incident response, ensuring rapid, reliable, and scalable network assurance.

Responsibilities

  • Lead the deployment and integration of advanced observability platforms (NetBrain, ThousandEyes, etc.) to close visibility gaps and automate diagnostics
  • Architect and implement autonomous incident response workflows, including automated root-cause analysis, guided remediation, and AI-driven predictive analytics
  • Collaborate with cross-functional teams to design and implement unified dashboards, custom reports, and compliance reviews
  • Drive the consolidation and retirement of legacy tools, streamlining operations and reducing complexity
  • Support third-party assessments, benchmarking, and remediation efforts to ensure continuous improvement and alignment with strategic goals
  • Champion the adoption of predictive analytics, real-time telemetry, and automated incident response workflows