Staff AIOps Engineer – Generative AI Platform

Infoblox
Santa Clara, CA
Hybrid

About The Position

At Infoblox, every breakthrough begins with a bold “what if.” What if your ideas could ignite global innovation? What if your curiosity could redefine the future? We invite you to step into the next exciting chapter of your career journey. Bring your creativity, your drive, and your daring spirit, and feel what it’s like to thrive on a team big enough to make an impact, yet small enough to make a difference. Our cloud-first networking and security solutions already protect 70% of the Fortune 500, and we’re looking for creative thinkers ready to push that influence even further. Join us and discover how far your bold “what if” can take the world, your community, and your career.

How we empower our people is extraordinary: we’re recognized as a Glassdoor Best Place to Work 2025, Great Place to Work-Certified in five countries, and a recipient of Cigna Healthy Workforce honors for three consecutive years. What we build is world class: we were named CybersecAsia’s Best in Critical Infrastructure 2024. That is clear evidence that when first-class technology meets empowered talent, remarkable careers take shape. So, what if the next big idea, and the next great career story, comes from you? Become the force that turns every “what if” into “what’s next.” In a world where you can be anything, Be Infoblox.

We are looking for a Staff AIOps Engineer – Generative AI Platform to join our AI team. In this pivotal role, you will design, build, and operate the core GenAI platform capabilities that power product copilots, agentic workflows, and customer-facing AI services across the organization. You will partner closely with AI/ML engineers, platform teams, security, and product leaders to ensure our GenAI services are scalable, reliable, observable, and governed by design. This role combines distributed systems engineering, SRE discipline, and AI platform expertise to ensure production-ready, enterprise-grade AI infrastructure.

Requirements

  • 10 or more years of software engineering experience, including significant experience building and operating large-scale distributed systems in production environments
  • Strong experience designing and running highly available, scalable backend services in cloud-native environments (e.g., AWS, GCP, or Azure)
  • Hands-on experience implementing CI/CD pipelines, automation frameworks, and lifecycle management processes for distributed or AI-powered systems
  • Proven experience defining and operating services against SLAs/SLOs, leading incident response, and driving structured post-incident improvements
  • Experience building observability, tracing, and monitoring systems for complex distributed platforms
  • Practical experience supporting AI/ML or Generative AI systems in production, including model access control, performance monitoring, governance, and safety enforcement
  • Demonstrated technical leadership, including cross-team architectural influence, platform ownership, and mentorship in SaaS or large-scale distributed environments
  • Bachelor's or Master's degree, or a PhD, in Computer Science or Computer Engineering

Responsibilities

  • Design, build, and operate core GenAI platform services, including the multi-model access layer, MCP services, and customer-facing AI infrastructure, ensuring reliability, scalability, and policy enforcement
  • Establish and evolve observability, tracing, and telemetry frameworks for GenAI systems across development and production environments
  • Define and operationalize SLAs, SLOs, and automated CI/CD and lifecycle management processes for GenAI services, including model onboarding, rollout, routing configuration, and controlled deprecation
  • Build and maintain guardrails, safety mechanisms, and governance controls to mitigate hallucination, prompt injection, data leakage, and misuse across copilots and agentic workflows
  • Lead incident response, root cause analysis, and systemic reliability improvements to strengthen platform stability and operational excellence
  • Drive architectural standards and cross-team alignment, mentoring engineers and partnering with AI/ML, platform, security, and product teams to deliver secure, production-grade GenAI systems

Benefits

  • Comprehensive health coverage, generous PTO, and flexible work options
  • Learning opportunities, career-mobility programs, and leadership workshops
  • Sixteen paid volunteer hours each year, global employee resource groups, and a “No Jerks” policy that keeps collaboration healthy
  • Modern offices with EV charging, healthy snacks (and the occasional cupcake), plus hackathons, game nights, and culture celebrations
  • Charitable Giving Program supported by Company Match
  • We practice pay transparency and reward performance.