Staff AIOps Engineer – Generative AI Platform

Infoblox•Santa Clara, CA

Hybrid

About The Position

At Infoblox, every breakthrough begins with a bold “what if.” What if your ideas could ignite global innovation? What if your curiosity could redefine the future? We invite you to step into the next exciting chapter of your career journey. Bring your creativity, your drive, and your daring spirit, and feel what it’s like to thrive on a team big enough to make an impact, yet small enough to make a difference. Our cloud-first networking and security solutions already protect 70% of the Fortune 500, and we’re looking for creative thinkers ready to push that influence even further. Join us and discover how far your bold “what if” can take the world, your community, and your career.

How we empower our people is extraordinary: we’re recognized as a Glassdoor Best Place to Work 2025, Great Place to Work-Certified in five countries, and recognized by Cigna with Healthy Workforce honors for three consecutive years. What we build is world class: we were named CybersecAsia’s Best in Critical Infrastructure 2024. Together, these are clear evidence that when first-class technology meets empowered talent, remarkable careers take shape. So, what if the next big idea, and the next great career story, comes from you? Become the force that turns every “what if” into “what’s next.” In a world where you can be anything, Be Infoblox.

We are looking for a Staff AIOps Engineer – Generative AI Platform to join our AI team. In this pivotal role, you will design, build, and operate the core GenAI platform capabilities that power product copilots, agentic workflows, and customer-facing AI services across the organization. You will partner closely with AI/ML engineers, platform teams, security, and product leaders to ensure our GenAI services are scalable, reliable, observable, and governed by design. This role combines distributed systems engineering, SRE discipline, and AI platform expertise to deliver production-ready, enterprise-grade AI infrastructure.

Requirements

10 or more years of software engineering experience, including significant experience building and operating large-scale distributed systems in production environments
Strong experience designing and running highly available, scalable backend services in cloud-native environments (e.g., AWS, GCP, or Azure)
Hands-on experience implementing CI/CD pipelines, automation frameworks, and lifecycle management processes for distributed or AI-powered systems
Proven experience defining and operating services against SLAs/SLOs, leading incident response, and driving structured post-incident improvements
Experience building observability, tracing, and monitoring systems for complex distributed platforms
Practical experience supporting AI/ML or Generative AI systems in production, including model access control, performance monitoring, governance, and safety enforcement
Demonstrated technical leadership, including cross-team architectural influence, platform ownership, and mentorship in SaaS or large-scale distributed environments
Bachelor's degree, Master's degree, or PhD in Computer Science or Computer Engineering

Responsibilities

Design, build, and operate core GenAI platform services, including the multi-model access layer, MCP services, and customer-facing AI infrastructure, ensuring reliability, scalability, and policy enforcement
Establish and evolve observability, tracing, and telemetry frameworks for GenAI systems across development and production environments
Define and operationalize SLAs, SLOs, and automated CI/CD and lifecycle management processes for GenAI services, including model onboarding, rollout, routing configuration, and controlled deprecation
Build and maintain guardrails, safety mechanisms, and governance controls to mitigate hallucination, prompt injection, data leakage, and misuse across copilots and agentic workflows
Lead incident response, root cause analysis, and systemic reliability improvements to strengthen platform stability and operational excellence
Drive architectural standards and cross-team alignment, mentoring engineers and partnering with AI/ML, platform, security, and product teams to deliver secure, production-grade GenAI systems

Benefits

Comprehensive health coverage, generous PTO, and flexible work options
Learning opportunities, career-mobility programs, and leadership workshops
Sixteen paid volunteer hours each year, global employee resource groups, and a “No Jerks” policy that keeps collaboration healthy
Modern offices with EV charging, healthy snacks (and the occasional cupcake), plus hackathons, game nights, and culture celebrations
Charitable Giving Program supported by Company Match
We practice pay transparency and reward performance.
