About this role:

Wells Fargo is seeking a Lead AI Ops Engineer to own and advance the Commercial Observability Platform. This role provides technical leadership across agentic AI systems, AI‑powered observability, advanced analytics, and enterprise telemetry platforms, enabling proactive monitoring, faster root cause analysis, and improved operational resilience across critical business applications.

This position is intended for a senior, hands‑on AI engineer who will serve as a technical role model and bar raiser, setting standards for engineering excellence in AI‑driven observability and operations.

In this role, you will:

Design, build, and maintain production‑grade AI and agentic systems that reason over observability data including logs, metrics, traces, events, and digital experience signals
Develop LLM-powered workflows to support automated incident analysis, intelligent alerting, operational insights, and root cause analysis (RCA) summaries
Architect and implement agentic or multi‑agent AI workflows that decompose complex operational problems, analyze telemetry across multiple tools, and coordinate actionable recommendations
Apply AIOps and machine learning techniques such as anomaly detection, correlation, pattern recognition, forecasting, noise reduction, and predictive insights
Write and maintain Python‑based AI services, orchestration logic, and data pipelines deployed in production environments
Establish best practices for AI system observability, governance, feedback loops, and continuous improvement
Lead the design, implementation, and evolution of enterprise observability platforms supporting commercial applications
Own and operate observability tools including Splunk Observability, Splunk (logs, metrics, traces), AppDynamics, and Glassbox
Define and enforce standards for telemetry collection, including logging, metrics, distributed tracing, and real user monitoring
Perform and lead complex root cause analysis by analyzing application code, logs, metrics, traces, infrastructure signals, and user experience data
Act as a senior Splunk query developer, designing highly complex SPL queries that function as analytical programs to correlate large volumes of telemetry data
Build and optimize advanced Splunk dashboards using multi‑stage SPL pipelines, statistical functions, joins, lookups, and enrichments
Develop Splunk analytics that power real‑time operational insights, advanced alerting, historical analysis, and AI model inputs
Design and develop Beacon / Telemetry APIs to collect custom application, platform, and business signals
Build and maintain telemetry ingestion services that normalize, store, and enrich data for analytics and AI/ML solutions
Partner closely with application engineering, SRE, and platform teams to improve reliability, performance, and operational maturity
Provide technical leadership and mentoring, serving as a role model for strong AI, analytics, and observability engineering practices
Influence engineering standards and contribute to long‑term observability and AI platform strategy
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees