Oracle-posted 3 days ago
Full-time • Principal
Pleasanton, CA
5,001-10,000 employees

Description

AI-Driven Incident Remediation (LARS)
  • Design and implement new LARS remediation workflows across single-tenant (ST) and multi-tenant (MT) OAC instances.
  • Expand automated coverage for service health, capacity, cluster availability, and network-related alarms.
  • Enhance AI-driven diagnostics, triage, pattern detection, and auto-approval pipelines for incident mitigation.
  • Improve observability, cross-pod dashboarding, and multi-pod coordinated incident support.

AI Initiatives for OASE DevOps
  • Develop and integrate AI-assisted diagnostics and automated mitigation for high-severity production incidents.
  • Contribute to Agentic DevOps initiatives, including autonomous remediation frameworks and prototype agent workflows.
  • Collaborate with ML teams to incorporate models for anomaly detection, root-cause analysis, and remediation recommendations.

AI assisted Automated Change Management
  • Build tooling and CI/CD pipeline extensions to eliminate manual change processes and streamline deployment safety.
  • Design guardrails, approval workflows, and automated rollouts to improve release reliability and reduce operational toil.

Agentic DevOps Platform (MCP Servers)
  • Develop MCP servers and agent orchestration workflows enabling end-to-end automated diagnostics and incident resolution.
  • Integrate agent-driven actions with existing automation systems (LARS, CI/CD, service health signals).
  • Contribute to next-generation self-healing and autonomous operations capabilities across OAC services.

Qualifications
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 6+ years of experience in DevOps, Site Reliability Engineering, or related roles.
  • Strong proficiency in Python and Java for automation and system development.
  • Experience working with Oracle Cloud Infrastructure (OCI), preferably with Oracle Analytics Cloud or similar Oracle PaaS services.
  • Solid understanding of containerization and orchestration (Docker, Kubernetes, Helm).
  • Experience with Git-based workflows, artifact repositories, and CI/CD tooling.
  • Hands-on knowledge of Linux/Unix system administration and networking fundamentals.
  • Familiarity with monitoring tools (Prometheus, Grafana, ELK stack, OCI Monitoring).
  • Ability to work in a fast-paced, agile environment with a proactive mindset.
  • Oracle Cloud certifications (e.g., Oracle Cloud Infrastructure DevOps Professional).
  • Experience with secure DevOps (DevSecOps) practices.
  • Background in analytics platforms or data engineering is a plus.
  • Experience contributing to open-source or internal developer platforms.
  • Experience integrating MCP servers and building RAG-based knowledge bases from semi-structured documents and logs.
Benefits
  • Medical, dental, and vision insurance, including expert medical opinion
  • Short-term and long-term disability
  • Life insurance and AD&D
  • Supplemental life insurance (Employee/Spouse/Child)
  • Health care and dependent care Flexible Spending Accounts
  • Pre-tax commuter and parking benefits
  • 401(k) Savings and Investment Plan with company match
  • Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
  • 11 paid holidays
  • Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
  • Paid parental leave
  • Adoption assistance
  • Employee Stock Purchase Plan
  • Financial planning and group legal
  • Voluntary benefits including auto, homeowner and pet insurance