About The Position

The Oracle Cloud Infrastructure (OCI) AI Services team builds and operates highly scalable, reliable, and secure cloud-based AI platforms. We are focused on delivering resilient infrastructure and automation to support Oracle’s AI-driven services at global scale. Oracle AI Services is seeking a Senior Site Reliability Developer with strong cloud infrastructure and automation experience. This role will focus on improving system reliability, scalability, and operational excellence across OCI AI platforms. The ideal candidate brings deep hands-on expertise in Kubernetes, Infrastructure as Code, automation, and cloud-native development practices.

Requirements

  • 3–5 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
  • Strong hands-on experience with Kubernetes in production environments.
  • Experience using Terraform for infrastructure provisioning and automation.
  • Proficiency in Python and at least one additional programming or scripting language (e.g., Go, Java, Bash).
  • Experience working with public cloud services and cloud-native architectures.
  • Solid understanding of CI/CD pipelines, monitoring, and system reliability best practices.

Nice To Haves

  • Experience supporting AI/ML infrastructure workloads.
  • Knowledge of distributed systems and microservices architecture.
  • Familiarity with observability tools (Prometheus, Grafana, etc.).
  • Strong troubleshooting and incident response skills.

Benefits

  • True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all, with competitive benefits that support our people, including flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.