Senior Manager SRE Cloud Operations

Oracle

About The Position

Oracle Cloud Infrastructure (OCI) is seeking an accomplished Senior Manager of Software Development with a strong background in both software engineering and cloud operations. In this role, you will lead a high-performing Site Reliability Engineering (SRE) and DevOps team responsible for designing, building, and operating highly available, scalable, and resilient automation and tooling for cloud service operations. You will be accountable not only for automation solutions but also for the 24x7 operational health, performance, and efficiency of your services. You will enable world-class customer experiences by setting operational standards, ensuring rapid detection and resolution of incidents, and continually driving operational excellence, automation, and efficiency. You will partner closely with Service (Product) and Support teams to deliver new solutions at scale, ensuring that robust monitoring, alerting, and operational runbooks are in place.

True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all, with competitive benefits that support our people through flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [email protected] or by calling 1-888-404-2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law.
Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Requirements

Bachelor’s or master’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
3+ years’ technical or people management experience in cloud or SRE organizations.
10+ years’ experience in software engineering, site reliability engineering, or IT operations for large-scale, distributed, multi-tenant services.
Demonstrated ownership of 24x7 operational services, including monitoring, incident response, and continuous improvement.
Proficiency in at least one major programming language (Java, C, C++, Python) and in operational scripting.
Solid grasp of distributed systems, networking, operating systems, and security fundamentals.
Experience with automation, deployment pipelines, service telemetry, and operational dashboards.
Strong communication and stakeholder management skills.

Nice To Haves

7+ years’ experience operating and supporting cloud infrastructure or large SaaS environments.
Deep hands-on experience in operational tools, runbook development, and incident management frameworks.
Experience with cost management and operational efficiency at scale.
Familiarity with container orchestration, configuration management, and infrastructure-as-code.
Experience building and scaling geographically distributed teams, and managing complex on-call schedules.