Principal Member of Technical Staff

Oracle | Nashville, TN
12d | Onsite

About The Position

Join Oracle Cloud Infrastructure (OCI) as a Principal Member of Technical Staff and play a pivotal role in shaping the future of cloud computing. In this role, you will lead the design, development, and operation of compute operability solutions, ensuring the reliability, scalability, and performance of OCI’s compute infrastructure. You’ll work with a team of innovative engineers to build and operate massive-scale, integrated cloud services that power businesses and organizations worldwide.

As a software engineer on the OCI Compute team, you will focus on enhancing the operability of our compute services, driving automation, and optimizing system reliability. Your work will directly impact the performance of mission-critical workloads for Oracle’s global customers, solving complex challenges in distributed systems, high-availability computing, and operational excellence.

Major focus areas of software maintained by the team include:

  • Creating and maintaining highly available APIs for launching and managing Compute resources.
  • Designing and implementing highly scalable systems capable of functioning across numerous regions in a worldwide cloud footprint.
  • Building systems for orchestrating large-scale fleet management actions.
  • Decomposing large, monolithic codebases.

A key focus of this job will be designing and delivering automated CI/CD pipelines that minimize human intervention and customer impact through proper blast radius control. The successful hire will have a passion for automation and preventing customer impact; when operational issues do arise, they will be able to quickly assess the impact and identify the appropriate mitigation steps to restore service.

This position is expected to be onsite full time in Nashville, TN. This is not a remote position.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 7+ years of experience in software engineering, with at least 3 years focused on cloud infrastructure or distributed systems.
  • Deep expertise in compute operability, including virtualization, containerization, or orchestration technologies (e.g., KVM, Docker, Kubernetes).
  • Strong programming skills in languages such as Go, Python, Java, or C++.
  • Strong data analysis experience and proficiency in SQL.
  • Proven experience with large-scale system design, automation, and operational tools (e.g., Grafana, Terraform, Prometheus).
  • Familiarity with cloud computing concepts, including IaaS, PaaS, or serverless architectures.
  • Excellent problem-solving skills and a track record of resolving complex technical challenges.
  • Strong communication and collaboration skills to work effectively in a globally distributed team.

Nice To Haves

  • Experience with OCI, AWS, Azure, or Google Cloud Platform.
  • Contributions to open-source projects or a strong portfolio of technical innovation.
  • Experience with observability tools (e.g., Grafana, ELK stack) and incident management processes.
  • Background in building automation for zero-downtime deployments or self-healing systems.

Responsibilities

  • Design and implement scalable, reliable, and high-performance compute operability solutions for OCI.
  • Develop tools, frameworks, and automation to enhance the operational efficiency of compute infrastructure.
  • Collaborate with cross-functional teams to define and deliver operability improvements, including monitoring, incident response, and capacity planning.
  • Troubleshoot and resolve complex technical issues in large-scale distributed systems.
  • Drive the adoption of best practices for system reliability, performance tuning, and operational excellence.
  • Mentor junior engineers and contribute to the technical strategy for compute services.
  • Innovate to improve system availability, reduce latency, and optimize resource utilization.
  • Participate in on-call rotations to ensure 24/7 service reliability.

Benefits

  • Medical, dental, and vision insurance, including expert medical opinion
  • Short-term and long-term disability
  • Life insurance and AD&D
  • Supplemental life insurance (Employee/Spouse/Child)
  • Health care and dependent care Flexible Spending Accounts
  • Pre-tax commuter and parking benefits
  • 401(k) Savings and Investment Plan with company match
  • Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
  • 11 paid holidays
  • Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
  • Paid parental leave
  • Adoption assistance
  • Employee Stock Purchase Plan
  • Financial planning and group legal
  • Voluntary benefits including auto, homeowner and pet insurance

What This Job Offers

  • Job Type: Full-time
  • Career Level: Principal
  • Number of Employees: 5,001-10,000 employees
