Senior Site Reliability Engineer

Oracle · Austin, TX

About The Position

Oracle Cloud Infrastructure (OCI) is seeking a talented and motivated Senior SRE (IC3) to join our dynamic team that builds platform capabilities. Our team is responsible for ingesting large volumes of data and running detections at scale on data both at rest and in motion (streaming).

Requirements

  • Minimum 4 years of hands-on Platform Engineering, DevOps or SRE experience
  • Technical role with a history of embracing automated processes, cloud native application design principles and a CI/CD DevOps model.
  • Experience with production operations and best practices for deploying quality code in production and troubleshooting issues when they arise.
  • Experience with public cloud (OCI, AWS, GCP, Azure).
  • Knowledge of Infrastructure as Code (IaC), Configuration as Code (CaC), GitOps, and tools such as Terraform, Argo CD, and Flux.
  • Working knowledge of languages such as Java and Python.
  • Experience deploying, configuring, managing and debugging cloud infrastructure and platform software such as OpenStack, Kubernetes, etc.
  • Experience with public cloud managed Kubernetes (such as OCI/OKE, AWS/EKS, GCP/GKE, Azure/AKS).
  • Experience with cloud-native administration and monitoring/alerting technologies such as Docker, Helm, Prometheus, Grafana, EFK/ELK, Jaeger, or similar technologies.
  • Experience designing and implementing CI/CD pipelines, platforms and components such as Jenkins, Argo CD.
  • Knowledge of version control using Git.
  • Experience in Linux/Unix environments.
  • Strong troubleshooting skills for complex problems in remote systems.
  • Excellent team skills, can-do attitude, focus on quality.
  • BS or MS in Computer Science, Computer Engineering, or equivalent

Nice To Haves

  • Experience with application frameworks such as Spring, Helidon, Micronaut, etc. is a plus.

Responsibilities

  • Work with Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas.
  • Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services.
  • Responsible for the design and delivery of the mission-critical stack, with a focus on security, resiliency, scale, and performance.
  • Own end-to-end performance and operability.
  • Partner with development teams in defining and implementing improvements in service architecture.
  • Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio.
  • Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack.
  • Demonstrate clear understanding of automation and orchestration principles.
  • Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs).
  • Use a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations.
  • Understand and explain the effect of product architecture decisions on distributed systems.
  • Professional curiosity and a desire to develop a deep understanding of services and technologies.

Benefits

  • Medical, dental, and vision insurance, including expert medical opinion
  • Short-term and long-term disability
  • Life insurance and AD&D
  • Supplemental life insurance (Employee/Spouse/Child)
  • Health care and dependent care Flexible Spending Accounts
  • Pre-tax commuter and parking benefits
  • 401(k) Savings and Investment Plan with company match
  • Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
  • 11 paid holidays
  • Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
  • Paid parental leave
  • Adoption assistance
  • Employee Stock Purchase Plan
  • Financial planning and group legal
  • Voluntary benefits including auto, homeowner and pet insurance