About The Position

The Principal/Consultant Site Reliability Engineer is an advanced, professional-level SRE role. Individuals may be responsible for one or more complex reliability and toil-reduction projects. At this level, SREs operate as subject matter experts in the discipline and provide guidance to others, including product and development teams, to define and improve reliability within a product group. A Principal/Consultant SRE has a deeper understanding of system and application code and makes data-driven recommendations that balance customer, development, and operational needs. They are champions for shared services, platforms, and architectural standards. Individuals in this role train and/or mentor junior staff.

Requirements

  • Deep knowledge of:
  • Cloud engineering with a strong focus on Azure and/or AWS.
  • Cloud services (e.g., EC2, S3, RDS, Lambda, Azure VMs, Azure Functions).
  • Infrastructure as Code (Terraform, ARM/Bicep).
  • Containerization and orchestration tools (Docker, Kubernetes/EKS).
  • Scripting languages (Python, Bash, TypeScript, PowerShell).
  • Linux/UNIX/Windows systems and storage.
  • Monitoring tools (Datadog, Coralogix, CloudWatch, Azure Monitor).
  • SRE and DevOps practices.
  • Networking and security best practices.
  • Excellent problem-solving and stakeholder management skills.

Nice To Haves

  • Databricks knowledge is an added advantage.

Responsibilities

  • Leading Kubernetes deployment and management, including orchestration, architecture, networking, CI/CD, storage, and security.
  • Collaborating with cross-functional teams to design and implement high-quality cloud solutions.
  • Administering and supporting Databricks environments, including permissions, storage, and networking.
  • Troubleshooting complex technical issues using observability tools and root-cause analysis.
  • Implementing infrastructure management best practices and automating repetitive tasks.
  • Supporting program installations, system configurations, and user modifications.
  • Refining system monitoring and reporting in collaboration with support teams.
  • Operating across Agile and Waterfall methodologies to deliver timely solutions.
  • Mentoring junior team members and contributing to a culture of continuous learning.

Benefits

  • Comprehensive, multi-carrier health plan benefits
  • Disability insurance
  • Dependent care and commuter spending accounts
  • Life and accident insurance
  • Retirement benefits (salary investment plan/employer stock purchase plan)
  • Modern family benefits, including adoption and surrogacy
  • Health Benefits: Comprehensive, multi-carrier program for medical, dental and vision benefits
  • Retirement Benefits: 401(k) with match and an Employee Share Purchase Plan
  • Wellbeing: Wellness platform with incentives, Headspace app subscription, Employee Assistance and Time-off Programs
  • Short- and Long-Term Disability, Life and Accidental Death Insurance, Critical Illness, and Hospital Indemnity
  • Family Benefits, including bonding and family care leaves, adoption and surrogacy benefits
  • Health Savings, Health Care, Dependent Care and Commuter Spending Accounts
  • In addition to annual Paid Time Off, we offer up to two days of paid leave each to participate in Employee Resource Groups and to volunteer with your charity of choice

What This Job Offers

  • Job Type: Full-time
  • Career Level: Principal
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000
