Site Reliability Engineer

Thales, Austin, TX
Hybrid

About The Position

We are seeking a Site Reliability Engineer to ensure a high level of service and operational excellence for an innovative and ambitious telecommunications solution (high availability, strict performance constraints) deployed in the public cloud. The product requires establishing a product-specific SRE team.

Requirements

  • Education: Engineering degree or equivalent
  • Experience: At least 1 year
  • Skills and Abilities: Java development skills are required.
  • You are familiar with public cloud platforms (GCP, AWS), containers and microservices (Docker, Kubernetes, Java), and CI/CD and automation (Jenkins, GitLab, Helm)
  • You are familiar with NoSQL databases.
  • Certification: GCP Cloud Architect certification is a plus
  • Must have U.S. or dual citizenship and be able to obtain post-hire clearance from the Committee on Foreign Investment in the United States (CFIUS) and the Department of the Treasury

Nice To Haves

  • You have already set up monitoring for a product and its underlying infrastructure
  • You have development experience in distributed systems and/or high-availability contexts
  • You are familiar with microservices development
  • You have participated in defining architectures, data structures, and algorithms under performance, security, and reliability constraints
  • Public cloud architect certification
  • You are interested in the core aspects of Site Reliability Engineering: CI/CD, automation, monitoring and observability, and continuous improvement.
  • You are an accomplished, versatile engineer who can handle multiple tasks at once.

Responsibilities

  • Automation & Infrastructure as Code: Design, build, and maintain scalable infrastructure using tools such as Terraform, Ansible, and Kubernetes. Develop automated CI/CD pipelines via GitLab to reduce manual toil.
  • Availability & Reliability Engineering: Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Manage "Error Budgets" to balance the velocity of new features with the stability of the platform.
  • Incident Management & On-Call Support: Participate in 24/7 on-call rotations to provide emergency response and perform deep-dive troubleshooting for production issues.
  • Performance & Capacity Planning: Conduct system performance analysis, identify bottlenecks, and perform capacity planning to ensure the infrastructure can handle growth and peak loads.
  • Observability & Monitoring: Implement and refine symptom-based alerting and comprehensive monitoring strategies using platforms like Datadog to ensure high visibility into system health.
  • Continuous Improvement & Postmortems: Lead blameless postmortems after incidents to identify root causes and implement long-term technical fixes to prevent recurrence.
  • Security & Compliance Collaboration: Partner with Cloud Security teams to implement security best practices, manage access controls, and respond to security breaches or vulnerabilities.
  • Customer Relationship Support: Interface with customers and other stakeholders to define the solution improvement plan.
  • Service Ownership: You will own the availability of the solution's service.

Benefits

  • Thales provides an extensive benefits program for all full-time employees working 30 or more hours per week, and for their eligible dependents, including the following:
  • Elective Health, Dental, Vision, FSA/HSA, Voluntary Life and AD&D, Whole Group Life w/LTC, Critical Illness, Hospital Indemnity, Accident Insurance, Legal Plan, Identity Theft, and Pet Insurance
  • Retirement Savings Plan after 30 days of employment with a company contribution and a match, and with no vesting period
  • Company paid holidays and Paid Time Off
  • Company provided Life Insurance, AD&D, Disability, Employee Assistance Plan, and Well-being Program

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
