About The Position

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. At Goldman Sachs, SRE is responsible for the availability and reliability of our firm's most critical platform services, and ensures they meet the requirements of our internal and external users. We look for engineers who are motivated to collaborate with our businesses to build and run sustainable production systems that can evolve and adapt to changes in our fast-paced, global business environment. As an SRE Logging Engineer, you will work with customers, product owners, and SREs to design and develop a large-scale application to process, store, and read large volumes of log events. You will run a production environment spanning AWS, GCP, and on-prem datacentres.

Requirements

  • 3+ years of relevant work experience
  • Proficiency in one or more of the following: Java, Go, Python, JavaScript
  • Excellent programming skills - developing, debugging, testing and optimizing code
  • Experience with algorithms, data structures and software design
  • Experience with distributed systems design, maintenance, and troubleshooting

Nice To Haves

  • Experience with logging solutions such as Datadog, AWS CloudWatch, Splunk, or Elasticsearch
  • Experience with running workloads in Kubernetes
  • Systems experience in UNIX/Linux and networking, especially in scaling for performance and debugging complex distributed systems
  • Knowledge of cloud native solutions in AWS or GCP
  • Basic understanding of SRE concepts such as observability, SLOs/SLIs, and metrics

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Securities, Commodity Contracts, and Other Financial Investments and Related Activities
