Site Reliability Engineer

San R&D Business Solutions LLC
Milton, GA
Hybrid

About The Position

We are seeking a Junior to Mid-Level Site Reliability Engineer (SRE) to support the reliability, performance, and scalability of end-user facing applications. This role combines software engineering, cloud operations, and system reliability, with a strong focus on automation, monitoring, and production support in a hybrid cloud environment.

Requirements

  • 6+ years of experience across software engineering, systems administration, databases, and networking.
  • Strong experience with automation and orchestration tools (Terraform, Chef, Ansible).
  • Hands-on experience with Docker and Kubernetes.
  • 3+ years supporting cloud-native applications.
  • Experience with cloud security concepts including IAM and authorization.
  • 3+ years supporting end-user facing applications (web/mobile).
  • 2+ years of development experience with Java or JavaScript/Node.js.
  • Exposure to frontend technologies such as Angular, JavaScript, or TypeScript.
  • Strong understanding of system architecture, scalability, performance, and security.
  • Experience with application troubleshooting, performance tuning, and production support.
  • Knowledge of RESTful services, JSON, and Avro.
  • Experience with CI/CD tools (Jenkins, Bamboo) and Agile SDLC.
  • Strong written and verbal communication skills.

Nice To Haves

  • Experience with GCP (BigQuery, Dataflow, Pub/Sub, GCS, Composer/Airflow) or AWS (Redshift, SNS, SQS, S3).
  • Experience in large-scale enterprise or regulated environments.
  • Familiarity with SRE practices such as SLIs, SLOs, and error budgets.
  • Experience with monitoring and observability tools.

Responsibilities

  • Ensure reliability, availability, and performance of end-user facing applications (UI, APIs, backend systems).
  • Automate infrastructure and operational tasks using Terraform, Chef, Ansible, and scripting.
  • Manage and support Linux and Windows systems, including Docker and Kubernetes environments.
  • Support and enhance cloud-native applications, including IAM and authorization controls.
  • Troubleshoot application, infrastructure, and performance issues in production.
  • Participate in incident management, root cause analysis, and reliability improvements.
  • Support CI/CD pipelines, release management, and Agile delivery practices.
  • Collaborate with development, QA, and platform teams.
  • Maintain operational documentation and runbooks.