Site Reliability Engineer

San R&D Business Solutions LLC • Milton, GA

Hybrid

About The Position

We are seeking a Junior to Mid-Level Site Reliability Engineer (SRE) to support the reliability, performance, and scalability of end-user-facing applications. This role combines software engineering, cloud operations, and system reliability, with a strong focus on automation, monitoring, and production support in a hybrid cloud environment.

Requirements

6+ years of experience across software engineering, systems administration, databases, and networking.
Strong experience with automation and orchestration tools (Terraform, Chef, Ansible).
Hands-on experience with Docker and Kubernetes.
3+ years supporting cloud-native applications.
Experience with cloud security concepts including IAM and authorization.
3+ years supporting end-user-facing applications (web/mobile).
2+ years of development experience with Java or JavaScript/Node.js.
Exposure to frontend technologies such as Angular, JavaScript, or TypeScript.
Strong understanding of system architecture, scalability, performance, and security.
Experience with application troubleshooting, performance tuning, and production support.
Knowledge of RESTful services, JSON, and Avro.
Experience with CI/CD tools (Jenkins, Bamboo) and Agile SDLC.
Strong written and verbal communication skills.

Nice To Haves

Experience with GCP (BigQuery, Dataflow, Pub/Sub, GCS, Composer/Airflow) or AWS (Redshift, SNS, SQS, S3).
Experience in large-scale enterprise or regulated environments.
Familiarity with SRE practices such as SLIs, SLOs, and error budgets.
Experience with monitoring and observability tools.

Responsibilities

Ensure reliability, availability, and performance of end-user-facing applications (UI, APIs, backend systems).
Automate infrastructure and operational tasks using Terraform, Chef, Ansible, and scripting.
Manage and support Linux and Windows systems, including Docker and Kubernetes environments.
Support and enhance cloud-native applications, including IAM and authorization controls.
Troubleshoot application, infrastructure, and performance issues in production.
Participate in incident management, root cause analysis, and reliability improvements.
Support CI/CD pipelines, release management, and Agile delivery practices.
Collaborate with development, QA, and platform teams.
Maintain operational documentation and runbooks.
