Site Reliability Engineer I

NCR Atleos•Atlanta, GA

Hybrid

About The Position

We are seeking a Site Reliability Engineer (SRE) to join our team, with an initial focus on supporting production operations (AppOps). This is a great opportunity for recent graduates or early-career professionals who are eager to grow in a fast-paced Cloud/SaaS environment. As part of the SRE team, you’ll work alongside experienced engineers to help maintain and improve the reliability, scalability, and performance of our cloud-based services. You’ll gain hands-on experience with automation, monitoring, and incident response, while learning best practices in modern infrastructure and DevOps.

Requirements

Bachelor’s or Master’s degree in Computer Science, Software Engineering, Information Technology, or a related technical field.
Basic understanding of cloud platforms such as Azure, AWS, or GCP, with a strong interest in learning more.
Exposure to programming or scripting languages like Python, Bash, PowerShell, JavaScript, or Java.
Familiarity with CI/CD tools such as Azure DevOps, GitHub Actions, or Jenkins.
Introductory knowledge of container technologies like Docker and Kubernetes.
Comfortable working with Linux and Windows systems; basic shell scripting experience is a plus.
Understanding of networking fundamentals, TLS/SSL, firewalls, and load balancers.
Exposure to monitoring and logging tools such as Prometheus, ELK stack, or Azure Monitor.
Awareness of infrastructure automation tools like Terraform, Ansible, or Helm.
Strong analytical and troubleshooting skills with a willingness to learn root cause analysis.
Ability to work collaboratively in cross-functional teams and communicate technical ideas clearly.
Eagerness to learn new technologies and grow as a reliability engineer in cloud and SaaS operations.

Nice To Haves

Any hands-on experience with distributed systems (e.g., Kafka, Elasticsearch, Cassandra) is a plus.
Cloud certifications (e.g., Azure Fundamentals, AWS Cloud Practitioner) are a bonus but not required.

Responsibilities

Assist in supporting and scaling production services and servers that power cloud-based applications, under the guidance of senior engineers.
Collaborate across development, quality, security, and operations teams to support reliable service delivery.
Help monitor and analyze SaaS services to improve scalability, reliability, and performance.
Contribute to automation tasks for provisioning and managing infrastructure, with opportunities to learn scripting and infrastructure-as-code tools.
Develop foundational skills in software engineering practices focused on reliability and scalability.
Participate in continuous improvement initiatives for software delivery processes within cross-functional teams.
Support configuration, monitoring, and management of systems used by product development teams.
Learn and assist in disaster recovery planning and execution.
Help with patching and maintenance of Windows and Linux servers in private data centers and cloud environments (e.g., Azure).
Collaborate with DevOps teams to promote code using CI/CD pipelines and integrate application security tooling.
Work with senior engineers to define and implement Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs).
Assist in implementing monitoring alerts, building dashboards, and understanding escalation paths.
Participate in incident response activities, including Post-Incident Reviews (PIRs) and Root Cause Analyses (RCAs), with mentorship.
Join on-call rotations with support and supervision, assisting during off-hours as needed.