Site Reliability Engineer - Application Development (Kubernetes/Linux)

Motive

Remote

About The Position

The Managed Services SRE is responsible for deploying, operating, and maintaining customer applications across Linux bare metal servers and Red Hat OpenShift (OCP) containerized platforms. This role focuses on application deployment, release management, reliability, and operational support in a live production environment. The SRE will participate in on-call rotations, support night-time deployments, and ensure systems meet SLA requirements while continuously improving reliability and automation practices.

Requirements

Minimum 5 years of Linux system administration experience
Minimum 5 years of Kubernetes (K8s) experience
Exposure to Red Hat OpenShift
Experience with application servers such as JBoss or WebLogic
Experience with monitoring tools: Zabbix, Prometheus, Grafana
Experience with logging pipelines: Elasticsearch, Logstash, Kibana (ELK)
Exposure to web servers: Apache, Nginx
Experience with Ansible
Basic networking skills
Basic SQL skills
Strong troubleshooting skills and ability to operate in a live production environment
Willingness to: carry a pager for on-call rotations (typically a week at a time); support night-time deployments; work off-hours, including weekends and holidays, in emergencies; learn on the fly and develop new skills
Strong problem-solving skills
Excellent communication and teamwork abilities
Self-driven, proactive, and willing to take ownership
Ability to operate effectively in a fast-paced, SLA-driven environment

Responsibilities

Deploy, manage, and maintain applications on Linux bare metal servers and OpenShift/Kubernetes clusters
Execute CI/CD pipelines and ensure reliable, repeatable releases across hybrid environments
Build and maintain observability for deployed applications using Prometheus, Grafana, and Zabbix
Implement and maintain centralized logging solutions using Grafana Loki, OpenSearch/Elasticsearch, and Fluentd/Fluent Bit
Develop automation scripts to streamline deployments and reduce operational toil (Bash, Python, JavaScript)
Participate in incident response and troubleshoot application or platform issues in a live production environment
Support night-time deployments and carry a pager on rotation; respond to emergencies, including on weekends and holidays
Collaborate with internal teams to continuously improve deployment reliability and efficiency
Learn new technologies, take direction, and develop skills as needed
