About Samsung Austin Semiconductor
Samsung is a world leader in advanced semiconductor technology, founded on the belief that the pursuit of excellence creates a better world. At SAS, we are Innovating Today to Power the Devices of Tomorrow. Come innovate with us!
Position Summary
The Engineering System Reliability (ESR) engineer is a key member of the Site Reliability Engineering (SRE) organization, responsible for the 24 × 7 operational health of the EES family of engineering systems and related platforms. In this role you will provide continuous monitoring, rapid incident response, and root‑cause analysis for high‑availability services, while working closely with SRE, Developer, and SIOC teams to drive automation, capacity planning, and seamless system migrations.
Your primary focus will be on maintaining and evolving monitoring frameworks (e.g., Ontune, UIM, Splunk, Prometheus, Grafana) and developing CI/CD pipelines that enable reliable, repeatable deployments across environments. Your duties will also include end‑of‑life architecture planning and other long‑term system stability tasks, covering any action needed to keep critical MES fab operation systems highly available. You will also design and maintain backend tooling and scripts that automate health‑sensor data collection, performance dashboards, and database operations for Oracle and SQL Server.
Leveraging strong troubleshooting expertise, you will lead post‑mortem activities, implement corrective counter‑measures, and ensure that all technical documentation is clear, comprehensive, and written in fluent English. The role demands a collaborative mindset that balances independent problem‑solving with teamwork to support Samsung’s manufacturing ecosystem around the clock.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
251-500 employees