IT - Infrastructure Systems - DevOps Engineer II (Remote in CA only)

Golden 1 Talent Acquisition Team•Sacramento, CA

51d•$123,600 - $135,000•Remote

About The Position

The DevOps Engineer II leads the automation of Infrastructure as Code deployments in both Microsoft Azure and on-premises environments. The engineer will deploy product updates, identify production issues, and implement integrations that meet our customers’ needs. The ideal candidate will have a solid background in DevOps and Site Reliability Engineering, with significant experience in Terraform, Python, and PowerShell. In addition to leading the infrastructure-as-code process, the engineer will manage Linux/Kubernetes cluster environments, support development teams on API integration strategies, and design, implement, and optimize CI/CD pipelines for faster and more reliable software releases. The engineer will also monitor systems to ensure application uptime and performance, provision and set up metrics, create alerts and manage alert suppression, and propose automation solutions to reduce workload. This role is responsible for implementing and operating cloud platform services and standards defined by Cloud Engineering, with a focus on reliability, security, and scalability.

Requirements

Bachelor of Science degree (or equivalent) in computer science, engineering, or a relevant field.
More than 4 years of experience as a DevOps Engineer in medium to large-scale environments.
Proficient in Windows Server, Linux, and hybrid cloud deployments using Microsoft Azure and VMware.
Skilled in Git/GitHub workflows, Terraform, Python, PowerShell, and container orchestration (Tanzu, Docker, Kubernetes, OpenShift).
Experienced with CI/CD tools (Jenkins, GitLab CI, Azure DevOps) and observability platforms (Datadog, Prometheus, Grafana, ThousandEyes).
Knowledgeable in log management (ELK Stack) and database technologies (PostgreSQL, MySQL, NoSQL).
Strong background in automating infrastructure provisioning and application deployment using Terraform, Ansible, and Kubernetes.
Proficient in creating and maintaining monitoring dashboards, SLIs, SLOs, and error budgets to ensure application uptime and performance.
Experienced in ensuring infrastructure security, driving automation initiatives, and collaborating across teams to improve reliability and scalability.
Experienced in building observability pipelines and performing advanced queries in log management tools like Splunk for troubleshooting.
Experienced in implementing and operating Azure-based shared services defined by platform or cloud engineering teams.

Nice To Haves

Linux Certification (Desired)
Microsoft Azure DevOps Engineer Expert Certification (Required)
Kubernetes Administration Certification (Required)

Responsibilities

Independently lead infrastructure-as-code development using Terraform and scripting languages such as Python and PowerShell to support scalable and reliable deployments.
Manage Linux/Kubernetes cluster environments.
Deploy solutions in accordance with Change Management Processes.
Support development teams on API integration strategy and standards development.
Ensure systems are secure against cybersecurity threats.
Identify technical problems and develop software updates and fixes.
Administer Splunk, including query optimization, alerting, and dashboard development.
Build tools to reduce errors and improve customer experience.
Propose ideas and solutions within the Infrastructure Department to reduce workload through automation.
Design, implement, and optimize CI/CD pipelines for faster and more reliable software releases.
Independently conduct root cause analysis and implement corrective actions.
Design and write tests to investigate infrastructure failure and scaling.
Create and maintain response playbooks across incident management and monitoring tools.
Develop automation to ensure repeatability, eliminate toil, and reduce the time needed to act on and repair services.
Analyze key operational metrics to identify opportunities to improve availability.
Implement effective monitoring and alerting, and reduce alert fatigue.
Manage container orchestration environments and optimize deployment workflows to enhance scalability, reliability, and operational efficiency.
Design, build, and manage containerized environments using Docker.
Create and maintain SLIs, SLOs, and error budgets.
Design and optimize monitoring dashboards and alerting systems to proactively detect and address application performance and uptime issues.
Implement code branching strategies using GitHub features.
Apply advanced Terraform syntax and configure GitLab CI/CD pipelines and jobs.
Provision and set up metrics in Prometheus, Thanos, and Grafana, and create and manage alerts.
Implement cloud engineering standards, reusable modules, and platform patterns in Microsoft Azure.
Operate shared cloud platform services according to architectures defined by Cloud Engineering.
Ensure infrastructure changes comply with reliability, security, and cost controls established by Cloud Engineering.
Maintain operational documentation and runbooks for cloud platform services.