NOC Engineer

Hard Rock Digital

18h•Hybrid

About The Position

We are seeking a motivated and technically skilled Network Operations Center (NOC) Engineer to play a crucial part in maintaining the reliability and performance of our application and infrastructure while gaining valuable experience in the complex, highly transactional and always-on gaming industry. If you are a problem-solver with a strong desire to learn and grow in a dynamic environment, we encourage you to apply. This position will foster collaborative teamwork while ensuring seamless operations at all hours.

Requirements

2-4 years of experience in a NOC or similar role.
Experience with incident response processes and Jira (or a related ticket-tracking system).
Proficient in MS Office and modern communication tools for virtual teams (e.g., Slack, MS Teams, Zoom).
Familiarity with networking protocols and concepts, Linux/Unix operating systems, and live-site KPIs (MTTR, MTTD, availability, incident severities).
Strong communication and teamwork abilities, problem-solving and troubleshooting skills, attention to detail and documentation, and the ability to multitask.

Nice To Haves

Bachelor’s degree in Computer Science, Information Technology, or a related field.
Understanding of DevOps and continuous integration/continuous deployment (CI/CD) principles.
Solid ability to analyze datasets, detect trends, and provide actionable insights.
Technical experience with observability tools (Prometheus, Grafana, Robust, and related); automation tooling and scripting (e.g., Python, Bash, Ansible); containerization technologies (e.g., Docker, Kubernetes); cloud computing platforms (e.g., AWS, Azure, GCP); and Git (version control and collaborative software development).
Certification in relevant areas (e.g., CCNA, AWS Certified DevOps Engineer)

Responsibilities

Monitor application health, network, infrastructure and related systems for performance, availability, and security anomalies.
Respond promptly to alerts and incidents, facilitating call bridges and diagnosing and resolving issues to minimize downtime.
Collaborate with SREs and DevOps to implement automation, tooling, and monitoring solutions.
Assist in capacity planning and optimization efforts.
Document operational procedures and contribute to knowledge sharing within the team.
Continuously improve incident response processes and contribute to post-incident reviews.