Joining Collibra’s Cloud Operations Team

Collibra’s Cloud Operations Team is responsible for operating each of Collibra’s Production Cloud environments and maintaining customer instances. As a CloudOps Engineer, you will:
Monitor and respond to critical alerts for production applications in a multi-cloud environment (AWS and GCP).
Perform initial triage and investigate issues or incidents reported by the global Customer Support Team or other stakeholders, escalating to the appropriate internal teams.
Execute change requests following well-defined runbooks.
Evaluate and classify alerts and events, and create runbooks when needed.
Identify gaps in the monitoring of customer environments and work cross-functionally to ensure proactive measures are taken to restore service.
Conduct deployments on the weekends.
Work closely with the Customer Support Team to streamline processes and help customers with their requests as quickly as possible.
Identify opportunities to create self-service solutions.

The candidate will report to the Cloud Operations manager. The working hours for this role are Wednesday through Sunday, 5 AM ET to 1 PM ET.

Cloud Operations Engineers at Collibra are responsible for:
Ensuring service uptime by monitoring alerts and events and performing restoration of service processes.
Monitoring for potential resource threshold breaches and proactively resolving imminent failures.
Working with other teams across the organization, including Security, Development, QA, and SRE, to balance requirements, meet common goals, and improve monitoring and observability, primarily for customer-facing production environments.
Monitoring automated software rollouts and responding to any issues that may arise.
Engaging a customer notification process when services have been impacted.

You have:
5+ years of experience in Cloud Engineering with one or more of AWS, GCP, or Azure, or a bachelor’s degree or equivalent experience in Computer Science or Information Technologies.
Analytical and methodical problem-solving and organizational skills.
5+ years of experience in the Linux OS environment.
5+ years of experience with an Apple Mac laptop.
Working knowledge of Bash. Python is a plus.
Experience deploying and administering infrastructure with Terraform.
Experience managing Kubernetes cluster orchestration.
Knowledge of monitoring and observability tools such as Grafana, Kibana, Elasticsearch, Datadog, etc.
Experience with Jira, GitHub, Slack, and Confluence.
Willingness to be on camera frequently during Zoom calls.
Willingness to work on an on-call rotation with compensation and recovery through a flexible work schedule.
A bachelor’s degree or equivalent related working experience is required.
This position is not eligible for visa sponsorship. Because this role supports the US government, the candidate must be a US Citizen who resides on US soil.

You are:
Agile-minded, optimistic, passionate, and pragmatic about delivering valuable software to customers.
Interested in broadening your skills into technologies you haven’t seen before.
Someone who puts quality and the customer experience first.
Able to work productively with a geographically distributed remote team.
A team player, focused on collaboration and doing the right thing.
Accustomed to a fast-paced environment.

Reporting to Collibra’s Manager of Cloud Operations, a Cloud Operations Engineer’s measures of success are:
Within your first month, you will absorb fundamental knowledge about Collibra processes and tools, and start building team and cross-functional relationships.
Within your third month, you will take ownership of escalations and collaborate on and achieve quarterly OKRs (Objectives and Key Results).
Within your sixth month, you will drive approaches to proactively resolve customer-impacting issues, inform customer success of impending customer-facing problems, help generate operational playbooks, and work through process improvements for monitoring and alerting.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
1,001-5,000 employees