Cloud Operations Engineer
Syndigo · Posted: July 25, 2023 · Onsite
About the position
The Cloud Operations Engineer at Syndigo provides production support, monitoring, and troubleshooting to ensure high uptime for the company's SaaS platform. The role covers managing cloud infrastructure, including access, security, compliance, and cost management, as well as platform-level support, monitoring, and troubleshooting for internal and external customers on various technology stacks. The engineer also handles event and alert management, incident management, problem resolution, and documentation based on industry best practices. The role requires a bachelor's degree in computer science or engineering, at least 4 years of relevant experience, and the ability to work in a 24x7 environment.
Responsibilities
- Provide production support, monitoring, and troubleshooting to ensure 99.9% uptime for the SaaS platform
- Support and administer cloud infrastructure, including access, security, compliance, and cost management
- Work in a 24/7 production environment on shift schedules
- Provide platform-level support, monitoring, and troubleshooting for internal and external customers on various technology stacks
- Use, customize, and administer monitoring, log management, and APM tools
- Assist with periodic patches, hotfixes, and upgrades for production customers
- Manage events, alerts, incidents, and problem resolution based on service levels
- Handle backup, disaster recovery, and capacity management
- Own and resolve technical and operational issues with root cause analysis
- Drive end-to-end technical resolution of critical incidents
- Report and document based on industry best practices
- Workplace location in Bangalore
Requirements
- Bachelor's degree or equivalent in Computer Science or Engineering
- 4+ years of relevant experience in cloud operations and support
- Willingness to work in a 24x7 environment with shift rotations
- Experience in providing production support, monitoring, and troubleshooting for SaaS platforms
- Knowledge of cloud infrastructure support, administration, and escalation management for access, security, compliance, and cost management
- Proficiency in technology stacks such as Linux OS, Kubernetes/Docker Swarm, Elasticsearch, Kafka, Apache Storm, Netty, Nginx, and MSSQL
- Familiarity with monitoring, log management, and APM tools such as Sensu, Zabbix, Grafana, Prometheus, ELK, and Jenkins
- Ability to assist with periodic patches, hotfixes, and upgrades
- Experience in event and alert management, incident management, and problem resolution
- Knowledge of backup, disaster recovery, and capacity management
- Strong analytical and problem-solving skills
- Excellent communication and collaboration abilities
- Reporting and documentation skills based on industry best practices
Benefits
- Significant opportunity for growth within the company
- Chance to learn about and join a growing master data management (MDM) community
- Collaboration and communication with multiple parties for critical incidents
- Work on public cloud platforms like Azure and AWS
- Experience with Linux OS and open-source systems
- Handling large amounts of data imports/exports and log analysis
- Knowledge of PaaS services for monitoring and troubleshooting
- Regular administration of production and non-production systems
- Troubleshooting software and application systems
- Work in a multi-tenant environment with a focus on customer information security and compliance
- Experience with ticketing tools for customer ticket management
- Diversity, equity, and inclusion are valued and promoted within the organization