About the position
The Data Platform Engineer II is responsible for improving the reliability of Vimeo's data platforms and pipelines, which process billions of events and terabytes of data daily. The role involves collaborating with data engineering teams on incident management, post-mortem analysis, and the prevention of recurring incidents. The engineer will also design business continuity and disaster recovery plans, implement change and release management processes, and build intelligent monitoring systems for early anomaly detection. In addition, the engineer will work closely with software developers to build automated testing frameworks and will participate in an on-call rotation.
Responsibilities
- Collaborate with engineering teams to improve, maintain, and performance-tune Vimeo's data platforms and infrastructure, and to plan their capacity.
- Design business continuity and disaster recovery plans and processes, and work with the engineering team to implement them.
- Drive the incident management process for the data platform, including performing incident post-mortems, root cause analysis, and preventing recurring incidents.
- Lead the standard change and release management process, automate and promote related best practices, and help Vimeo meet and maintain legal compliance status.
- Build intelligent monitoring over data pipelines and infrastructure to achieve early and automated anomaly detection.
- Work closely with software developers to build an end-to-end automated testing framework and system-level testing environment.
- Participate in an on-call rotation.
- Own, manage, monitor, and optimize the reliability and overall health of development and production environments.
- Solve problems in a detailed and systematic manner, taking ownership and driving solutions.
- Take action and deliver high-quality data solutions.
- Have experience working in Linux environments and proficiency with cloud environments (AWS, GCP).
- Use container orchestration platforms, particularly Kubernetes, for managing and deploying data processing and analysis applications.
- Code in one or more of the following programming languages: Python, Java (mandatory), or Scala.
- Have hands-on experience in Reliability Engineering for high-performance, scalable, distributed data systems.
Requirements
- Production experience with distributed data stores (e.g. HBase, ZooKeeper, Kafka)
- Ability to own, manage, monitor, and optimize the reliability and overall health of development and production environments
- Strong problem-solving skills and a sense of ownership and drive
- Passion for delivering high-quality data solutions
- 3+ years of experience working in Linux environments and proficiency with cloud environments (AWS, GCP)
- Experience with container orchestration platforms, particularly Kubernetes, for managing and deploying data processing and analysis applications
- Proficiency in coding in one or more of the following programming languages: Python, Java (mandatory), or Scala
- 3+ years of hands-on experience in Reliability Engineering for high-performance, scalable, distributed data systems
Additional responsibilities and qualifications
- Work with peer SREs to roll out changes to the production environment and help mitigate data-related production incidents.
- Attention to detail and quality with excellent problem-solving and interpersonal skills.
- Bonus: Some experience in data warehousing and data engineering.