About The Position

This position requires working on-site in an office in Columbus, OH. The role combines deep expertise in cloud technologies with a strong focus on reliability, scalability, and automation, ensuring that digital services are robust, efficient, and aligned with business objectives. The engineer will work cross-functionally with development, operations, and security teams to implement best practices and drive innovation in cloud infrastructure.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in cloud engineering, site reliability engineering, or DevOps, with hands-on expertise in GCP.
  • 3+ years' experience with Infrastructure as Code (IaC) tools (e.g., Terraform, Deployment Manager).
  • 3+ years' experience with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana).
  • 3+ years' experience designing and implementing CI/CD pipelines and automation workflows.
  • 3+ years' experience troubleshooting and solving problems, especially in distributed systems and cloud environments.
  • 3+ years' experience working with SRE principles, including error budgets, SLIs/SLOs, and incident management.
  • 3+ years' experience with cloud security best practices and regulatory compliance requirements.

Nice To Haves

  • Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams
  • Ability to work independently and multitask within a collaborative work environment
  • Willingness and aptitude for continuous improvement
  • A "do the right thing" attitude while being a strong team player
  • Focus on customer service
  • GCP Professional certifications
  • Experience migrating workloads from on-premises or other cloud platforms to GCP
  • Familiarity with Kubernetes, Docker, and container orchestration in GCP
  • Experience with agile methodologies and project management tools

Responsibilities

  • Cloud Adoption Strategy: Collaborate with stakeholders to develop and execute strategies for adopting GCP services, including migration planning, architecture design, and implementation.
  • Reliability Engineering: Apply SRE principles to GCP environments, focusing on service reliability, availability, and scalability. Develop monitoring, alerting, and automation solutions to prevent outages and reduce manual intervention.
  • Cloud Infrastructure Management: Build, maintain, and optimize cloud infrastructure using Infrastructure as Code (IaC) tools such as Terraform or Deployment Manager.
  • Automation & CI/CD: Design and implement automated deployment pipelines and operational workflows to enable continuous integration and delivery of cloud-based applications.
  • Incident Management: Lead incident response for cloud-related issues, conduct root cause analysis, and implement corrective actions to improve system reliability.
  • Performance Optimization: Monitor system performance and proactively identify areas for improvement in cost, efficiency, and reliability.
  • Security & Compliance: Ensure cloud environments adhere to security best practices and compliance requirements. Collaborate with security teams to implement controls and monitor risk.
  • Documentation & Knowledge Sharing: Create and maintain technical documentation. Mentor and train team members on GCP adoption and SRE practices.

Benefits

  • Competitive compensation
  • Comprehensive insurance options
  • Matching contributions through the 401(k) plan and the share purchase plan
  • Paid time off for vacation, holidays, and sick time
  • Paid parental leave
  • Learning opportunities and tuition assistance
  • Wellness and well-being programs
© 2024 Teal Labs, Inc