Infrastructure Engineer

SchoolAI · Lehi, UT

Requirements

  • Have strong architectural thinking—demonstrated through designing, implementing, and documenting resilient, scalable systems in production
  • Have 4+ years of experience in site reliability engineering with a focus on systems architecture
  • Are experienced with cloud platforms (particularly GCP) and container orchestration (Kubernetes)
  • Are proficient with infrastructure-as-code (Terraform), Git/GitHub, and CI/CD pipelines (especially GitHub Actions)
  • Have experience with containerization (Docker) and PR management tools (Graphite)
  • Are familiar with npm, Node.js, JavaScript, and TypeScript environments
  • Excel at monitoring and observability implementation (particularly Datadog) as part of system architecture
  • Have knowledge of networking concepts and security best practices
  • Can articulate technical trade-offs and architectural vision effectively across teams
  • Demonstrate problem-solving skills for complex infrastructure issues
  • Are passionate about automation—designing comprehensive solutions that reduce toil through elegant code

Nice To Haves

  • Possess experience with or interest in database management and optimization
  • Have led or participated in large-scale cloud migration projects

Responsibilities

  • Architect and implement resilient, scalable system designs across our cloud infrastructure (primarily Google Cloud / GCP)
  • Design and evolve our infrastructure architecture with a focus on scalability, observability, and performance optimization
  • Establish architectural patterns and guardrails that promote reliability while enabling rapid development
  • Develop well-architected internal tooling and abstractions to streamline deployment, monitoring, and debugging
  • Lead incident response with an architectural mindset—conducting thorough root cause analysis and architecting systemic improvements
  • Partner with product, ML, and engineering teams to deeply understand their requirements and design appropriate infrastructure solutions
  • Champion infrastructure-as-code, automated testing, observability, and continuous delivery through thoughtful architecture decisions
  • Own and architect key components of our CI/CD pipelines and platform engineering efforts

© 2024 Teal Labs, Inc