Infrastructure Engineer, Observe by Snowflake

Snowflake • Menlo Park, CA

About The Position

Snowflake is about empowering enterprises to achieve their full potential, and people too. With a culture that’s all in on impact, innovation, and collaboration, Snowflake is the sweet spot for building big, moving fast, and taking technology and careers to the next level.

Observe by Snowflake is an AI-powered observability platform built on the Snowflake Data Cloud and engineered for scale. We ingest and store logs, metrics, traces, and events on an open, scalable data lake using open formats like Apache Iceberg, delivering deep correlation and long-term analytics at dramatically lower cost. A dynamic Knowledge Graph and chat-based AI SRE provide rich context and guided workflows so teams can move from detection to root cause and resolution significantly faster.

The Infrastructure team at Observe by Snowflake is responsible for building, scaling, and operating the development and production environments that power our observability platform. We are a small, highly collaborative team with a broad scope, focused on delivering reliable infrastructure while continuously improving the systems that support our engineers and customers.

Requirements

2+ years of experience in Infrastructure Engineering, Site Reliability Engineering (SRE), DevOps, or related roles.
Experience operating container orchestration platforms such as Kubernetes.
Hands-on experience managing cloud infrastructure using Infrastructure-as-Code tools such as Terraform, Ansible, or similar.
Strong programming skills in Go, Python, or similar languages, with a focus on automation and systems development.
Experience supporting production systems at scale, with a focus on reliability and operational excellence.
Strong problem-solving skills and the ability to balance short-term operational needs with long-term infrastructure design.
Experience with AWS, GCP, and Azure.

Nice To Haves

Experience operating large-scale distributed systems.
Familiarity with observability platforms, telemetry pipelines, or monitoring infrastructure.
Experience improving developer platform tooling or internal infrastructure platforms.
Experience working in high-growth or rapidly evolving engineering environments.

Responsibilities

Design, build, and operate scalable cloud infrastructure in AWS supporting a high-scale observability platform.
Improve system reliability, performance, and operational visibility across development and production environments.
Develop and maintain CI/CD pipelines and internal tooling to improve developer productivity and deployment safety.
Identify and mitigate security risks, and help maintain internal security standards and compliance requirements.
Build infrastructure that supports high availability, scalability, and operational resilience.
Participate in an on-call rotation, contributing to incident response and post-incident improvements.
Partner closely with engineering teams to ensure infrastructure supports evolving product and platform needs.