Senior Cloud/Backend Engineer

Triple Whale•New York, NY

Remote

About The Position

Triple Whale is seeking a Senior Cloud/Backend Engineer to join its Infrastructure Team. The ideal candidate enjoys solving complex problems, working at a detailed level, helping other developers, and owning systems end to end. The role is central to keeping the platform reliable and includes participation in a weekend on-call rotation. The company is looking for people who are passionate about building reliable, scalable systems and who thrive on open-ended challenges: designing, building, and owning complex infrastructure.

Triple Whale is a leading intelligence platform for ecommerce that helps brands understand and drive growth with confidence. By consolidating data, providing trusted measurement tools, and applying advanced AI, it turns fragmented data into clear insights and actionable recommendations. Its AI agents and automations support creative asset generation, optimize marketing channels, and improve the effectiveness of various tools. Over 60,000 ecommerce brands rely on Triple Whale to accelerate growth and revenue efficiently, uncovering opportunities and acting on them at scale.

Requirements

Located in the New York tri-state area
3+ years of experience as an independent backend or infrastructure engineer
Ability to design and build scalable, reliable systems
Strong communication skills
Hands-on builder mentality — this is a coding role
Experience with relational and non-relational databases
Experience with major Cloud platforms (GCP, AWS, Azure)
Experience with streaming systems
Experience scaling large systems
Experience with message queues
Experience with monitoring systems such as Datadog, Grafana, or Groundcover
Experience with CI/CD and Git
Previous real-world experience in production on-call environments
Ability to quickly assess incidents, identify scope/root cause, and understand platform impact
Ability to classify severity and prioritize response appropriately
Capability to deploy safe production hotfixes when needed
Solid judgment under pressure, especially when operating independently
Ownership mindset: from detection to mitigation to resolution

Nice To Haves

GCP experience is an advantage
Kubernetes and Knative (production experience)
ClickHouse

Responsibilities

Deploy and support service infrastructure in Kubernetes
Identify and build the right tools and technologies for major initiatives
Assist other teams and developers in designing robust and scalable systems
Scale and optimize multiple databases
Build internal tooling to accelerate developer velocity
Provide observability, monitoring, and visibility across systems
Participate in a shared on-call rotation (typically 2-3 times per month, Friday 7:00 AM ET through Saturday 5:00 PM ET)
Assess incidents, identify scope/root cause, and understand platform impact
Classify severity and prioritize response appropriately
Deploy safe production hotfixes when needed
Demonstrate solid judgment under pressure, especially when operating independently
Take ownership from detection to mitigation to resolution of issues