We are actively recruiting a staff software engineer to own the security, reliability, and observability of the fastest-growing e-commerce startup. You will report directly to our Head of Engineering and work closely with many members of our engineering team. Your mission will include establishing and maintaining world-class observability, monitoring, and alerting systems; building systems that reduce operational toil for the entire engineering team; and conducting security audits, reviews, and mitigations across our entire platform. We take reliability and security seriously, and doing so prepared us to scale to $500M in volume in under a year. You will help us scale the next 100x while keeping our systems secure and reliable.

About You

You have exceptionally high agency and don't let yourself get stuck on problems: you find creative solutions to complex reliability and security challenges so the business never stops running. When systems fail, you build the automation and tooling that help the entire team respond effectively, rather than heroically fixing things yourself. You are a "professional hacker" in the best sense: someone who can operate without much guidance, exercise excellent judgment on when to build vs. buy vs. configure, and see security and reliability as fundamental enablers of business success rather than obstacles to overcome.

You have 8+ years of experience building, securing, and operating complex distributed systems at scale. You've been on call, you've debugged production incidents, and you've built the monitoring and automation systems that reduced toil for entire engineering organizations.

You are passionate about making systems observable, reliable, and secure. You understand that the best reliability work multiplies the effectiveness of the entire team: better monitoring means faster debugging for everyone, better automation means less manual toil, and better incident response processes mean the whole team can handle issues confidently.
We don't believe in heroes; we believe in systems that make heroics unnecessary.

You understand our technology stack and can hit the ground running:
- Go microservices running on Google Cloud Run
- PostgreSQL
- Redis
- Google Cloud Platform infrastructure (Cloud Run, Cloud Build, Pub/Sub, Cloud Storage)
- Terraform for infrastructure as code
- Blockchain indexing and transaction submission
- External service integrations

You have deep expertise in several of these areas:
- Building comprehensive observability platforms (metrics, logs, traces, dashboards)
- Designing and implementing alerting strategies that minimize noise while catching real issues
- Creating automation and tooling that reduces operational toil
- Establishing incident response processes, runbooks, and postmortem practices
- Conducting security audits and threat modeling for distributed systems
- Implementing security controls, authentication/authorization systems, and secrets management
- Performance optimization and capacity planning for high-throughput systems
- Database reliability, backup/recovery strategies, and data integrity
- API security, rate limiting, and DDoS mitigation
- Compliance and audit logging for financial systems

You understand that sometimes the rocket has to be launched and finished in flight. This means you're comfortable making pragmatic security and reliability tradeoffs when needed, while always having a plan to improve things incrementally. You know when "good enough for now, with monitoring" is the right answer, and when "we need to fix this before we ship" is non-negotiable.
Job Type
Full-time
Career Level
Mid Level
Education Level
No Education Listed
Number of Employees
11-50 employees