About The Position

We are recruiting a staff software engineer to own the security, reliability, and observability of the fastest-growing e-commerce startup. You will report directly to our Head of Engineering and work closely with engineers across the team. Your mission will include establishing and maintaining world-class observability, monitoring, and alerting systems; building systems that reduce operational toil for the entire engineering team; and conducting security audits, reviews, and mitigations across our entire platform. We take reliability and security seriously, and that focus prepared us to scale to $500M in volume in under a year. You will help us scale another 100x while keeping our systems secure and reliable.

Requirements

  • 8+ years of experience building, securing, and operating complex distributed systems at scale.
  • Experience with Go microservices running on Google Cloud Run.
  • Proficiency in PostgreSQL and Redis.
  • Familiarity with Google Cloud Platform infrastructure (Cloud Run, Cloud Build, Pub/Sub, Cloud Storage).
  • Experience with Terraform for infrastructure as code.
  • Knowledge of blockchain indexing and transaction submission.
  • Experience with external service integrations.
  • Deep expertise in building comprehensive observability platforms (metrics, logs, traces, dashboards).
  • Experience designing and implementing effective alerting strategies.
  • Ability to create automation and tooling that reduces operational toil.
  • Experience establishing incident response processes, runbooks, and postmortem practices.
  • Experience conducting security audits and threat modeling for distributed systems.
  • Knowledge of implementing security controls, authentication/authorization systems, and secrets management.
  • Experience with performance optimization and capacity planning for high-throughput systems.
  • Knowledge of database reliability, backup/recovery strategies, and data integrity.
  • Experience with API security, rate limiting, and DDoS mitigation.
  • Knowledge of compliance and audit logging for financial systems.

Responsibilities

  • Build and maintain comprehensive monitoring across our microservices architecture.
  • Instrument our Go services with meaningful metrics.
  • Create dashboards that tell the story of system health.
  • Ensure every engineer can debug any issue in production with the data we collect.
  • Design alerting strategies that wake people up for real problems, not noise.
  • Build better alerts, better runbooks, and better automation to reduce toil.
  • Conduct regular security reviews of our codebase, infrastructure, and third-party integrations.
  • Identify vulnerabilities before they become incidents and implement mitigations.
  • Establish security best practices and ensure they're followed.
  • Build systems and processes that enable effective incident response across the team.
  • Create runbooks, automate common remediation tasks, and establish postmortem practices.
  • Identify and eliminate single points of failure.
  • Implement circuit breakers, retries, and graceful degradation.
  • Build automation that reduces manual operational work.
  • Secure our GCP infrastructure, manage secrets properly, and implement least-privilege access controls.
  • Own the security of our CI/CD pipelines and deployment processes.

Benefits

  • A dynamic and engaging environment focused on fostering real growth and innovation.
  • Opportunities to create amazing products that our customers truly love and value.
  • Comprehensive health insurance packages with dependent coverage.
  • Competitive salary with ample opportunities for career advancement and development.
  • Flexibility of a fully remote work environment.
  • Access to employee wellness programs designed to support your overall well-being.
© 2024 Teal Labs, Inc