Backend / Infrastructure Engineer

GovEagle | New York City, NY
14h | Onsite

About The Position

Over the past year, GovEagle has become the vendor of choice for most of the top 10 government contractors in the country. Our platform is experiencing 10x usage growth, and our AI agents are helping customers shred requirements, build compliance matrices, and draft proposals in hours instead of weeks.

To keep up with this growth, we're looking for a Staff Backend / Infrastructure Engineer to own the reliability, scalability, and operational excellence of our platform. You'll be the person who ensures our services stay fast, observable, and resilient as the largest government contractors in the world rely on GovEagle daily. This role is deeply systems-focused: you'll be building the foundational infrastructure that makes everything else possible.

This role is in-person in New York City, 4 days per week, to maximize collaboration and speed.

Requirements

  • 5+ years of backend/infrastructure engineering experience, with meaningful time spent on production systems at scale.
  • Deep expertise in Python and cloud infrastructure. You should be very comfortable working with FastAPI, Temporal, Kubernetes, and AWS.
  • Strong IaC and orchestration skills. You've worked extensively with Terraform, Helm, and Kubernetes in production environments.
  • Strong observability instincts. You've built on top of or significantly improved Grafana-based monitoring, logging, and alerting systems.
  • You've dealt with real scaling challenges and can speak concretely about the tradeoffs you made.
  • An ownership mentality. You don't wait to be told what to work on. You see a gap, propose a solution, and execute.
  • Comfortable working in a startup environment. We're a small team moving fast with a lot of autonomy and very little bureaucracy.

Nice To Haves

  • Experience with AWS GovCloud, FedRAMP, or other government compliance frameworks.
  • Experience with on-prem deployment and maintenance.
  • Experience with AI/LLM workloads (long-running tasks, high-memory jobs, async processing).
  • Experience with OpenTelemetry, Loki, Tempo, or Mimir.

Responsibilities

  • Building on top of our existing Grafana stack to improve logging, metrics, tracing, and alerting so we can understand system behavior in production and catch issues before customers do.
  • Designing and implementing infrastructure improvements to keep up with 10x usage growth: more customers, heavier workloads, and increasingly complex AI pipelines.
  • Improving our build, test, and deployment pipelines so the team can ship faster and with more confidence.
  • Owning our on-call processes, SLOs, and incident response playbooks.
  • Making production boring.
  • Working within and improving our security posture, particularly around FedRAMP and GovCloud requirements.
  • Being a thought partner to the CTO on backend architecture decisions.
  • Evaluating tradeoffs, writing RFCs, and raising the engineering bar across the team.

Benefits

  • Vision, dental, and medical insurance
  • Generous PTO