Software Engineer, Infrastructure

Greenlite AI, San Francisco, CA
10d, Onsite

About The Position

As a Senior Infrastructure Engineer at Greenlite, you'll own the foundation that lets us deploy secure, compliant AI systems at major financial institutions that fight financial crime at massive scale. Our infrastructure is container-native: Docker and Kubernetes, orchestrated through platform automation tools like ryvn.ai, deliver consistent, auditable deployments across diverse customer environments.

You'll work directly with our biggest customers, institutions serving over a billion people, to architect, automate, and harden our on-premises and cloud environments so they meet the strictest regulatory and performance requirements, including SOC 2 compliance. Your infrastructure work is informed by real customer needs and ships to everyone, so you need to build enterprise-grade systems, work effectively with engineering and customer teams, understand financial services compliance, and adapt quickly.

This is a core infrastructure role on our Engineering team. You're an exceptional infrastructure engineer who understands that financial institutions need hyperscale reliability when deploying AI for compliance workflows. You'll lead reliability and scalability initiatives, architect secure infrastructure, and mentor others as we scale. What you build is driven by what our most sophisticated customers actually need for regulatory compliance and performance.

Please note: We work in person Monday through Friday in our SF office.

Requirements

  • 8+ years in infrastructure engineering or DevOps at high-growth or hyperscale companies
  • Experience with Docker and Kubernetes, including production cluster management, Helm, and service mesh technologies
  • Proven track record of architecting and operating AWS (preferred), GCP, or Azure at enterprise scale
  • Experience with observability platforms, preferably Datadog (metrics, logs, APM, distributed tracing)
  • Strong background in Infrastructure-as-Code (Terraform, Helm, Kustomize) and safe deployment practices (progressive delivery, canary deployments, GitOps, automated rollbacks)
  • "Battle scars" from leading outages, capacity events, and large-scale incident reviews
  • Strong programming skills in Python; familiarity with TypeScript a plus
  • Experience mentoring engineers and leading technical initiatives

Nice To Haves

  • Experience with platform engineering tools like ryvn.ai, or similar
  • Direct involvement in SOC 2 or other compliance audit preparation or remediation
  • Direct experience with private-cloud or on-premises deployments for regulated customers
  • Previous experience at startups scaling infrastructure from early stage to enterprise
  • Background in fintech or building systems for highly regulated industries
  • Experience with AI/ML infrastructure and model deployment at scale

Responsibilities

  • Master our Kubernetes-based AI infrastructure platform and container orchestration workflows
  • Get hands-on with our ryvn.ai platform automation tooling, Datadog observability stack, and deployment pipelines
  • Shadow experienced engineers on customer on-premises and private cloud rollouts
  • Lead your first incident response and begin modeling system growth patterns
  • Own end-to-end infrastructure architecture for major bank and fintech deployments
  • Build and implement enterprise-grade CI/CD frameworks with embedded security, SOC 2 compliance gates, and progressive delivery mechanisms
  • Partner with customer engineering teams on complex cloud and on-premises deployments
  • Become the go-to expert for regulated AI infrastructure at scale
  • Own and evolve our Kubernetes infrastructure, including cluster management, service mesh configuration, and container security policies
  • Design and implement progressive delivery pipelines with canary deployments, automated rollbacks, and deployment health validation
  • Build and maintain observability infrastructure in Datadog, including dashboards, monitors, SLOs, and distributed tracing
  • Drive incident response for high-severity outages and proactively model capacity needs for low-latency AI inference
  • Architect and automate secure infrastructure using Infrastructure-as-Code for VPCs, IAM policies, Kubernetes manifests, and private cloud deployments
  • Maintain and improve infrastructure controls supporting our SOC 2 compliance posture
  • Lead customer engagements for enterprise rollouts and mentor mid-level engineers on infrastructure best practices

Benefits

  • Comprehensive healthcare
  • 401k matching
  • Commuter benefits
  • 15 days PTO + holidays, unlimited sick days
  • Flexible leave options
  • Working late? We've got you covered with DoorDash and an Uber home

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 11-50 employees
