About The Position

Scale GP is building the next generation of enterprise-grade Generative AI products. Our platform provides APIs for knowledge retrieval, inference, and evaluation, enabling customers to build and deploy powerful Agentic workflows for enterprise use cases. We're looking for a Senior Infrastructure Software Engineer to build and scale our core infrastructure in a fast-paced environment. This team is key to our mission, directly enabling the deployment of these Agentic workflows for our customers.

This is a unique opportunity for an infrastructure leader who is passionate about defining the future of AI deployments. You will be at the forefront of the industry, solving complex, bleeding-edge problems in scalability, security, and developer efficiency. You will architect and implement solutions across multiple cloud providers (GCP, Azure, AWS) for customers in diverse, highly regulated industries such as healthcare, telecom, finance, and retail.

Requirements

  • Proven experience in a senior role, with 5+ years of full-time software engineering experience.
  • Deep understanding of modern infrastructure practices, including CI/CD, IaC (e.g., Terraform, Helm charts), container orchestration (e.g., Kubernetes), and observability platforms (e.g., Datadog, Prometheus, Grafana).
  • Extensive experience with at least one major cloud provider (AWS, Azure, or GCP).
  • Strong knowledge of security and compliance in enterprise environments, with a focus on access management, data isolation, and customer-specific VPC setups.
  • Proficiency in Python or JavaScript/TypeScript, and SQL.

Nice To Haves

  • Hands-on experience and a passion for working with Agents, LLMs, vector databases, and other emerging AI technologies.

Responsibilities

  • Define the architectural patterns for our multi-cloud infrastructure to support secure, reliable, and scalable Agentic workflows for enterprise customers.
  • Lead the infrastructure roadmap with a strong focus on compliance, privacy, and security standards, including designing change management and data isolation strategies.
  • Own the development and maintenance of our best-in-class Agentic observability platform (logging, metrics, tracing, and analytics) to proactively ensure system health and enable rapid incident response.
  • Drive developer efficiency by building automated tooling and championing Infrastructure-as-Code (IaC) paradigms throughout the engineering organization.
  • Solve the toughest engineering problems related to multi-tenancy, data isolation, and high-performance inference at massive scale, taking end-to-end ownership across the full product lifecycle.

Benefits

  • Comprehensive health, dental, and vision coverage
  • Retirement benefits
  • A learning and development stipend
  • Generous PTO
  • A commuter stipend

What This Job Offers

Job Type

Full-time

Career Level

Mid Level

Industry

Computing Infrastructure Providers, Data Processing, Web Hosting, and Related Services

Education Level

No Education Listed

Number of Employees

501-1,000 employees

© 2024 Teal Labs, Inc