Senior Cloud & Container Infrastructure Engineer

RCH Solutions · Radnor Township, PA
Remote

About The Position

RCH Solutions is seeking multiple Senior Cloud & Container Infrastructure Engineers to join our team of scientific computing experts. You will design, implement, automate, and operate scalable, secure, and highly reliable infrastructure that powers mission-critical applications and services. This is a hands-on senior individual contributor role with a strong emphasis on Google Kubernetes Engine (GKE), container-native architectures, infrastructure as code, observability, and security best practices in Google Cloud Platform (GCP). You will serve as a GCP subject-matter expert within the team, mentor engineers, and drive platform improvements that enable developer velocity and business scale. If you're passionate about building reliable, scalable, developer-friendly platforms on Google Cloud and solving hard container and infrastructure problems at scale, we'd love to hear from you.

Requirements

  • 6+ years of hands-on experience building and operating production cloud infrastructure
  • 4+ years of deep production experience with GCP, particularly in a senior or lead capacity
  • 3+ years of strong expertise with Kubernetes in production (preferably GKE), including cluster design, upgrades, troubleshooting, and scaling
  • Expert-level proficiency with Terraform for GCP infrastructure provisioning
  • Strong experience with container technologies: Docker, container registries (Artifact Registry), container security scanning
  • Solid understanding of GCP core services: Compute Engine, Cloud Run, Cloud SQL / AlloyDB, Cloud Storage, BigQuery, Pub/Sub, Cloud Functions, VPC, Cloud Load Balancing, Cloud Interconnect
  • Experience implementing secure IAM strategies, organization policies, and security controls in GCP
  • Proficiency in Linux systems administration, networking fundamentals, and scripting (Bash, Python, Go preferred)
  • Experience with modern CI/CD and GitOps practices in cloud environments
  • Experience supporting or using HPC environments that use Slurm
  • Experience with containerization and orchestration (Docker, Kubernetes/GKE)
  • Strong understanding of data governance, cataloging, and lineage tools; basic familiarity with regulated environments (GxP, HIPAA).
  • Experience assessing existing code and workflows and identifying bottlenecks and optimization opportunities
  • Experience in software requirements gathering, documentation, design, and development

Nice To Haves

  • Google Cloud Professional certifications (e.g., Professional Cloud Architect, Professional Cloud DevOps Engineer) or Kubernetes certifications (e.g., Certified Kubernetes Administrator)
  • Experience with Anthos, Config Management, Policy Controller, or multi-cluster management
  • Familiarity with service mesh (Istio/Envoy), ingress controllers (GKE Gateway API / Ingress), and microservices observability

Responsibilities

  • Design, deploy, and operate containerized workloads on GKE across enterprise-scale environments.
  • Manage GCP compute resources (Compute Engine, Cloud Run, GKE Autopilot) for high availability and cost efficiency.
  • Operate and scale Weaviate vector database clusters to support production AI and semantic search workloads
  • Optimize indexing, query performance, and storage configurations as data volumes grow
  • Collaborate with AI/ML teams to define schema strategies and ingestion pipelines
  • Build and maintain monitoring dashboards and alerting pipelines using Grafana
  • Integrate LLM observability tooling (Langfuse / LangSmith) to track model performance, latency, and usage across AI services
  • Drive incident response, root cause analysis, and continuous reliability improvements
  • Implement infrastructure-as-code (Terraform / Deployment Manager) for reproducible, auditable deployments and CI/CD integration.
  • Define and enforce multi-tenant GKE architecture: cluster security, namespace/tenant isolation, RBAC, network policies, maintenance, and scaling.
  • Mentor engineers and drive platform adoption and best practices.
  • Automate end-to-end provisioning, deployment pipelines, and day-2 operations using CI/CD tools (Cloud Build, GitHub Actions, ArgoCD, etc.)
  • Design and implement observability stacks using Google Cloud Operations Suite (formerly Stackdriver), Prometheus/Grafana, Cloud Logging, Cloud Monitoring, and distributed tracing (Cloud Trace)
  • Troubleshoot complex production issues spanning compute, networking, storage, and Kubernetes layers

Benefits

  • A competitive salary and bonus package based on experience
  • Comprehensive health and wellness benefits, including Medical, Dental, and Vision Insurance
  • Company-provided Life and Long-Term Disability Insurance
  • Company-sponsored 401(k) Plan
  • Company-provided continuing education benefit
  • Team-focused culture and unlimited opportunity for advancement