Senior Cloud Engineer (AWS/GCP)

Lasso Informatics Inc Lasso Informatique Inc
Onsite

About The Position

This is a Senior Cloud Engineer role with a strong emphasis on reliability, security, cost control, and operational excellence in production cloud environments. You will own and operate mission-critical cloud infrastructure across multiple environments and customers, ensuring systems are secure, compliant, observable, and scalable. While this role collaborates closely with engineering teams, it is not focused on CI/CD pipelines or developer tooling. Instead, it centers on building and maintaining robust, well-governed cloud systems that support regulated research workloads. The ideal candidate brings deep hands-on cloud and systems expertise, paired with a disciplined, process-driven mindset and a track record of improving stability, cost efficiency, and service quality at scale.

Requirements

  • 7+ years in cloud infrastructure, systems engineering, or cloud operations roles
  • Strong, hands-on Kubernetes administration experience in both GKE (required) and EKS
  • Strong PostgreSQL administration experience (Aurora RDS and/or managed PostgreSQL services)
  • Deep Linux systems expertise (patching, hardening, troubleshooting)
  • Proven experience operating multi-environment, multi-customer platforms across AWS and GCP
  • Experience with configuration management tools (Ansible, Puppet, or equivalent)
  • Strong process orientation with experience writing MOPs and SOPs
  • Comfortable using Jira, Confluence, and Bitbucket for planning and documentation
  • Detail-oriented with a strong commitment to operational discipline
  • Experience with Keycloak or similar identity platforms

Nice To Haves

  • Infrastructure-as-Code experience (Terraform, CloudFormation)
  • Experience in regulated environments (ISO, FedRAMP, SOC 2, HIPAA, NIST 800-171)
  • Experience operating Kubernetes workloads across multiple cloud providers
  • Familiarity with Active Directory
  • Experience with secure research compute platforms (SLURM, Open OnDemand)
  • Experience with Globus (Auth, Share, Compute)
  • Deep Datadog experience (dashboards, monitors, logs, APM, synthetic testing)
  • Experience working with external research institutions or partners
  • Familiarity with CrowdStrike or similar endpoint security tooling

Responsibilities

  • Design, operate, and maintain multiple cloud environments (development, staging, production) across AWS and GCP.
  • Manage infrastructure for multiple customers with distinct configurations and compliance requirements.
  • Ensure consistency across environments and actively prevent configuration drift.
  • Plan and execute patches, upgrades, and infrastructure changes with minimal disruption.
  • Define and enforce clear operational rules of engagement for access, changes, and incident response.
  • Perform advanced Linux systems administration across cloud environments, including patching, hardening, troubleshooting, and performance tuning.
  • Operate and evolve production Kubernetes platforms across AWS (EKS) and GCP (GKE), including upgrades, scaling, security hardening, and lifecycle management.
  • Administer and tune PostgreSQL databases (Aurora RDS and managed PostgreSQL services), supporting performance, reliability, and data integrity.
  • Evaluate, standardize, and operationalize cloud-native services across AWS and GCP to improve reliability, security, scalability, and operational consistency.
  • Ensure cloud infrastructure adheres to defined standards while accommodating platform-specific best practices.
  • Own observability across cloud environments using Datadog, spanning metrics, logs, and APM.
  • Design dashboards, alerts, and reports to enable proactive monitoring across AWS and GCP workloads.
  • Drive cost-awareness and optimization initiatives across both cloud platforms, balancing performance, reliability, and regulatory requirements.
  • Partner with engineering and research teams to identify inefficiencies and guide responsible cloud usage.
  • Collaborate with external institutions’ system and cloud teams to support joint projects and integrations.
  • Support pilot projects, proof-of-concept environments, and future product initiatives.
  • Act as a technical bridge between cloud operations, research teams, and internal engineering.
  • Develop and maintain Methods of Procedure (MOPs) and Standard Operating Procedures (SOPs).
  • Maintain detailed documentation in Confluence to support audit readiness and operational continuity.
  • Use Jira to manage cloud operations backlogs, patching cycles, and infrastructure changes.
  • Champion a documentation-first, process-driven culture for cloud operations.
  • Administer endpoint and workload protection tooling (e.g., CrowdStrike).
  • Own vulnerability management end-to-end: identification, remediation, validation, and documentation.
  • Apply cloud security best practices for IAM, secrets management, encryption, and network segmentation.
  • Support ISO, FedRAMP, NIST, SOC 2, and other compliance frameworks through disciplined cloud operations.
  • Provide technical leadership in cloud reliability and operations.
  • Balance stability with innovation, ensuring cloud infrastructure is prepared for future growth.
  • Influence cloud strategy and standards while maintaining operational rigor.

Benefits

  • Competitive salary and benefits package
  • In-office work culture with required presence Tuesday through Thursday
  • Opportunities for leadership and professional growth
  • Collaborative team committed to innovation, quality, and scientific impact
  • Access to training resources and ongoing professional development