Senior Cloud Engineer (AWS/GCP)

Lasso Informatics Inc / Lasso Informatique Inc

Posted 27 days ago • Onsite

About The Position

This is a Senior Cloud Engineer role with a strong emphasis on reliability, security, cost control, and operational excellence in production cloud environments. You will own and operate mission-critical cloud infrastructure across multiple environments and customers, ensuring systems are secure, compliant, observable, and scalable. While this role collaborates closely with engineering teams, it is not focused on CI/CD pipelines or developer tooling. Instead, it centers on building and maintaining robust, well-governed cloud systems that support regulated research workloads. The ideal candidate brings deep hands-on cloud and systems expertise, paired with a disciplined, process-driven mindset and a track record of improving stability, cost efficiency, and service quality at scale.

Requirements

7+ years in cloud infrastructure, systems engineering, or cloud operations roles
Strong, hands-on Kubernetes administration experience in both GKE (required) and EKS
Strong PostgreSQL administration experience (Aurora RDS and/or managed PostgreSQL services)
Deep Linux systems expertise (patching, hardening, troubleshooting)
Proven experience operating multi-environment, multi-customer platforms across AWS and GCP
Experience with configuration management tools (Ansible, Puppet, or equivalent)
Strong process orientation with experience writing MOPs and SOPs
Comfortable using Jira, Confluence, and Bitbucket for planning and documentation
Detail-oriented with a strong commitment to operational discipline
Experience with Keycloak or similar identity platforms

Nice To Haves

Infrastructure-as-Code experience (Terraform, CloudFormation)
Experience in regulated environments (ISO, FedRAMP, SOC 2, HIPAA, NIST 800-171)
Experience operating Kubernetes workloads across multiple cloud providers
Familiarity with Active Directory
Experience with secure research compute platforms (SLURM, Open OnDemand)
Experience with Globus (Auth, Share, Compute)
Deep Datadog experience (dashboards, monitors, logs, APM, synthetic testing)
Experience working with external research institutions or partners
Familiarity with CrowdStrike or similar endpoint security tooling

Responsibilities

Design, operate, and maintain multiple cloud environments (development, staging, production) across AWS and GCP.
Manage infrastructure for multiple customers with distinct configurations and compliance requirements.
Ensure consistency across environments and actively prevent configuration drift.
Plan and execute patches, upgrades, and infrastructure changes with minimal disruption.
Define and enforce clear operational rules of engagement for access, changes, and incident response.
Perform advanced Linux systems administration across cloud environments, including patching, hardening, troubleshooting, and performance tuning.
Operate and evolve production Kubernetes platforms across AWS (EKS) and GCP (GKE), including upgrades, scaling, security hardening, and lifecycle management.
Administer and tune PostgreSQL databases (Aurora RDS and managed PostgreSQL services), supporting performance, reliability, and data integrity.
Evaluate, standardize, and operationalize cloud-native services across AWS and GCP to improve reliability, security, scalability, and operational consistency.
Ensure cloud infrastructure adheres to defined standards while accommodating platform-specific best practices.
Own observability across cloud environments using Datadog, spanning metrics, logs, and APM.
Design dashboards, alerts, and reports to enable proactive monitoring across AWS and GCP workloads.
Drive cost-awareness and optimization initiatives across both cloud platforms, balancing performance, reliability, and regulatory requirements.
Partner with engineering and research teams to identify inefficiencies and guide responsible cloud usage.
Collaborate with external institutions’ system and cloud teams to support joint projects and integrations.
Support pilot projects, proof-of-concept environments, and future product initiatives.
Act as a technical bridge between cloud operations, research teams, and internal engineering.
Develop and maintain Methods of Procedure (MOPs) and Standard Operating Procedures (SOPs).
Maintain detailed documentation in Confluence to support audit readiness and operational continuity.
Use Jira to manage cloud operations backlogs, patching cycles, and infrastructure changes.
Champion a documentation-first, process-driven culture for cloud operations.
Administer endpoint and workload protection tooling (e.g., CrowdStrike).
Own vulnerability management end-to-end: identification, remediation, validation, and documentation.
Apply cloud security best practices for IAM, secrets management, encryption, and network segmentation.
Support ISO, FedRAMP, NIST, SOC 2, and other compliance frameworks through disciplined cloud operations.
Provide technical leadership in cloud reliability and operations.
Balance stability with innovation, ensuring cloud infrastructure is prepared for future growth.
Influence cloud strategy and standards while maintaining operational rigor.