Sr. DevOps Engineer - AWS Cloud

PICTOR LABS INC | Austin, TX
1d | $190,000 - $250,000 | Remote

About The Position

We're seeking an AWS Cloud Operations Architect to design, build, and maintain the cloud and edge infrastructure that powers our AI-driven virtual staining platform. You'll ensure our infrastructure is secure, compliant, scalable, and performant—supporting both our cloud-based SaaS platform and Pictor Edge device deployments. This role requires deep AWS expertise combined with a strong understanding of regulated healthcare environments and AI/ML production workloads.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent practical experience)
  • 7+ years of experience in DevOps, Cloud Operations, or Site Reliability Engineering with a focus on AWS infrastructure
  • Demonstrated experience with SOC2, HIPAA, or FDA-regulated software environments
  • 2+ years of hands-on experience with production AI/ML infrastructure and workloads
  • Strong proficiency with AWS services: EC2, S3, RDS, Lambda, EKS, VPC, IAM, CloudFront, CloudWatch, and AWS MSK
  • Expert-level experience with Infrastructure as Code tools (Terraform or CloudFormation)
  • Hands-on experience with Cloudflare services including CDN, DNS, WAF, and DDoS protection
  • Proven experience deploying and managing Apache Kafka or AWS MSK in production environments
  • Production experience with Docker containerization and Kubernetes orchestration (EKS)
  • Strong scripting abilities in Python and Bash for infrastructure automation
  • Experience with CI/CD tools such as GitHub Actions, GitLab CI, Jenkins, or CircleCI
  • Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK Stack, CloudWatch)
  • Deep understanding of security best practices for cloud infrastructure, applications, and networking
  • Excellent problem-solving skills, meticulous attention to detail, and strong technical documentation skills

Nice To Haves

  • AWS Certified Solutions Architect, DevOps Engineer, or Security Specialty certification
  • Experience with ML inference optimization and GPU-accelerated workloads
  • Knowledge of FDA 21 CFR Part 11 requirements and medical device software lifecycle
  • Experience deploying containerized applications to edge devices or on-premises infrastructure
  • Familiarity with NVIDIA GPU infrastructure and CUDA optimization
  • Experience with model serving frameworks (NVIDIA Triton, TorchServe, TensorFlow Serving)
  • Background in digital pathology, medical imaging, or healthcare technology
  • Experience with multi-cloud or hybrid cloud architectures and MLOps workflows

Responsibilities

  • Design and maintain highly available, secure, and scalable cloud infrastructure on AWS using Infrastructure as Code (Terraform, CloudFormation)
  • Design and manage AWS services including EC2, S3, RDS, Lambda, EKS, VPC, IAM, API Gateway, CloudFront, and CloudWatch
  • Design infrastructure to support both cloud-based services and containerized edge deployments for Pictor Edge devices
  • Design and maintain inference infrastructure for large-scale pathology image processing and optimize GPU-accelerated workloads for production AI models (Deepstain, Restain, ClearStain)
  • Support model serving infrastructure (TorchServe, NVIDIA Triton, or similar) and set up data pipelines using Apache Kafka and AWS MSK
  • Implement and maintain SOC2 and HIPAA compliance controls across all infrastructure
  • Support FDA medical device software requirements including audit trails, access controls, and validation documentation
  • Manage Cloudflare services (CDN, DNS, WAF, SSL/TLS) and integrate security best practices across all infrastructure layers
  • Build and optimize CI/CD pipelines for efficient code delivery and automate infrastructure provisioning and deployment processes
  • Implement comprehensive monitoring and alerting using CloudWatch, Prometheus, Grafana, and ELK Stack
  • Establish SLAs and SLOs for production services, design disaster recovery procedures, and provide incident response for production systems
  • Partner with Software Engineering, ML Engineering, ML Research, and Edge Device teams to deliver optimal solutions
  • Mentor engineers on cloud architecture, DevOps practices, and security best practices