About The Position

This role combines Cloud DevOps engineering and architectural responsibilities, focusing on designing, automating, and managing secure, scalable, and cost-optimized cloud environments. The position requires expertise in AWS services, Infrastructure as Code (IaC), container orchestration, OS management, AI/ML integration, and modern application architectures.

Requirements

  • 7+ years of experience in Cloud Architecture and DevOps, designing and managing secure, scalable AWS environments.
  • 7+ years of hands-on expertise with AWS IaaS (EC2, VPC, IAM) and PaaS (Lambda, RDS, ECS) services.
  • 5+ years of experience implementing Infrastructure as Code (IaC) using Terraform/OpenTofu, including reusable modules and remote state management.
  • 5+ years of experience deploying and orchestrating containerized workloads using Amazon EKS and Kubernetes.
  • Strong proficiency in CI/CD pipeline design, automation scripting (Python), and integration with tools like Jenkins, GitHub Actions, or GitLab CI.
  • Experience in AI/ML integration, including Amazon SageMaker, Bedrock, and designing AI Landing Zones for predictive and generative AI workloads.
  • Expertise in modern application architectures, including event-driven, serverless (AWS Lambda), and microservices design.
  • Proven ability to lead offshore teams (including stepping in when the offshore lead is absent), manage client relationships, and deliver results aligned with the SOW.

Nice To Haves

  • AWS Services
      o EC2, VPC, IAM, S3, EBS, ELB, Auto Scaling
      o Lambda, RDS, DynamoDB, CloudFormation, Systems Manager
  • Infrastructure as Code
      o Terraform / OpenTofu: modules, remote state, workspaces
      o YAML/JSON for IaC templates and configurations
  • Containers & Orchestration
      o Docker: image creation, registries, networking
      o Kubernetes: architecture, RBAC, Helm
      o Amazon EKS: provisioning, scaling, upgrades
  • DevOps & CI/CD
      o Git workflows, automated testing, deployment strategies
      o Proficiency in Python for scripting and automation
      o Familiarity with CI/CD tools (e.g., AWS CodePipeline) and version control systems (e.g., Git)
      o Knowledge of infrastructure governance, monitoring, and logging tools (e.g., Prometheus, Grafana)
      o Understanding of security best practices in cloud environments
  • OS Administration & Patching
      o Linux: Ubuntu, CentOS, Amazon Linux
      o Shell scripting, cron jobs, systemd, log rotation
      o Patch management via yum, apt, Ansible, AWS Systems Manager
      o Windows Server: AD, DNS, IIS, PowerShell
      o Patch management via WSUS, SCCM, AWS Systems Manager
      o Group Policy, scheduled tasks, event logs
  • Security & Monitoring
      o IAM policies, security groups, NACLs
      o CloudWatch, Prometheus, Grafana, ELK stack
      o Secrets management: AWS Secrets Manager, HashiCorp Vault
  • AI Ops & Integration
      o AI Landing Zone design and implementation
      o AI/application integration using Amazon Bedrock, Amazon SageMaker, or ML frameworks for predictive and generative AI
      o Expertise in ML and generative AI for cloud-native applications
  • Application Architecture
      o Event-driven architecture for scalable systems
      o Serverless architecture leveraging AWS Lambda and managed services
      o Microservices design and deployment
      o AI-based applications using SageMaker and Bedrock

Responsibilities

  • Architect and design cloud infrastructure solutions leveraging AWS IaaS (EC2, VPC, IAM) and PaaS (Lambda, RDS, ECS).
  • Define high-level architecture diagrams, reference architectures, and best practices for multi-cloud deployments.
  • Ensure scalability, high availability, and disaster recovery in all designs.
  • Automate provisioning and configuration using Terraform or OpenTofu.
  • Deploy and orchestrate containerized workloads using Amazon EKS and Kubernetes.
  • Build and maintain CI/CD pipelines for application delivery and infrastructure updates.
  • Administer Linux and Windows servers, including patching, hardening, and performance tuning.
  • Implement automated patch management using tools like AWS Systems Manager, WSUS, Ansible, or SCCM.
  • Monitor system health and performance using CloudWatch, Prometheus, Grafana, and native OS tools.
  • Ensure compliance with security policies and best practices across cloud and OS layers.
  • Perform deep troubleshooting across all layers:
      o Network (VPC, NACLs, Security Groups)
      o IAM permissions and policy conflicts
      o Kubernetes cluster failures, Helm misconfigurations
      o CI/CD pipeline errors and rollback strategies
      o OS-level performance bottlenecks and kernel issues
      o Root cause analysis and permanent fixes for outages
  • Design and implement IaC using Terraform and OpenTofu across multi-cloud environments.
  • Develop reusable modules and manage state files with remote backends and workspaces.
  • Automate workflows and CI/CD pipelines using Python and tools like Jenkins, GitHub Actions, or GitLab CI.
  • Integrate policy-as-code frameworks such as Open Policy Agent (OPA) or HashiCorp Sentinel for governance.
  • Collaborate with security and compliance teams to enforce resource policies and automate audits.
  • Optimize cloud resources through tagging, lifecycle policies, and cost management strategies.
  • Document infrastructure designs, scripts, and operational procedures.

Benefits

  • Paid time off based on employee grade (A-F), as defined by policy: vacation (12-25 days, depending on grade), company-paid holidays, personal days, and sick leave
  • Medical, dental, and vision coverage (or provincial healthcare coordination in Canada)
  • Retirement savings plans (e.g., 401(k) in the U.S., RRSP in Canada)
  • Life and disability insurance
  • Employee assistance programs
  • Other benefits as provided by local policy and eligibility

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000 employees
