Tata Consultancy Services • Posted 3 months ago
$115,000 - $125,000/Yr
Full-time • Mid Level
Irvine, CA
Professional, Scientific, and Technical Services

The AWS Operations Lead will be responsible for managing cloud infrastructure, ensuring observability and monitoring, and leading incident response efforts. This role requires a strong background in AWS services and a deep understanding of cloud operations and DevOps practices. The ideal candidate will have experience with Datadog and will be expected to automate infrastructure management while optimizing performance and costs. The position also involves collaboration with development teams and providing mentorship to junior members.

Responsibilities:
  • Lead the design, implementation, and management of scalable, highly available, and secure cloud infrastructure on AWS.
  • Serve as the subject matter expert for Datadog. Configure, maintain, and optimize Datadog dashboards, monitors, and alerts for infrastructure, applications, and logs.
  • Be the primary escalation point for all production incidents related to the AWS environment. Use Datadog to identify root causes quickly, then troubleshoot and resolve the underlying issues.
  • Automate infrastructure provisioning, configuration, and management using tools like AWS CloudFormation, Terraform, or Ansible.
  • Continuously monitor AWS resource utilization and performance. Use Datadog to identify bottlenecks and implement strategies for cost optimization and efficiency.
  • Implement and enforce security best practices, including IAM, encryption, network security, and compliance with industry standards.
  • Develop and maintain operational policies, procedures, and standards. Proactively identify opportunities for improvement and lead initiatives to enhance the reliability and security of the AWS environment.
  • Work closely with development teams to support CI/CD pipelines and ensure seamless application deployments.
  • Participate in an on-call rotation to provide 24/7 support for critical systems.
  • Provide guidance and mentorship to junior team members, fostering a culture of operational excellence.

Qualifications:
  • 5+ years of experience in cloud operations, DevOps, or a similar role, with a strong focus on AWS.
  • Proven experience with a wide range of AWS services, including but not limited to EC2, S3, RDS, Lambda, VPC, ECS/EKS, and CloudWatch.
  • Expert-level proficiency with Datadog, including hands-on experience with APM, Infrastructure Monitoring, Log Management, Synthetics, alerting, and dashboarding.
  • Strong understanding of observability principles (metrics, traces, logs).
  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
  • Proficiency in scripting languages (e.g., Python, Bash) for automation.
  • Solid understanding of networking concepts (VPC, subnets, security groups, routing).
  • Excellent problem-solving skills and a methodical approach to troubleshooting complex issues.
  • Strong communication and collaboration skills.
  • AWS Certification (e.g., AWS Certified SysOps Administrator, AWS Certified DevOps Engineer).
  • Experience with containerization technologies (Docker, Kubernetes).
  • Experience with other monitoring tools and a strong understanding of when and why to use them.
  • Knowledge of incident management frameworks and on-call best practices.