Principal Cloud Engineer

Fidelity, Durham, NC
Hybrid

About The Position

As a Principal Cloud Engineer, you blend deep experience in public cloud (AWS and Azure) container platform services and Kubernetes with resiliency engineering expertise and a passion for delivering results. Our Container Platform Management engineering group, part of Enterprise Infrastructure DCS (Distributed Compute and Storage), combines container platform operational excellence with the development experience to deliver highly available, resilient services at scale using automation and Infrastructure as Code. We build reliability into our ecosystem by applying best practices in resiliency engineering, automation, observability, and chaos testing.

You should have hands-on experience developing and maintaining CI/CD pipelines, supporting and maintaining EKS, maintaining a Stash repository, automating manual processes, and improving release processes. You will partner with product delivery teams across the firm to help them onboard applications onto the cloud, offering suggestions and helping them choose the right infrastructure configurations. You will be actively involved in transforming the support operating model as application infrastructure rapidly shifts from on-prem and AWS-native models to Kubernetes (EKS).

The role also involves creating scalable solutions and automation to monitor container platform health and establishing signals that drive understanding of our public cloud and container platform environments; strengthening event-based, data-driven operational processes with Fidelity support and incident management teams for our cloud ecosystem; and maintaining the Kubernetes platform, including EKS and AKS cluster creation and rehydration services across the enterprise, along with helping with Helm charts and deployments.

Requirements

  • Bachelor’s Degree or equivalent in a technology-related field (e.g. Computer Science, Engineering, etc.) required.
  • Production experience running Kubernetes workloads on EKS, AKS, or RKS (Rancher)
  • Experience managing and maintaining Kubernetes clusters on EKS, AKS, and RKS
  • Experience creating and deploying Helm charts & libraries
  • 10+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale.
  • 8+ years of experience in cloud development and migration (AWS/Azure)
  • Experience building and deploying Docker images including Docker Compose
  • Hands-on experience with Jenkins Core, including authoring and maintaining declarative CI/CD pipelines and libraries
  • Proficiency with UNIX operating systems and shell scripting
  • Programming experience in Python and/or Golang (both preferred).
  • Experience with distributed version control systems, Git preferred.
  • Experience crafting and maintaining logging, monitoring, and alerting capabilities using tools like Datadog and Splunk
  • 10+ years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation
  • Practical experience building cloud-hosted and cloud-native applications for the enterprise.
  • Maintains a deep understanding of a wide variety of AWS/Azure services that support reliability, observability, and automation/orchestration.
  • Possess at least one cloud certification (AWS or Azure preferred).
  • Experience in incident/crisis management and supporting mission-critical applications.
  • Solid understanding of Cloud Computing and DevOps concepts including CI/CD pipelines
  • Hands-on Kubernetes skills and knowledge
  • Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
  • Ability to automate with various scripting languages (Python, Shell scripting, etc.)
  • Experience managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef)
  • Ability to triage, execute root cause analysis, and be decisive under pressure.
  • Experience managing and interpreting large datasets using query languages and visualization tools (Power BI/Tableau), with a data-driven approach to analyzing cloud health events and alerts at scale.

Responsibilities

  • Developing and maintaining CI/CD pipelines
  • Supporting and maintaining EKS
  • Maintaining the Stash repository
  • Automating manual processes and improving the release process
  • Partnering with product delivery teams across the firm to help them onboard applications onto the cloud, offering suggestions and helping them choose the right infrastructure configurations
  • Transforming the support operating model as application infrastructure rapidly shifts from on-prem and AWS-native models to Kubernetes (EKS)
  • Creating scalable solutions and automation to monitor container platform health and establishing signals that drive understanding of our public cloud and container platform environments
  • Strengthening event-based, data-driven operational processes with Fidelity support and incident management teams for our cloud ecosystem
  • Maintaining the Kubernetes platform, including EKS and AKS cluster creation and rehydration services across the enterprise, and helping with Helm charts and deployments

Benefits

  • Comprehensive health care coverage and emotional well-being support
  • Market-leading retirement
  • Generous paid time off and parental leave
  • Charitable giving employee match program
  • Educational assistance including student loan repayment, tuition reimbursement, and learning resources to develop your career