Application Architect - Microsoft Analytics & Cloud

CTG (Computer Task Group, Inc.)
Atlanta, GA
Posted 35 days ago

About The Position

CTG is seeking to fill an Application Architect - Microsoft Analytics & Cloud position for our client in Atlanta, GA.

Duration: 12 months

Requirements

  • Strong SRE foundation within large-scale enterprise environments.
  • Hands-on experience with ELK Stack, Dynatrace, Kubernetes observability, and Azure monitoring tools.
  • Familiarity with Kafka monitoring, microservices, APIs, CI/CD pipelines, and automated alerting.
  • Working knowledge of Java-based architectures and Cassandra operations.
  • Proven experience in system reliability, production support, or application monitoring.
  • Demonstrated ability in triaging production issues, log analysis, and root cause identification.
  • Background supporting distributed, cloud-based, or microservices-driven systems.
  • Excellent verbal and written English communication skills, and the ability to interact professionally with a diverse group.

Nice To Haves

  • MuleSoft monitoring experience.

Responsibilities

  • Monitor and support production commerce applications to ensure performance, stability, and high availability.
  • Perform first-level triage, validate incidents, and determine impact and urgency.
  • Analyze logs and metrics using ELK, Dynatrace, and Kubernetes tools to identify and diagnose issues.
  • Partner with development, cloud, and platform teams to escalate and resolve incidents quickly.
  • Maintain and optimize observability dashboards, alerts, and monitoring thresholds.
  • Contribute to RCA activities and support ongoing reliability improvements.
  • Create and update runbooks, SOPs, and known-issue documentation.
  • Support performance tuning, availability metrics, and service reliability initiatives.

What This Job Offers

Career Level

Mid Level

Industry

Administrative and Support Services

Number of Employees

1,001-5,000 employees
