Position Description: Multiple Openings Available
Provides recommendations for Kubernetes configurations, advocating industry best practices to enhance scalability, security, and resource efficiency.
Suggests optimizations for cluster management, deployment strategies, and workload balancing, resulting in improved system resilience and reduced operational overhead.
Implements fine-tuned memory and CPU configurations, reducing unplanned downtime and improving application performance under heavy production load.
Automates dashboard and monitor creation using Terraform, ensuring consistent, scalable infrastructure management across cloud environments.
Automates the creation of Datadog resources (alert monitors, synthetic tests, and log-based metrics) and AWS resources (EC2 instances, VPCs, and others) using Terraform.
Implements continuous integration and continuous deployment (CI/CD) pipelines using Jenkins, automating software delivery processes and reducing time-to-market for new features.
Establishes proactive monitoring and alerting for infrastructure and application metrics using Datadog (cloud apps) and the ELK stack (on-premises apps), enabling rapid response to performance degradation or capacity issues.
Participates in post-incident reviews (PIRs) and blameless retrospectives, identifying actionable improvements to prevent the recurrence of incidents and enhance system reliability.
Uses business knowledge to translate the vision for divisional initiatives into business solutions by developing complex or multiple software applications and conducting studies of alternatives.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 101-250 employees