Site Reliability Engineer – Datadog Specialist

The Team:
The IT Operations team at S&P Dow Jones Indices (S&P DJI) is tasked with owning and maintaining the Production IT systems that underpin S&P DJI's index platforms and applications, ensuring their high availability. The team prioritizes service availability, service request management, and continuous improvement of support processes through collaborative engagement with business stakeholders, operations, infrastructure, and development teams. Additionally, the team is involved in critical activities such as incident management, emergency response, change management, problem management, and capacity planning to support the robustness of S&P DJI's index platforms.

Responsibilities and Impact:
Design, implement, and manage end-to-end observability using Datadog APM, DBM, log pipelines, synthetic monitoring, and AI-driven alerting.
Maintain production monitoring, respond to incidents, and lead root cause analysis using Datadog, Splunk, and ELK.
Enhance automation and testing frameworks using Java, Spring Boot, Selenium, Cucumber, Playwright, and Jenkins.
Operate AWS services including EC2, ECS, RDS, S3, DynamoDB, and Secrets Manager.
Contribute to CI/CD practices and containerization technologies.
Integrate monitoring with PagerDuty and ServiceNow for incident workflows.
Participate in post-incident reviews, disaster recovery testing, and SRE process improvements.