This role leads the response to production issues, minimizing downtime and keeping services within their SLAs. The Senior System Engineer builds alerting, monitoring, and dashboards to identify problems proactively, and applies strong analytical and technical skills to diagnose and resolve complex production issues, with a focus on mitigating immediate impact and automating recovery. The role also involves working with development teams on long-term solutions, creating and maintaining system documentation, and developing scripts and automation tools.
A key responsibility is identifying non-functional requirements such as reliability, performance, and scalability, and ensuring they are met before production deployment. The engineer monitors application performance with tools such as Dynatrace and AppDynamics, identifies bottlenecks, and optimizes performance. Other responsibilities include defining SLIs, SLOs, and error budgets; documenting failure patterns to improve resilience; capacity planning; security assessments; and responding to security incidents.
The engineer collaborates with Release Management and development teams to manage and support application releases and deployments, ensuring controlled rollouts. The role calls for proactive problem detection, trend analysis, and metrics and status reports to leadership. Strong communication skills are required, as is the ability to transfer knowledge to Product Development teams.
The position requires 24x7 on-call support for applications including J2EE, Salesforce, Salesforce Marketing Cloud, and MuleSoft, applying an SRE approach to large-scale Java EE, ERP, and CRM applications. The role also involves architecting and developing web applications.
Job Type
Full-time
Career Level
Senior