Operate within a fast-paced, high-performing production environment where meeting strict SLAs is essential to business continuity. Maintain constant awareness of system health, job performance, and data flow stability to ensure uninterrupted service delivery.

Take ownership of incidents across all severity levels, from routine operational issues to high-impact outages. Diagnose problems quickly, apply known fixes or workarounds, and coordinate with developers, infrastructure, and database teams to restore full functionality.

Analyze system performance metrics (including database behavior, server resource utilization, and data pipeline throughput) to identify patterns, bottlenecks, and emerging risks. Provide actionable recommendations to improve reliability, scalability, and efficiency.

Proactively identify opportunities to enhance service quality, reduce manual effort, and strengthen system resilience. Suggest improvements to monitoring, alerting, automation, documentation, and operational processes.

Serve as a knowledge resource for junior team members by sharing best practices, troubleshooting techniques, and product expertise. Provide informal training and mentorship to help build team capability and consistency.
Job Type: Full-time
Career Level: Mid Level
Education Level: Associate degree