About The Position

Operate within a fast‑paced, high‑performing production environment where meeting strict SLAs is essential to business continuity. Maintain constant awareness of system health, job performance, and data flow stability to ensure uninterrupted service delivery.

Take ownership of incidents across all severity levels, from routine operational issues to high‑impact outages. Diagnose problems quickly, apply known fixes or workarounds, and coordinate with developers, infrastructure, and database teams to restore full functionality.

Analyze system performance metrics (including database behavior, server resource utilization, and data pipeline throughput) to identify patterns, bottlenecks, and emerging risks. Provide actionable recommendations to improve reliability, scalability, and efficiency.

Proactively identify opportunities to enhance service quality, reduce manual effort, and strengthen system resilience. Suggest improvements to monitoring, alerting, automation, documentation, and operational processes.

Serve as a knowledge resource for junior team members by sharing best practices, troubleshooting techniques, and product expertise. Provide informal training and mentorship to help build team capability and consistency.

Requirements

  • Associate’s Degree or equivalent professional experience required.
  • 2–5 years of experience in information technology, systems administration, production support, or a related technical field.
  • Applies established methodologies, operational procedures, and industry best practices to ensure consistent, high‑quality support.
  • Demonstrates the ability to analyze, troubleshoot, and resolve intermediate‑level issues involving system hardware, software, databases, or data processing pipelines.
  • Communicates clearly and effectively in both written and verbal form. Able to translate complex technical concepts into language that is accessible to non‑technical stakeholders.

Nice To Haves

  • Bachelor’s Degree in Information Technology, Computer Science, or a related discipline preferred.
  • Experience with Oracle or Cassandra databases, Unix/Linux environments, and Azure Databricks, including the ability to support SQL queries, shell scripts, and distributed data processing workloads.

Responsibilities

  • Maintain constant awareness of system health, job performance, and data flow stability
  • Take ownership of incidents across all severity levels
  • Diagnose problems quickly and apply known fixes or workarounds
  • Coordinate with developers, infrastructure, and database teams to restore full functionality
  • Analyze system performance metrics, including database behavior, server resource utilization, and data pipeline throughput
  • Identify patterns, bottlenecks, and emerging risks
  • Provide actionable recommendations to improve reliability, scalability, and efficiency
  • Proactively identify opportunities to enhance service quality, reduce manual effort, and strengthen system resilience
  • Suggest improvements to monitoring, alerting, automation, documentation, and operational processes
  • Serve as a knowledge resource for junior team members
  • Provide informal training and mentorship to help build team capability and consistency