About The Position

This is a senior-level position focused on managing and optimizing large-scale Linux infrastructure across production and development environments. You will ensure system reliability, security, and performance while collaborating closely with DevOps, network, and software teams. The position offers the chance to implement automation, streamline processes, and support mission-critical AI/ML workloads. You will maintain robust monitoring, configuration management, and disaster recovery protocols, contributing directly to operational excellence and system uptime. Ideal candidates are proactive problem-solvers with deep expertise in Ubuntu and Red Hat Enterprise Linux and strong skills in automation, cloud platforms, and infrastructure best practices. This fully remote role requires the flexibility to overlap with U.S. and IST working hours and the ability to operate independently in a high-impact environment.

Requirements

  • 5+ years of hands-on experience in Linux system administration (Ubuntu & Red Hat).
  • Strong expertise in Bash scripting and automation tools such as Ansible or Terraform, plus a working knowledge of Python.
  • Experience with monitoring tools like Nagios, Zabbix, or Prometheus.
  • Solid understanding of networking fundamentals, including DNS, DHCP, NFS, SSH, and firewalls.
  • Knowledge of virtualization and containerization technologies (VMware, KVM, Docker, etc.).
  • Strong troubleshooting skills, including analysis of system logs, kernel issues, and service failures.
  • Familiarity with version control systems (Git) and CI/CD pipeline environments.
  • Exposure to cloud platforms (AWS, GCP, Azure) is advantageous.
  • Excellent communication, documentation, and coordination skills for supporting global teams.
  • Strong ownership, accountability, and attention to detail; able to maintain SLAs under pressure.

Nice To Haves

  • Red Hat Certified System Administrator (RHCSA) or Engineer (RHCE) preferred.
  • Experience with high-availability clusters, load balancing, and RAID management is a plus.

Responsibilities

  • Manage, monitor, and maintain Ubuntu and Red Hat Linux servers across production and staging environments.
  • Perform system upgrades, kernel updates, patch management, and performance tuning to ensure optimal reliability.
  • Implement and enforce security policies, user access controls, and backup/recovery strategies.
  • Troubleshoot hardware, OS, and network-related issues while keeping service disruption to a minimum.
  • Maintain configuration management and deployment pipelines using Ansible, Puppet, or similar tools.
  • Monitor system health, resource utilization, and AI/ML workloads to guarantee uptime and performance.
  • Collaborate with DevOps, Cloud, and Software teams for environment provisioning and infrastructure scaling (AWS, Azure, or on-prem).
  • Participate in capacity planning, disaster recovery, and incident response activities.
  • Maintain detailed documentation, SOPs, and audit reports to support compliance and operational transparency.

Benefits

  • Fully remote work with flexible scheduling, including part-time or full-time options.
  • Competitive compensation aligned with experience and market standards.
  • Opportunities to work on AI/ML infrastructure projects and cutting-edge technologies.
  • Professional growth and skill development in Linux systems, automation, and cloud environments.
  • Collaborative, high-performing team environment with global impact.
© 2024 Teal Labs, Inc