Support and manage infrastructure for LLM and Agentic AI platforms, including GPU-enabled Linux environments, inference workloads, and scalable AI services.
Administer and support HPC (High Performance Computing) and GRID computing environments across both on-premises and cloud infrastructures.
Collaborate with AI/ML, DevOps, and platform engineering teams to deploy, optimize, and maintain infrastructure for autonomous AI agents, vector databases, distributed compute clusters, and intelligent automation frameworks.
Automate AI infrastructure provisioning, configuration management, and operational workflows using Ansible, shell scripting, and Infrastructure as Code practices.
Implement monitoring, observability, and performance optimization solutions for AI/ML workloads, HPC clusters, GRID environments, and Linux-based compute infrastructure.
Install, configure, and maintain Linux operating systems including RHEL, Oracle Linux (OEL), and CentOS across multiple hardware platforms.
Design and customize OS builds in alignment with business requirements and industry best practices.
Manage Linux patching activities to maintain system security, stability, and compliance.
Plan and execute routine maintenance activities to optimize system availability and performance.
Work closely with security teams to identify, remediate, and prevent system vulnerabilities.
Develop, maintain, and enhance Ansible playbooks to automate system administration and configuration management tasks.
Identify and implement automation opportunities to improve operational efficiency and reduce manual effort.
Lead and support Linux migration projects, ensuring smooth transition of applications and services across environments.
Coordinate migration planning, execution, validation, and post-migration support activities.
Assist in deploying and managing containerized applications using Docker, Kubernetes, and orchestration platforms.
Support hybrid cloud and on-prem infrastructure environments for compute-intensive and AI-driven workloads.
Support CI/CD pipelines for infrastructure and application deployments in hybrid or cloud-native environments.
Create, maintain, and update technical documentation for configurations, procedures, and operational processes.
Job Type: Full-time
Career Level: Senior