The AI Optimization Engineer supports large-scale AI/ML and Generative AI workloads in an enterprise setting, focusing on the optimization, deployment, and management of machine learning models and large language models (LLMs) on GPU-accelerated High-Performance Computing (HPC) infrastructure. The successful candidate will have significant experience with Python-based machine learning, deep learning frameworks, model optimization techniques, and the development of scalable AI infrastructure.

Working closely with AI, infrastructure, and DevOps teams, the engineer will:
- Build efficient pipelines for model training and inference
- Implement SLURM-based workload orchestration
- Deploy containerized ML solutions into production environments
- Improve model performance through pruning, quantization, and knowledge distillation
- Operate inference workflows on Triton Inference Server
- Monitor system performance with Prometheus and Grafana

The position requires hands-on experience with HPC environments, GPU clusters, containerization technologies, and Linux system administration, along with a solid grounding in machine learning algorithms, deep learning architectures, and contemporary AI development tools. Prior experience with cloud platforms, vector embeddings, and enterprise-scale AI deployments is a strong advantage.
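To give a flavor of the model-optimization work described above, here is a minimal, self-contained sketch of post-training int8 weight quantization. It is written in pure Python for illustration only; in practice this would be done with framework tooling such as PyTorch's quantization APIs or TensorRT, and the function names here are hypothetical.

```python
# Hypothetical sketch: symmetric per-tensor int8 quantization of a
# weight vector, followed by dequantization to estimate the error.

def quantize_int8(weights):
    """Map float weights to int8 values with a symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.03, 0.98, -0.44]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

# Reconstruction error is bounded by half the quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(q)
print(max_err)
```

The trade-off this illustrates: storing weights as int8 cuts memory and bandwidth roughly 4x versus float32, at the cost of a small, bounded reconstruction error per weight.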
Job Type
Full-time
Career Level
Senior
Education Level
No Education Listed