Large Model Inference Acceleration Engineer

TikTok • San Jose, CA

About The Position

The Intelligent Creation - AI Platform team builds advanced end-to-end AI production pipelines, spanning deep learning model training, optimization, deployment, and applications. We provide AI capabilities that empower content creation and consumption on TikTok and serve billions of users. We are seeking an experienced AI model optimization engineer with expertise in optimizing AI model training and inference, including distributed training, distributed inference, and acceleration. The ideal candidate will work at the cutting edge of AI efficiency, improving the performance, scalability, and deployment of large-scale generative AI models.

Requirements

Master's or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, or a related field.
Strong software engineering skills, including proficiency in Python, C++, and CUDA.
5+ years of experience in AI model inference optimization.
Experience with ML compilers, parallel computing optimization, graph fusion, CUDA kernel development, and TensorRT/Triton/CUTLASS for model inference acceleration.
Knowledge of transformers and diffusion models.

Responsibilities

Design and optimize large model inference pipelines for low-latency, high-throughput deployment across diverse hardware architectures, using high-performance optimization techniques.
Benchmark and profile deep learning models to identify performance bottlenecks and optimize computational resources.
Collaborate with production engineers and infrastructure teams to ensure seamless integration of optimized models into production environments.
