Advanced Micro Devices, Inc. - posted 3 months ago
Austin, TX
5,001-10,000 employees

At AMD, our mission is to build great products that accelerate next-generation computing experiences – from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

  • Architect and maintain robust, scalable infrastructure for training and deploying machine learning and large language models, ensuring optimal performance.
  • Collaborate with AI researchers, data scientists, and software engineers to streamline the end-to-end AI model lifecycle, from development to deployment and monitoring.
  • Design, develop, and fine-tune large-scale language models and other deep learning models for various applications.
  • Implement and manage CI/CD pipelines for AI models, facilitating continuous integration, continuous deployment, and continuous training practices.
  • Monitor the performance of machine learning and large language models, identifying and addressing issues related to data drift, model degradation, and resource constraints.
  • Develop and enforce best practices for version control, testing, and deployment of AI models, ensuring compliance with industry standards and regulatory requirements.
  • Optimize computing resources for training and inference processes, leveraging cloud technologies and on-premises solutions.
  • Stay updated with the latest advancements in AI/ML technologies, tools, and practices, integrating them into our operations to enhance efficiency and effectiveness.
  • Implement best practices in model training, including managing overfitting and underfitting and ensuring model generalizability across various domains.
  • Fine-tune models for specific tasks or industries using targeted techniques and adapt models to new domains or applications.
  • Develop and maintain tools and frameworks to streamline the model training, validation, and deployment process.
  • Document methodologies, processes, and findings; effectively communicate complex technical information to both technical and non-technical stakeholders.
  • Mentor junior team members and contribute to the team's collective knowledge and expertise in deep learning and AI.
  • Proven experience in designing, developing, and maintaining robust software systems, with a deep understanding of performance, scalability, and reliability.
  • Hands-on experience in deploying, monitoring, and managing machine learning models in production environments, including automation of pipelines and CI/CD practices.
  • Strong proficiency in Python and familiarity with deep learning frameworks like TensorFlow, PyTorch, and Keras.
  • Demonstrated ability to troubleshoot complex issues, resolve critical bottlenecks, and drive root cause analysis under time-sensitive conditions.
  • Familiarity with cloud platforms (AWS, Azure, GCP) and containerization/orchestration technologies (Docker, Kubernetes).
  • Understanding of the ethical considerations and security implications of deploying AI models, particularly large language models.
  • Strong cross-functional collaboration skills with the ability to clearly communicate technical concepts to both technical and non-technical stakeholders.
  • Proven track record of quickly adapting to new technologies, tools, and methodologies in a fast-paced environment.