Principal AI Compute SA, AGS Namer Tech

Amazon•San Francisco, CA

4d•$210,200 - $284,300•Onsite

About The Position

This role is for a Principal AI Compute Specialist Solutions Architect (SA) focused on AI, Machine Learning (ML), Deep Learning, Generative AI, and Agentic AI. The position involves driving the production usage of ML and AI at scale, leveraging Amazon Web Services (AWS). The SA will act as a Subject Matter Expert, designing scalable, secure, and cost-effective AI/ML solutions on AWS. The role requires architecting production-grade, reliable, well-governed, and compliant solutions, with a focus on responsible AI practices and robust governance frameworks. The individual will guide customers through their AI transformation journey, establish scalable GenAIOps practices, and create sustainable, enterprise-grade AI architectures. This includes engaging directly with customers, presenting AWS services, developing technical content, and enabling customers, partners, and ISVs. The role also involves contributing to the broader AWS technical community by sharing insights internally. The ideal candidate will have deep technical experience across the AI spectrum, strong communication skills, and the ability to engage stakeholders at all levels. Previous AWS experience is valued but not required if the candidate has experience building large-scale solutions. The role offers the opportunity to work directly with senior engineers and influence roadmaps.

Requirements

10+ years of experience in specific technology domain areas (e.g., software development, cloud computing, systems engineering, infrastructure, security, networking, data & analytics)
Bachelor's degree in computer science, engineering, mathematics or equivalent
Experience developing technology solutions and evangelizing end-to-end technology roadmaps that guide IT transformations toward cloud computing
Experience communicating with technical and non-technical audiences, including at C-level, through training, workshops, and publications
7+ years of experience in AI/ML infrastructure, GPU computing, or custom silicon development (e.g., accelerator design, compiler/runtime development, HW/SW co-design)
Deep hands-on experience with GPU optimization, utilization profiling, and workload performance tuning across NVIDIA GPU families (H100, B200, B300) or equivalent accelerators
Experience architecting multi-architecture compute strategies spanning GPU, custom silicon (Trainium/Inferentia), and CPU for inference and training workloads
Experience developing compute roadmaps or capacity planning strategies for large-scale AI infrastructure customers
Bachelor's degree in computer science, electrical engineering, computer engineering, or equivalent

Nice To Haves

Knowledge of distributed systems design and implementation or equivalent
Knowledge of large scale automation and workflow management or equivalent
Knowledge of database design and implementation or equivalent
Strong presentation and whiteboarding skills, with a high degree of comfort speaking with internal and external executives, IT management, and developers
Experience architecting, migrating, transforming, or modernizing customer workloads for the cloud
Experience with AWS custom silicon (Annapurna/Inferentia/Trainium) or equivalent custom AI accelerator development (runtime drivers, profiling infrastructure, pre/post-silicon validation)
Experience with ML framework internals (PyTorch, TensorFlow, JAX) and their execution pipelines on custom hardware
Knowledge of inference optimization techniques: model quantization, batching strategies, token efficiency, pipeline decomposition, and silicon-model matching
Experience advising customers on GPU-to-Trainium migration paths or multi-accelerator training/inference architectures
Knowledge of capacity planning, instance right-sizing, and cost optimization for GPU-heavy workloads across regions and availability zones
Experience partnering with hardware vendor field teams (NVIDIA, AMD) on optimization exercises
Ability to communicate complex silicon and infrastructure tradeoffs to both deeply technical engineers and C-level executives
Experience with SageMaker HyperPod, Bedrock Mantle, or equivalent managed AI compute platforms

Responsibilities

Build and maintain technical trusted advisor relationships with influential technical decision-makers to drive successful adoption and deployment of AWS services, with particular focus on enterprise-grade AI/ML architectures, Generative AI solutions, and agentic systems.
Architect scalable, secure, and cost-effective solutions leveraging AWS's comprehensive AI stack, from traditional ML services to leading Generative AI offerings. Work closely with customers to understand their business needs and design solutions that optimize both performance and cost while ensuring robust governance and responsible AI practices.
Serve as a thought leader in the AI/ML space by developing compelling technical content and practical implementations showcasing modern AI architectures. Create reference architectures, workshops, and demos that highlight integration patterns for LLMs, RAG systems, autonomous agents, and GenAIOps best practices. Share insights through AWS Blogs, public speaking events, and technical communities.
Build and nurture an internal AWS community of AI/ML experts, focusing on knowledge-sharing across traditional ML, Generative AI, and Agentic AI domains. Establish best practices for emerging technologies and create enablement materials for the broader AWS technical community.
Collaborate across AWS teams to accelerate customer success with AI/ML implementations. Work with business development, professional services, and support teams to ensure effective adoption of AWS AI services, from proof-of-concept to production deployment.
Act as a technical liaison between customers and AWS engineering teams, ensuring successful implementation of AI solutions while maintaining alignment with the AWS Well-Architected Framework and AI best practices.

Benefits

health insurance (medical, dental, vision, prescription, basic life and AD&D insurance with an option for supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, and adoption and surrogacy reimbursement coverage)
401(k) matching
paid time off
parental leave
sign-on payments
restricted stock units (RSUs)
