Field Solution Architect, AI Infrastructure, Google Cloud

Google
New York, NY
Posted 4d ago
$183,000 - $265,000

About The Position

The Google Cloud Consulting Professional Services team guides customers through the moments that matter most in their cloud journey to help businesses thrive. We help customers transform and evolve their business through the use of Google’s global network, web-scale data centers, and software infrastructure. As part of an innovative team in this rapidly growing business, you will help shape the future of businesses of all sizes and use technology to connect with customers, employees, and partners.

As a Field Solution Architect, your experience and thought leadership will support Google Cloud sales teams as they incubate, pilot, and deploy Google Cloud’s industry-leading AI/ML accelerators (TPU/GPU) at AI innovators, large enterprises, and early-stage AI startups. You will help customers innovate faster with solutions using Google Cloud’s flexible and open infrastructure. In this role, you will identify and assess AI opportunities that would benefit from AI-optimized infrastructure. You will help customers use accelerators within their overall cloud strategy by running benchmarks for existing models, finding opportunities to use accelerators for new models, developing migration paths, and analyzing cost versus performance. Along the way, you will work closely with internal Cloud AI teams to remove roadblocks and shape the future of our offerings.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Requirements

  • Bachelor's degree in Computer Science, Mathematics, a related technical field, or equivalent practical experience.
  • 7 years of experience with cloud infrastructure (e.g., hardware shapes, sizes, auto-scaling, auto-provisioning), and experience with infrastructure as a service, platform as a service, and software as a service.
  • Experience coding in Python, bash scripting, and using OSS frameworks (e.g., TensorFlow, PyTorch, JAX).
  • Experience with distributed training and optimizing performance versus costs (e.g., PyTorch FSDP/DeepSpeed, JAX/pjit, bfloat16 mixed-precision, or MLPerf benchmarking).
  • Experience with orchestrators (e.g., Slurm, Kubernetes).
  • Experience building and operationalizing machine learning models.

Nice To Haves

  • Experience training and fine-tuning large models (e.g., image, language, segmentation, recommendation, genomics) with accelerators.
  • Experience with containerization and Kubernetes (K8s) on cloud.
  • Experience with running MLPerf benchmarks.
  • Experience with performance profiling tools (e.g., TensorFlow Profiler, PyTorch Profiler, TensorBoard).
  • Experience in designing and architecting large-scale AI compute clusters.
  • Ability to debug running distributed training/inferencing code.

Responsibilities

  • Serve as a trusted advisor to top customers, helping them incorporate artificial intelligence (AI) accelerators into cloud strategies by designing training and inferencing platforms.
  • Demonstrate Google Cloud differentiation through Proofs of Concept, feature demonstrations, model performance optimization, profiling, and benchmarking.
  • Collaborate seamlessly with the Google Compute Engine AI Infrastructure Dedicated Engineering Team to co-develop code artifacts, best practice documentation, and scalable machine learning (ML) solutions.
  • Influence Google Cloud infrastructure strategy by advocating for enterprise requirements and building repeatable assets to enable internal teams and customers.
  • Travel to customer sites and industry events as needed to provide direct support and represent Google Cloud AI solutions.

Benefits

  • Health, dental, vision, life, disability insurance
  • Retirement Benefits: 401(k) with company match
  • Paid Time Off: 20 days of vacation per year, accruing at a rate of 6.15 hours per pay period for the first five years of employment
  • Sick Time: 40 hours/year (statutory, where applicable); 5 days/event (discretionary)
  • Maternity Leave (Short-Term Disability + Baby Bonding): 28-30 weeks
  • Baby Bonding Leave: 18 weeks
  • Holidays: 13 paid days per year

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Number of Employees: 5,001-10,000 employees

© 2024 Teal Labs, Inc