Research Scientist - Vision-Language-Action (VLA) Models

Bosch Group • Sunnyvale, CA

Onsite

About The Position

The Bosch Research and Technology Center North America (RTC-NA) is part of the global Bosch Group and focuses on advanced technologies in areas like artificial intelligence, energy, internet technologies, and semiconductors. Our AI research in Silicon Valley is at the forefront of Foundation Models, Big Data Visual Analytics, Explainable AI (XAI), Natural Language Processing, Computer Vision & Mixed Reality, Cloud Robotics, Data Science, AI System Engineering, and Time-series Analysis. We develop AIoT solutions for various Bosch applications including automated driving, robotics, smart manufacturing, and smart home solutions. The Intelligent Autonomous Systems group specifically drives innovation in automated driving, ADAS, robotics, and automation through advancements in system architecture and AI components, including motion planning, task planning, and decision-making systems. We collaborate with internal business units and external academic and industry partners, publishing our findings in top-tier conferences and journals.

Requirements

Ph.D. in Computer Science, Robotics, or a related discipline, or a Master's degree with >= 1/3 years of industry experience after graduation.
A minimum of 3 years of R&D experience, or an equivalent graduate research background, primarily in AI technologies including Computer Vision and Robotic or Automotive Motion and Behavioral Planning.
Proficiency in one or more programming languages commonly used in machine learning (e.g., Python, C++, Rust).
Strong interpersonal, communication, and teamwork capabilities.
Knowledge of major machine learning frameworks like TensorFlow or PyTorch.
Hands-on experience in reinforcement learning for behavior planning, motion planning, or other applicable contexts, and familiarity with common RL techniques (e.g., PPO, DQN, DDPG).
A strong portfolio of publications in premier machine learning, deep learning, robotics, and computer vision journals and conferences.

Nice To Haves

Experience with real-world product development and deployment of autonomous systems.
Hands-on experience building and applying multimodal transformer-based sequence-to-sequence models, especially multimodal vision-language-action models.
Hands-on experience in computer vision and deep learning, with work in any of the following areas: multimodal transformers, multimodal language models, diffusion models, NeRF, Gaussian splatting, object detection/segmentation, 3D scene understanding, sensor calibration, SfM, voxel/BEV grid-based feature representation.

Responsibilities

Conduct research and engineering in core AI and machine learning fields (including computer vision, autonomous planning, and open-world learning) to enable Embodied AI for related business domains such as ADAS/AD, industrial automation, and robotics.
Push the boundaries in (modular) end-to-end perception and planning for ADAS/AD, incorporating advancements in large vision-language-(action) models to aid reasoning capabilities and explainability.
Collaborate cross-functionally with global research and engineering teams to ensure seamless technology transfer and system integration.
Implement research results to solve real-world challenges, ensuring high-quality system integration within Bosch's existing platforms.
Stay at the forefront of innovation by actively engaging with academic and industry communities through conferences, workshops, and technical events.
Document and disseminate research findings through high-caliber publications and/or patent submissions.