About The Position

We are building advanced augmented dexterity capabilities for next-generation robotic platforms. As a Senior AI/ML Research Engineer (Computer Vision), you will develop the perception models that let our embodied AI system understand the surgical scene. You will work within a hierarchical, multimodal stack in which a high-level model interprets sensory observations into structured intent and a low-level policy turns that intent into precise, safe, real-time control. Your focus is the vision layer: designing, training, and evaluating models that extract anatomy, instruments, actions, and surgical context from intraoperative video.

You will partner with the broader AI/ML team to define how perception feeds reasoning and control, and you will drive the research-to-deployment path for your models, taking them from offline experimentation to robust, real-time performance in the operating room. As part of Intuitive's Future Forward research organization, you will identify, build, and fine-tune the AI/ML models and algorithms that enable us to deliver safe and performant embodied AI systems. This role calls for someone who is equally comfortable getting hands-on with models and data and designing systems that scale.

Requirements

  • MS or PhD in CS, EE, Robotics, or a related field, with 5+ years of applied computer-vision research experience.
  • Strong grasp of modern CV and deep-learning fundamentals: CNNs and vision transformers, segmentation, detection, tracking, and representation/self-supervised learning.
  • Demonstrated work in video understanding, including temporal action segmentation, action/phase recognition, and video segmentation.
  • Hands-on experience with modern video architectures, including video transformers and self-supervised video pretraining.
  • Exposure to vision-action (VA) / vision-language-action (VLA) models and world-model / self-supervised predictive architectures (e.g., JEPA-style models, MAE, DINO) for learning visual representations and dynamics.
  • Experience working with large, messy, real-world video datasets at scale.
  • Strong software and experimentation skills in Python and C++, with proficiency in one or more of PyTorch/TensorFlow/JAX, and the ability to stand up clean, reproducible experiments and run the full loop (data curation, augmentation, loss design, metrics, error analysis).
  • A research-and-prototyping mindset: comfortable working in ambiguity, framing open-ended problems, running rapid experiments, and reading and reproducing recent papers to pull promising techniques into practice.
  • Sound judgment about the path from prototype to product: writing code others can build on, knowing when to optimize versus when to move fast, and thinking ahead about data quality, evaluation, and robustness even at the research stage.
  • Solid foundations in linear algebra, probability, and optimization, enough to reason about and debug model behavior from first principles.
  • Comfort collaborating across a multidisciplinary team (ML, robotics, software, and clinical/domain experts) and communicating tradeoffs and findings clearly.

Nice To Haves

  • Background in healthcare, medical devices, surgical robotics, or other regulated technical domains.
  • Experience with sim-to-real workflows and robotics simulators (e.g., NVIDIA Isaac).
  • Experience with structured, ontology- or taxonomy-based labeling frameworks for fine-grained activity.
  • Multimodal fusion of video with sensor, telemetry, and system-log streams.
  • Designing annotation pipelines, QC processes, and active-learning loops.
  • Real-time / edge inference optimization (e.g., TensorRT, NVIDIA Jetson).
  • Fine-grained interaction and object-relationship modeling.
  • Relevant peer-reviewed publications (CVPR, ICCV, ECCV, NeurIPS, etc.).

Responsibilities

  • Develop temporal models for activity and workflow understanding: event/state recognition and fine-grained temporal action segmentation.
  • Benchmark in-house models against the state of the art and recommend the target perception architecture.
  • Define the perception input/output specification and demonstrate offline feasibility on recorded data.
  • Stand up a continuous-improvement loop (discrepancy flagging, active learning, human-in-the-loop relabeling) and the tooling/UI needed for offline evaluation and the path to real-time use.
  • Partner with annotation and data teams to shape label taxonomies, QC, and the data pipeline that feeds the AI/ML models.
  • Establish the path from offline evaluation on recorded data to real-time integration.
  • Partner with AI/ML researchers, robotics, data engineers, and other stakeholders to deliver a perception layer that enables rapid prototyping and learning while working toward a product solution.

Benefits

  • Market-competitive compensation packages, inclusive of base pay, incentives, benefits, and equity.