Physical AI

Centific

1d • $90,000 • Hybrid

About The Position

Centific is seeking passionate AI Research Engineers to join its labs. The role involves translating cutting-edge research into production systems that perceive, reason, and act in the real world. The mission is to build state-of-the-art Vision AI across 2D/3D perception, egocentric/360° understanding, and multimodal reasoning. As an AI Research Engineer, you will own high-leverage experiments from paper to deployable modules in the platform. You could join the Computer Vision team, working on 3D reconstruction, scene understanding, and visual AI, or the Physical AI Robotics team, working at the intersection of simulation, robotics, and AI.

Requirements

Master's/Ph.D. in CS/EE/Robotics (or a related field), actively publishing in CV/ML/Robotics venues (e.g., CVPR/ICCV/ECCV, NeurIPS/ICML/ICLR, CoRL/RSS).
Strong PyTorch (or JAX) and Python; comfort with CUDA profiling and mixed precision training.
Demonstrated research in computer vision and at least one of: VLMs (e.g., LLaVA-style, video-language models), embodied/physical AI, 3D perception.
Proven ability to move from paper → code → ablation → result with rigorous experiment tracking.

Nice To Haves

Experience with video models (e.g., TimeSformer/MViT/VideoMAE), diffusion or 3D GS/NeRF pipelines, or SLAM/scene reconstruction.
Prior work on multimodal grounding (referring expressions, spatial language, affordances) or temporal reasoning.
Familiarity with ROS 2, DeepStream/TAO, or edge inference optimizations (TensorRT, ONNX).
Scalable training: Ray, distributed data loaders, sharded checkpoints.
Strong software craft: testing, linting, profiling, containers, and reproducibility.
Public code artifacts (GitHub) and first-author publications or strong open source impact.

Responsibilities

Build and fine-tune models for detection, tracking, segmentation (2D/3D), pose & activity recognition, and scene understanding (incl. 360° and multi-view).
Train/evaluate vision–language models (VLMs) for grounding, dense captioning, temporal QA, and tool use; design retrieval-augmented and agentic loops for perception-action tasks.
Prototype perception-in-the-loop policies that close the gap from pixels to actions (simulation + real data).
Integrate with planners and task graphs for manipulation, navigation, or safety workflows.
Curate datasets, author high-signal evaluation protocols/KPIs, and run ablations that make results reproducible.
Package research into reliable services on a modern stack (Kubernetes, Docker, Ray, FastAPI), with profiling, telemetry, and CI for reproducible science.
Orchestrate multi-agent pipelines (e.g., LangGraph-style graphs) that combine perception, reasoning, simulation, and code generation to self-check and self-correct.