An intern is required for the project titled "Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations"
Project title: Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations
Description: The research introduces Bi-VLA (Vision-Language-Action), a novel system for bimanual robotic dexterous manipulation that integrates vision, language understanding, and physical action. The proposed architecture embeds Starling-LM-7B-alpha, an open-source large language model trained with Reinforcement Learning from AI Feedback (RLAIF), and Qwen-VL, a vision-language model (VLM) by Alibaba. In a salad-preparation task, Bi-VLA demonstrates the ability to interpret complex human instructions, perceive and understand the visual context of the ingredients, and execute precise bimanual actions to assemble the requested salad. The intern will collect datasets and study the VLM/LLM architecture for bimanual collaborative robots. Through a series of experiments, the intern will evaluate the system's performance in terms of accuracy, efficiency, and adaptability to various salad recipes and human preferences.
Candidate requirements: ➔ basic programming skills in Python or ROS/Gazebo
Supervisor: Associate Professor Dzmitry Tsetserukou