Cross-modal Human-Robot Interaction (2nd)



When: Oct 24, 2022. Where: Virtual (ECCV 2022 Workshop)

Workshop Information

The 1st workshop, Human Interaction for Robotic Navigation, is available here.
A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, interact with humans in multiple modalities, and act in a physical or embodied environment. As one of the most promising directions, cross-modal human-robot interaction has attracted increasing attention from both academia and industry. The community has developed numerous methods to address the problems in cross-modal human-robot interaction. Visual recognition methods such as detection and segmentation enable robots to understand the semantics of an environment. Large-scale pretraining and cross-modal representation learning aim at effective cross-modal alignment. Reinforcement learning methods are applied to learn human-robot interaction policies. Moreover, agents are expected to have further abilities such as life-long/incremental learning and active learning, which broaden the range of real-world human-robot interaction applications.
Much research has been devoted to these topics, leading to rapid growth of publications in top-tier conferences and journals such as CVPR, ICCV, ECCV, NeurIPS, ACL, EMNLP, and T-PAMI. We believe this workshop will significantly benefit the progress of human-robot interaction.
Our workshop is expected to chart promising directions for cross-modal human-robot interaction. It will cover, but is not limited to, the following topics:
  • Large-scale cross-modal pretraining, cross-modal representation learning, cross-modal reasoning.
  • Vision-language grounding, visual question answering, visual dialogue, visual commonsense reasoning, vision-language navigation, vision-dialog navigation.
  • Reinforcement learning and policy exploration for decision making in cross-modal interaction.
  • Self-supervised learning, life-long/incremental learning, active learning.
  • Real-world cross-modal interaction applications involving humans, e.g., smart assistants, indoor robots, autonomous driving, medical diagnosis, etc.
  • New benchmarks that evaluate the benefit of multi-modal reasoning and interaction approaches in specific scenarios.