Our workshop is expected to point to promising directions for cross-modal human-robot interaction. It will cover, but is not limited to, the following topics:
  • Large-scale cross-modal pretraining, cross-modal representation learning, cross-modal reasoning.
  • Vision-language grounding, visual question answering, visual dialogue, visual commonsense reasoning, vision-language navigation, vision-dialog navigation.
  • Reinforcement learning, policy exploration for making decisions about cross-modal interaction.
  • Self-supervised learning, life-long/incremental learning, active learning.
  • Real-world cross-modal interaction applications involving humans, e.g., smart assistants, indoor robots, autonomous driving, and medical diagnosis.
  • New benchmarks that evaluate the benefit of multi-modal reasoning and interaction approaches in specific scenarios.
Submission: CMT submission site
We invite high-quality, original submissions (i.e., work that has not been previously published or accepted for publication in substantially similar form in any peer-reviewed venue, including journals, conferences, and workshops). Papers are limited to 14 pages, including figures and tables, in the ECCV style; additional pages containing only cited references are allowed. The paper template of the ECCV 2022 main conference should be used. Please refer to the following files for detailed formatting instructions:
Important Dates:
  • Workshop Paper Submission Deadline: August 15, 2022 (11:59PM PDT)
  • Notification of Acceptance: August 26, 2022 (11:59PM PDT)
  • Camera-ready Papers Due: September 1, 2022 (11:59PM PDT)