European Conference on Computer Vision (ECCV 2018)
Graph Distillation for Action Detection with Privileged Information
Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei
In this work, we propose a technique to tackle action detection in RGB-D videos under a challenging condition in which we have limited labeled data and partially observed training modalities. Common methods such as transfer learning do not take advantage of the rich information from extra modalities potentially available in the source domain dataset. On the other hand, previous work on cross-modality learning focuses only on a single domain or task. We propose a graph distillation method that incorporates rich privileged information from a large multi-modal dataset in the source domain and improves performance in the target domain, where training data is scarce. Leveraging both a large-scale dataset and extra modalities, our method learns a better model on the target domain without requiring access to these extra modalities at test time. We evaluate our approach on action classification and temporal action detection in RGB-D videos, and show that our model outperforms the state-of-the-art by a large margin on the challenging NTU RGB+D and PKU-MMD benchmarks.