Contents
[1] Bi-directional Relationship Inferring Network for Referring Image Segmentation
[2] A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
[3] Vision-Dialog Navigation by Exploring Cross-modal Memory
[4] VQA with No Questions-Answers Training
[5] Referring Image Segmentation via Cross-Modal Progressive Comprehension
[6] Local-Global Video-Text Interactions for Temporal Grounding
[7] Hypergraph Attention Networks for Multimodal Learning
Summary
[1] Bi-directional Relationship Inferring Network for Referring Image Segmentation
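From Prof. Huchuan Lu's group.
Existing methods: language guides vision only (language -> vision); there is no guidance in the other direction (vision -> language). Here "->" means "guides".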
[2] A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
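By Si Liu (Beihang University) and Guanbin Li (Sun Yat-sen University).
Existing methods: two-stage (generate proposals, then pick the best proposal), which is fairly slow.
This work brings correlation filtering into the cross-modal setting: the language feature is used as the kernel and correlated over the image feature, producing a response map (the center of the bbox), and then w and h are regressed. It looks a lot like SiamRPN, except that one branch has been swapped for the other modality.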
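To make the idea above concrete, here is a minimal NumPy sketch of using a language feature as a correlation kernel over an image feature map. It is not the paper's implementation: the function names, the tensor shapes, the zero padding, and the linear size head (`size_weights`, `size_bias`) are all assumptions chosen for illustration, and the features are toy values rather than outputs of real encoders.

```python
import numpy as np


def language_guided_response(image_feat, lang_kernel):
    """Cross-correlate a language-derived kernel with an image feature map.

    image_feat:  array of shape (C, H, W)
    lang_kernel: array of shape (C, k, k), k odd (assumed shape)
    Returns a response map of shape (H, W), using zero padding.
    """
    _, H, W = image_feat.shape
    k = lang_kernel.shape[1]
    pad = k // 2
    padded = np.pad(image_feat, ((0, 0), (pad, pad), (pad, pad)))
    response = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            window = padded[:, y:y + k, x:x + k]
            response[y, x] = np.sum(window * lang_kernel)
    return response


def predict_center(response):
    """Return (x, y) of the strongest response, taken as the bbox center."""
    y, x = np.unravel_index(np.argmax(response), response.shape)
    return int(x), int(y)


def regress_size(image_feat, center, size_weights, size_bias):
    """Hypothetical linear head: map the feature at the center to (w, h)."""
    x, y = center
    w, h = size_weights @ image_feat[:, y, x] + size_bias
    return float(w), float(h)


# Toy example: plant the "language" vector at one location of the feature map.
lang_vec = np.array([1.0, -2.0, 0.5, 3.0])
image_feat = np.zeros((4, 8, 8))
image_feat[:, 5, 2] = lang_vec

lang_kernel = np.zeros((4, 3, 3))
lang_kernel[:, 1, 1] = lang_vec  # only the kernel center is non-zero here

response = language_guided_response(image_feat, lang_kernel)
center = predict_center(response)
print(center)
```

Because this is a single dense pass over the feature map with no proposal generation or ranking, it gives some intuition for why such a design can be faster than a two-stage pipeline.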
[3] Vision-Dialog Navigation by Exploring Cross-modal Memory
A cross-modal memory problem?
Navigation: previously based only on the dialog history -> this work adds a visual module.
[4] VQA with No Questions-Answers Training
Can be trained without answers. Questions are generated from a question graph; the answers to the generated questions are not meaningful.
[5] Referring Image Segmentation via Cross-Modal Progressive Comprehension
Hmm, I didn't really follow this one.
[6] Local-Global Video-Text Interactions for Temporal Grounding
Reference link
[7] Hypergraph Attention Networks for Multimodal Learning
Reference link
Summary
This session wrapped up super quickly: 1 hour 20 minutes.