2020-07-02 CVPR2020 V&L Paper Discussion (3): Notes


    Contents

    [1] Bi-directional Relationship Inferring Network for Referring Image Segmentation
    [2] A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension
    [3] Vision-Dialog Navigation by Exploring Cross-modal Memory
    [4] VQA with No Questions-Answers Training
    [5] Referring Image Segmentation via Cross-Modal Progressive Comprehension
    [6] Local-Global Video-Text Interactions for Temporal Grounding
    [7] Hypergraph Attention Networks for Multimodal Learning
    Summary

    [1] Bi-directional Relationship Inferring Network for Referring Image Segmentation

    From Prof. Huchuan Lu's group.

    Existing methods: only language -> vision, with no vision -> language. (Here "->" means "guides".)

    [2] A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension

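    From Si Liu (Beihang University) and Guanbin Li (Sun Yat-sen University).

    Existing methods are two-stage (generate proposals, then select the best proposal), which is relatively slow.

    This work brings correlation filtering into the cross-modal setting: the language feature is used as the kernel and correlated over the image feature to obtain a response map (the center of the bbox), after which w and h are regressed. It looks a lot like SiamRPN, except that one branch has been swapped for a different modality.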
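    The "language feature as kernel" idea can be illustrated with a toy sketch. This is not the paper's implementation: it assumes the language feature has already been projected to a single C-dimensional vector and treats it as a 1x1 kernel over a (C, H, W) image feature map. The function names `correlation_response` and `peak_location` are hypothetical, and the w/h regression step is omitted.

```python
import numpy as np


def correlation_response(image_feat, lang_kernel):
    """Correlate a (C,) language kernel over a (C, H, W) image feature map.

    With a 1x1 kernel (an assumption of this sketch), cross-correlation
    reduces to a dot product between the kernel and every spatial location.
    """
    return np.einsum("chw,c->hw", image_feat, lang_kernel)


def peak_location(response):
    """Return (row, col) of the strongest response, read as the box center."""
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)


# Toy data: the image feature is zero everywhere except one location,
# where it matches the language kernel exactly.
C, H, W = 8, 5, 6
lang_kernel = np.arange(1, C + 1, dtype=float)
image_feat = np.zeros((C, H, W))
image_feat[:, 2, 4] = lang_kernel

response = correlation_response(image_feat, lang_kernel)
center = peak_location(response)
```

    In this toy setup the response map peaks where the image feature aligns with the language kernel, which is the location a single-stage method would then use as the box center before regressing width and height.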

    [3] Vision-Dialog Navigation by Exploring Cross-modal Memory

    A cross-modal memory problem?

    Navigation: previously based only on the dialog history -> now adds a visual module.

    [4] VQA with No Questions-Answers Training

    Training works without answers. Questions are generated from a question graph; the answers to the generated questions are not meaningful.

    [5] Referring Image Segmentation via Cross-Modal Progressive Comprehension

    Hmm, I didn't really follow this one.

    [6] Local-Global Video-Text Interactions for Temporal Grounding

    Reference link

    [7] Hypergraph Attention Networks for Multimodal Learning

    Reference link

    Summary

    This session wrapped up very quickly: 1 hour 20 minutes.
