Discovery

Collaborating institution: Al-Azhar University (Academic); number of joint works: 2

Article　2020　IEEE : Institute of Electrical and Electronics Engineers

Video Alignment Using Bi-Directional Attention Flow in a Multi-Stage Learning Model (Last author)

Reham Abobeah, Amin Shoukry, Jiro Katto
IEEE Access
Abstract: Recently, deep learning techniques have contributed to solving a multitude of computer vision tasks. In this paper, we propose a deep-learning approach for video alignment, which involves finding the best correspondences between two overlapping videos. We formulate the video alignment task as a variant of the well-known machine comprehension (MC) task in natural language processing. While MC answers a question about a given paragraph, our technique determines the frame sequence in the context video that is most relevant to the query video. To do so, the individual frames of the two videos are first represented by highly discriminative and compact descriptors. Next, the descriptors are fed into a multi-stage network that, with the help of the bidirectional attention flow mechanism, represents the context video at various levels of granularity and estimates the query-aware part of the context. The proposed model was trained on 10k video pairs collected from YouTube. The results show that our model outperforms all known state-of-the-art techniques by a considerable margin, confirming its efficacy. © 2020 IEEE.
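The abstract frames alignment as picking the most relevant frame sequence in the context video, by analogy with answer-span selection in MC. The following is a minimal sketch of that selection step only, assuming (the abstract does not say this) that the network emits a start probability and an end probability for every context frame, as MC span models commonly do. The function name `best_span` and the example scores are hypothetical.

```python
from typing import List, Tuple


def best_span(p_start: List[float], p_end: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing p_start[i] * p_end[j] with i <= j.

    Assumes both inputs are non-negative per-frame probabilities over the
    context video. This is an illustrative decoding rule borrowed from MC,
    not the paper's confirmed output layer.
    """
    if not p_start or len(p_start) != len(p_end):
        raise ValueError("p_start and p_end must be non-empty and of equal length")

    best = (0, 0, p_start[0] * p_end[0])
    best_i = 0  # index of the largest start probability seen so far
    for j in range(len(p_end)):
        # Update the best start candidate among frames 0..j.
        if p_start[j] > p_start[best_i]:
            best_i = j
        # Best span ending at j starts at the best start seen so far.
        score = p_start[best_i] * p_end[j]
        if score > best[2]:
            best = (best_i, j, score)
    return best


if __name__ == "__main__":
    # Hypothetical scores for a 4-frame context video.
    print(best_span([0.1, 0.6, 0.2, 0.1], [0.05, 0.1, 0.15, 0.7]))
```

Tracking the best start index while scanning the end positions keeps the search linear in the number of context frames and guarantees that the selected segment never ends before it starts.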

Conference Paper　2019-01-01

Bi-directional attention flow for video alignment (Last author)

Reham Abobeah, Marwan Torki, Amin Shoukry, Jiro Katto
Abstract: In this paper, a novel technique is introduced to address the video alignment task, which is one of the hot topics in computer vision. Specifically, we aim at finding the best possible correspondences between two overlapping videos without the restrictions imposed by previous techniques. The novelty of this work is that the video alignment problem is solved by drawing an analogy between it and the machine comprehension (MC) task in natural language processing (NLP). Put simply, MC seeks to give the best answer to a question about a given paragraph. In our work, one of the two videos is treated as the query and the other as the context. First, a pre-trained CNN is used to obtain high-level features from the frames of both the query and context videos. Then, the bidirectional attention flow mechanism, which has achieved considerable success in MC, is used to compute the query-context interactions in order to find the best mapping between the two input videos. The proposed model has been trained using 10k video pairs collected from "YouTube". The initial experimental results show that it is a promising solution for the video alignment task when compared to state-of-the-art techniques. Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
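The query-context interaction described in the abstract can be illustrated with a small, dependency-free sketch of a bidirectional attention step over per-frame feature vectors. Several things here are assumptions rather than details from the paper: the similarity is a plain dot product (the original BiDAF formulation for MC uses a trainable similarity function), the frame features are given as plain lists standing in for CNN outputs, and the function name `bidaf_attention` is hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: Vector, rows: Matrix) -> Vector:
    """Sum of rows, each scaled by its weight."""
    dim = len(rows[0])
    return [sum(w * row[k] for w, row in zip(weights, rows)) for k in range(dim)]


def bidaf_attention(context: Matrix, query: Matrix) -> Matrix:
    """Return one query-aware vector of size 4*d per context frame.

    context: T frame features of dimension d (context video).
    query:   J frame features of dimension d (query video).
    """
    if not context or not query:
        raise ValueError("context and query must each contain at least one frame")

    # Similarity between every context frame and every query frame.
    # Assumption: dot product instead of a learned similarity function.
    sim = [[dot(h, u) for u in query] for h in context]

    # Context-to-query: for each context frame, attend over the query frames.
    c2q = [weighted_sum(softmax(row), query) for row in sim]

    # Query-to-context: weight context frames by their best match to any
    # query frame, giving one summary vector shared by all context frames.
    q2c = weighted_sum(softmax([max(row) for row in sim]), context)

    merged = []
    for h, u_att in zip(context, c2q):
        h_u = [x * y for x, y in zip(h, u_att)]
        h_q = [x * y for x, y in zip(h, q2c)]
        merged.append(h + u_att + h_u + h_q)  # concatenation: [h; u~; h*u~; h*h~]
    return merged
```

Each output row combines a context frame with the query content most related to it, which is the kind of query-aware context representation a later stage could use to locate the matching segment.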