TY - JOUR
AU - Xiao, Xia
AB - Recently, cross-media web video event mining based on heterogeneous information networks (HIN) has attracted extensive attention. However, each web video is described by only a few words, resulting in sparse semantic associations between textual and visual information. In this case, it is difficult to seek out videos belonging to the same event, which brings great challenges to web video event mining. Nevertheless, nodes with similar structural features in network tend to be highly correlated. It can be found that structural information could provide complementary clues for correlation learning, which has been widely ignored in previous studies. Thus, we propose a novel cross-media correlation learning method with integrated text semantics and network structural information for web video event mining. Firstly, a multi-modal HIN is constructed to describe the interactions among videos, near-duplicate keyframes (NDKs) and terms. Then, hidden semantic associations between nodes are learned by designing semantic paths for enriching the sparse text distribution information. Next, the first-order proximity and second-order proximity of each pair of nodes are fused to obtain structural correlations in network, which reflects local and global structural proximity between nodes. Finally, a semantic and structural association fusion model based on network embedding is proposed to learn distinguishable low-dimensional representation vectors for event mining. 
Experiments on web videos from YouTube show the superior performance of our proposed method compared with several state-of-the-art models, with an average F1 score improved by 16% to 57%.
TI - Cross-media correlation learning for web video event mining with integrated text semantics and network structural information
JO - Neural Computing and Applications
DO - 10.1007/s00521-023-08323-4
DA - 2023-06-01
UR - https://www.deepdyve.com/lp/springer-journals/cross-media-correlation-learning-for-web-video-event-mining-with-SheKudndNv
SP - 11815
EP - 11831
VL - 35
IS - 16
DP - DeepDyve
ER -