National Science Review 5: 21, 2018
RESEARCH HIGHLIGHT
doi: 10.1093/nsr/nwx147
Advance access publication 12 January 2018

COMPUTER SCIENCE
Special Topic: Machine Learning

Jun Zhu

Networks are everywhere. Popular examples include social networks, the hyperlinked World Wide Web, transportation networks, electricity power networks and biological gene networks. Networks are typically represented as a graph whose vertices represent entities and whose edges represent links or relationships between these entities. As the pervasiveness and scope of network data increase, there has been significant interest in developing statistical models to learn from networks for prediction or reasoning tasks.

The early work focused on designing good proximity (or similarity) measures between nodes, using features related to certain topological properties of a graph, such as common neighbors, Jaccard's coefficient, Adamic/Adar and Katz (see [1] for example). Inspired by the substantial success of deep learning, learning a good representation from networks has attracted increasing attention, though this was not the first attempt to learn latent features of networks (see, for example, the previous attempts at using Bayesian nonparametric techniques [2]).

One recent approach is Structure2Vec [3], which embeds the entities (i.e. nodes) into a Euclidean space by conjoining ideas from probabilistic graphical models and deep learning. The basic idea is illustrated in Figure 1, where each node X_i is associated with a latent variable H_i. In general, the latent variables H are random and the whole set of H is characterized by a joint distribution p(H|X). To answer queries such as finding the marginal distribution of a single variable or a set of variables, we need to run message-passing algorithms (e.g. mean-field and belief propagation) on the graph, which can leverage the compact dependence structure.

[Figure 1. Learning representations on graphs.]

Structure2Vec draws ideas from message-passing algorithms. Instead of inferring probabilities, it learns a feature vector μ_i ∈ R^d for each node, following a message-passing protocol with a local update rule. The algorithm is iterative, starting with some initial values of μ. At iteration t, it updates the feature vector for each node i by using the local 'message' from its neighbors N_i:

    μ_i^(t) ← f(W, X_i, {μ_j^(t−1)}_{j ∈ N_i}), ∀i,    (1)

where f is a function mapping that can be defined by a deep neural network and W are weights. A simple one-layer network can be defined as

    μ_i^(t) ← σ(W_1 X_i + W_2 Σ_{j ∈ N_i} μ_j^(t−1)), ∀i,    (2)

where σ is the sigmoid function.

With the feature vectors μ, we can define a model for the prediction task. For example, if we want to classify each node, each single vector can be used as the input to a classifier. If the goal is link prediction, we can use a pair of vectors μ_i and μ_j to define a probabilistic model for the link E_ij to be present, and if the prediction is for the whole network, we can aggregate all vectors into a single vector and feed it into a classifier. Then, we can optimize the objective to find optimal feature vectors, similar in spirit to an expectation–maximization algorithm that alternately infers the unknown vectors and updates the parameters. This framework can be generalized to deal with dynamic networks for temporal reasoning [4], when temporal information is important.

Jun Zhu
Department of Computer Science and Technology, Tsinghua University, China
E-mail: dcszj@mail.tsinghua.edu.cn

REFERENCES
1. Liben-Nowell D and Kleinberg J. In: ACM Conference on Information and Knowledge Management, New Orleans, USA, 2003, 556–9.
2. Zhu J. In: International Conference on Machine Learning, Edinburgh, Scotland, 2012, 1179–86.
3. Dai H, Dai B and Song L. In: International Conference on Machine Learning, New York City, USA, 2016, 2702–11.
4. Trivedi R, Dai H and Wang Y et al. In: International Conference on Machine Learning, Sydney, Australia, 2017, 3462–71.

© The Author(s) 2018. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
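As an illustrative aside, the one-layer message-passing update of equation (2) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the random weight initialization, the dimensions, the fixed iteration count and all function and variable names below are choices made for the example, not taken from the Structure2Vec paper or its implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def structure2vec_embeddings(X, adjacency, embed_dim, num_iters=4, seed=0):
    """Iterate mu_i <- sigma(W1 X_i + W2 * sum_{j in N_i} mu_j), as in eq. (2).

    X         : (n, feature_dim) node feature matrix
    adjacency : (n, n) 0/1 adjacency matrix of the graph
    Returns an (n, embed_dim) matrix of node feature vectors mu.
    """
    n, feature_dim = X.shape
    rng = np.random.default_rng(seed)
    # W1 maps node features, W2 aggregates neighbor messages
    # (random initialization is an assumption of this sketch).
    W1 = rng.normal(scale=0.1, size=(embed_dim, feature_dim))
    W2 = rng.normal(scale=0.1, size=(embed_dim, embed_dim))
    mu = np.zeros((n, embed_dim))  # initial values of mu
    for _ in range(num_iters):
        # Message passing: each row of `messages` is the sum of the
        # current vectors mu_j over that node's neighbors N_i.
        messages = adjacency @ mu
        mu = sigmoid(X @ W1.T + messages @ W2.T)
    return mu

# Toy 4-node path graph with 2-d node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])
mu = structure2vec_embeddings(X, A, embed_dim=3)
print(mu.shape)  # prints (4, 3)
```

The resulting rows of `mu` could then serve as classifier inputs for the node-level, link-level or graph-level prediction tasks described above; in practice W_1 and W_2 would be learned jointly with the predictor rather than fixed at random.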