Learning representations on graphs

National Science Review 5: 21, 2018
RESEARCH HIGHLIGHT
doi: 10.1093/nsr/nwx147
Advance access publication 12 January 2018
COMPUTER SCIENCE, Special Topic: Machine Learning

Jun Zhu

Networks are everywhere. Popular examples include social networks, the hyperlinked World Wide Web, transportation networks, electricity power networks and biological gene networks. Networks are typically represented as a graph whose vertices represent entities and whose edges represent links or relationships between these entities. As the pervasiveness and scope of network data increase, there has been significant interest in developing statistical models that learn from networks for prediction or reasoning tasks.

Early work focused on designing good proximity (or similarity) measures between nodes, using features related to certain topological properties of a graph, such as common neighbors, Jaccard's coefficient, Adamic/Adar and Katz (see [1] for example). Inspired by the substantial success of deep learning, learning a good representation from networks has attracted increasing attention, though this was not the first attempt to learn latent features of networks (see, for example, the earlier attempts using Bayesian nonparametric techniques [2]).

One recent approach is Structure2Vec [3], which embeds the entities (i.e. nodes) into a Euclidean space by conjoining ideas from probabilistic graphical models and deep learning. The basic idea is illustrated in Figure 1, where each node X_i is associated with a latent variable H_i. In general, the latent variables H_i are random, and the whole set H is characterized by a joint distribution p(H|X). To answer queries such as finding the marginal distribution of a single variable or a set of variables, we need to run message-passing algorithms (e.g. mean-field and belief propagation) on the graph, which can leverage the compact dependence structure. Structure2Vec draws ideas from message-passing algorithms. Instead of inferring probabilities, it learns a feature vector μ_i ∈ R^d for each node, following a message-passing protocol with a local update rule. The algorithm is iterative, starting with some initial values of μ. At iteration t, it updates the feature vector of each node i using the local 'messages' from its neighbors N(i):

    μ_i^(t) ← f(W, X_i, {μ_j^(t−1)}_{j ∈ N(i)}),  ∀i,        (1)

where f is a function mapping that can be defined by a deep neural network and W are its weights. A simple one-layer network can be defined as

    μ_i^(t) ← σ(W_1 X_i + W_2 Σ_{j ∈ N(i)} μ_j^(t−1)),  ∀i,   (2)

where σ is the sigmoid function.

[Figure 1. Learning representations on graphs.]
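As a concrete illustration of this update rule, here is a minimal NumPy sketch of the one-layer update in Eq. (2). It is not the reference implementation of Structure2Vec [3]; the function name, the toy graph, the embedding dimension d and the number of iterations T are placeholders chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def structure2vec_onelayer(X, adj, d=16, T=4, seed=0):
    """Sketch of the update mu_i <- sigma(W1 X_i + W2 * sum_{j in N(i)} mu_j).

    X   : (n, p) array of node attributes X_i
    adj : list of neighbor index lists, adj[i] = N(i)
    Returns an (n, d) array of node feature vectors mu.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=0.1, size=(d, p))   # weights on node attributes
    W2 = rng.normal(scale=0.1, size=(d, d))   # weights on neighbor messages
    mu = np.zeros((n, d))                     # initial values mu^(0)
    for _ in range(T):
        # Sum the neighbors' vectors from the previous iteration for every node.
        msg = np.stack([mu[nbrs].sum(axis=0) for nbrs in adj])
        # Eq. (2), applied to all nodes at once (synchronous update).
        mu = sigmoid(X @ W1.T + msg @ W2.T)
    return mu

# Toy 4-node cycle with 3-dimensional node attributes (made-up data).
X = np.arange(12, dtype=float).reshape(4, 3)
adj = [[1, 3], [0, 2], [1, 3], [0, 2]]
mu = structure2vec_onelayer(X, adj)
print(mu.shape)  # (4, 16)
```

Each pass recomputes every μ_i from the neighbors' vectors of the previous iteration, which is the synchronous reading of Eq. (2).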
With the feature vectors μ, we can define a model for the prediction task. For example, if we want to classify each node, each single vector μ_i can be used as the input to a classifier. If the goal is link prediction, we can use a pair of vectors μ_i and μ_j to define a probabilistic model for the link E_ij being present. If the prediction is for the whole network, we can aggregate all the vectors into a single vector and feed it to a classifier. We can then optimize the objective to find the optimal feature vectors, similar in spirit to an expectation–maximization algorithm that alternately infers the unknown vectors and updates the parameters. This framework can be generalized to deal with dynamic networks for temporal reasoning [4], when temporal information is important.
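To make the three readouts above concrete, the sketch below scores nodes, links and the whole graph from the node vectors. The logistic weights and the stand-in embedding matrix are hypothetical placeholders; in practice the embeddings would come from the message-passing step above and the weights from training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n, d = 4, 16
mu = rng.normal(size=(n, d))     # stand-in for the learned node vectors mu_i

# 1) Node classification: each mu_i is the input to a (logistic) classifier.
w_node = rng.normal(size=d)
node_probs = sigmoid(mu @ w_node)                   # one probability per node

# 2) Link prediction: score the pair (mu_i, mu_j), here via concatenation.
w_link = rng.normal(size=2 * d)
def link_prob(i, j):
    return sigmoid(np.concatenate([mu[i], mu[j]]) @ w_link)

# 3) Whole-network prediction: aggregate all vectors, then classify.
w_graph = rng.normal(size=d)
graph_prob = sigmoid(mu.sum(axis=0) @ w_graph)      # sum-pooling readout

print(node_probs.round(2), float(link_prob(0, 2)), float(graph_prob))
```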

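Optimizing the objective so that the feature vectors and the parameters are learned together can be pictured with a toy training loop: each step recomputes the node vectors inside the loss and then updates all weights. The sketch below uses made-up data and a crude finite-difference gradient purely for illustration; a real implementation would typically compute gradients by backpropagation and is not the code of [3].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def embed(X, adj, W1, W2, T=3):
    # Message-passing step of Eq. (2): recompute all node vectors.
    mu = np.zeros((X.shape[0], W1.shape[0]))
    for _ in range(T):
        msg = np.stack([mu[nbrs].sum(axis=0) for nbrs in adj])
        mu = sigmoid(X @ W1.T + msg @ W2.T)
    return mu

def loss(params, X, adj, y):
    # Node-classification cross-entropy on top of the recomputed vectors.
    W1, W2, w = params
    p = sigmoid(embed(X, adj, W1, W2) @ w)
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(2)
n, p_dim, d = 6, 3, 8            # hypothetical toy problem
X = rng.normal(size=(n, p_dim))
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

params = [rng.normal(scale=0.1, size=(d, p_dim)),   # W1
          rng.normal(scale=0.1, size=(d, d)),       # W2
          np.zeros(d)]                              # readout weights

lr, h = 0.5, 1e-5
for step in range(30):
    for P in params:
        # Central finite differences over every entry (fine for a tiny toy).
        grad = np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            old = P[idx]
            P[idx] = old + h
            lp = loss(params, X, adj, y)
            P[idx] = old - h
            lm = loss(params, X, adj, y)
            P[idx] = old
            grad[idx] = (lp - lm) / (2 * h)
        P -= lr * grad

mu = embed(X, adj, params[0], params[1])
print(np.round(sigmoid(mu @ params[2]), 2), "labels:", y)
```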
Jun Zhu
Department of Computer Science and Technology, Tsinghua University, China
E-mail: dcszj@mail.tsinghua.edu.cn

REFERENCES
1. Liben-Nowell D and Kleinberg J. In: ACM Conference on Information and Knowledge Management, New Orleans, USA, 2003, 556–9.
2. Zhu J. In: International Conference on Machine Learning, Edinburgh, Scotland, 2012, 1179–86.
3. Dai H, Dai B and Song L. In: International Conference on Machine Learning, New York City, USA, 2016, 2702–11.
4. Trivedi R, Dai H, Wang Y et al. In: International Conference on Machine Learning, Sydney, Australia, 2017, 3462–71.

National Science Review, Volume 5 (1) – Jan 1, 2018



Publisher
Oxford University Press
Copyright
Copyright © 2022 China Science Publishing & Media Ltd. (Science Press)
ISSN
2095-5138
eISSN
2053-714X
DOI
10.1093/nsr/nwx147

Journal

National Science Review, Oxford University Press

Published: Jan 1, 2018
