Learning representations on graphs

National Science Review 5: 21, 2018. RESEARCH HIGHLIGHT. doi: 10.1093/nsr/nwx147. Advance access publication 12 January 2018. COMPUTER SCIENCE. Special Topic: Machine Learning.

Jun Zhu

Networks are everywhere. Popular examples include social networks, the hyperlinked World Wide Web, transportation networks, electricity power networks and biological gene networks. A network is typically represented as a graph whose vertices represent entities and whose edges represent links or relationships between these entities. As the pervasiveness and scope of network data increase, there has been significant interest in developing statistical models that learn from networks for prediction or reasoning tasks.

Early work focused on designing good proximity (or similarity) measures between nodes, using features related to certain topological properties of a graph, such as common neighbors, Jaccard's coefficient, Adamic/Adar and Katz (see [1] for example). Inspired by the substantial success of deep learning, learning a good representation from networks has attracted increasing attention, though this was not the first attempt to learn latent features of networks (see, for example, the earlier attempts using Bayesian nonparametric techniques [2]).

One recent approach is Structure2Vec [3], which embeds the entities (i.e. nodes) into a Euclidean space by conjoining ideas from probabilistic graphical models and deep learning. The basic idea is illustrated in Figure 1, where each node X_i is associated with a latent variable H_i. In general, the latent variables H are random and the whole set of H is characterized by a joint distribution p(H | X). To answer queries such as finding the marginal distribution of a single variable or a set of variables, we need to run message-passing algorithms (e.g. mean-field and belief propagation) on the graph, which can leverage the compact dependence structure. Structure2Vec draws ideas from message-passing algorithms. Instead of inferring probabilities, it learns a feature vector \mu_i \in \mathbb{R}^d for each node, following a message-passing protocol with a local update rule. The algorithm is iterative, starting with some initial values of \mu. At iteration t, it updates the feature vector of each node i by using the local 'message' from its neighbors:

\mu_i^{(t)} \leftarrow f\big( W, X_i, \{ \mu_j^{(t-1)} \}_{j \in \mathcal{N}_i} \big), \quad \forall i,   (1)

where f is a function mapping that can be defined by a deep neural network and W are its weights. A simple one-layer network can be defined as

\mu_i^{(t)} \leftarrow \sigma\Big( W_1 X_i + W_2 \sum_{j \in \mathcal{N}_i} \mu_j^{(t-1)} \Big), \quad \forall i,   (2)

where \sigma is the sigmoid function.

[Figure 1. Learning representations on graphs.]
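To make the update concrete, here is a minimal NumPy sketch of the one-layer rule in Eq. (2). It is an illustration rather than the implementation from [3]: the function name, the embedding dimension, the iteration count T and the random, untrained weights are assumptions made only for this example.

```python
# A minimal sketch of the one-layer Structure2Vec-style update in Eq. (2):
#   mu_i^(t) <- sigmoid( W1 x_i + W2 * sum_{j in N(i)} mu_j^(t-1) )
# Illustrative only; names, dimensions and the iteration count are assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def structure2vec_embed(X, neighbors, W1, W2, T=4):
    """Run T rounds of the local update and return one feature vector per node.

    X         : (n, p) array; row i holds the observed attributes of node X_i.
    neighbors : list of lists; neighbors[i] gives the indices in N(i).
    W1        : (d, p) weights applied to node attributes.
    W2        : (d, d) weights applied to the aggregated neighbor messages.
    """
    n, d = X.shape[0], W1.shape[0]
    mu = np.zeros((n, d))                          # mu^(0): some initial values
    for _ in range(T):
        mu_prev = mu
        mu = np.empty_like(mu_prev)
        for i in range(n):
            msg = mu_prev[neighbors[i]].sum(axis=0)   # local 'message' from N(i)
            mu[i] = sigmoid(W1 @ X[i] + W2 @ msg)     # Eq. (2)
    return mu

# Tiny usage example on a 3-node path graph with random (untrained) weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                        # 3 nodes, 5 attributes each
neighbors = [[1], [0, 2], [1]]                     # edges: 0-1 and 1-2
W1, W2 = rng.normal(size=(8, 5)), rng.normal(size=(8, 8))
mu = structure2vec_embed(X, neighbors, W1, W2)
print(mu.shape)                                    # (3, 8): an 8-dim vector per node
```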
With the feature vectors \mu, we can define a model for the prediction task. For example, if we want to classify each node, each single vector \mu_i can be used as the input to a classifier. If the goal is link prediction, we can use a pair of vectors \mu_i and \mu_j to define a probabilistic model for the link E_{ij} being present. And if the prediction is for the whole network, we can aggregate all the vectors into a single vector and feed it into a classifier. We can then optimize the task objective to find the optimal feature vectors, similar in spirit to an expectation–maximization algorithm that alternately infers the unknown vectors and updates the parameters. This framework can be generalized to deal with dynamic networks for temporal reasoning [4], when temporal information is important.
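The following hedged sketch shows how the learned vectors might feed the three prediction settings just described. The logistic score functions and the parameters w, b and M are illustrative stand-ins, not the models used in [3], and the stand-in embeddings replace the output of the update sketch above.

```python
# Hedged sketches (illustrative, not from [3]) of the three prediction settings:
# per-node classification, a probabilistic score for a link E_ij, and a
# whole-network prediction from an aggregated vector.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def node_scores(mu, w, b):
    """Node classification: each single vector mu_i is the classifier input."""
    return sigmoid(mu @ w + b)                 # one probability per node

def link_score(mu, i, j, M):
    """Link prediction: a pair (mu_i, mu_j) scores the link E_ij being present."""
    return sigmoid(mu[i] @ M @ mu[j])

def graph_score(mu, w, b):
    """Whole-network prediction: aggregate all vectors into one, then classify."""
    pooled = mu.sum(axis=0)                    # simple sum aggregation
    return sigmoid(pooled @ w + b)

# Example with stand-in embeddings of shape (3, 8).
rng = np.random.default_rng(1)
mu = rng.normal(size=(3, 8))                   # stand-in for structure2vec_embed output
w, b, M = rng.normal(size=8), 0.0, rng.normal(size=(8, 8))
print(node_scores(mu, w, b))                   # per-node class probabilities
print(link_score(mu, 0, 2, M))                 # probability of a link between nodes 0 and 2
print(graph_score(mu, w, b))                   # whole-network prediction
```

In a full pipeline the update weights and the task parameters (here w, b and M) would be learned jointly by optimizing the task objective, alternating between recomputing the vectors \mu and updating the parameters, as described in the text.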




Jun Zhu
Department of Computer Science and Technology, Tsinghua University, China
E-mail: dcszj@mail.tsinghua.edu.cn

REFERENCES
1. Liben-Nowell D and Kleinberg J. In: ACM Conference on Information and Knowledge Management, New Orleans, USA, 2003, 556–9.
2. Zhu J. In: International Conference on Machine Learning, Edinburgh, Scotland, 2012, 1179–86.
3. Dai H, Dai B and Song L. In: International Conference on Machine Learning, New York City, USA, 2016, 2702–11.
4. Trivedi R, Dai H and Wang Y et al. In: International Conference on Machine Learning, Sydney, Australia, 2017, 3462–71.

© The Author(s) 2018. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
