Learning representations on graphs

National Science Review 5: 21, 2018
RESEARCH HIGHLIGHT
doi: 10.1093/nsr/nwx147
Advance access publication 12 January 2018
COMPUTER SCIENCE, Special Topic: Machine Learning

Jun Zhu

Networks are everywhere. Popular examples include social networks, the hyperlinked World Wide Web, transportation networks, electricity power networks and biological gene networks. Networks are typically represented as a graph whose vertices represent entities and whose edges represent links or relationships between these entities. As the pervasiveness and scope of network data increase, there has been significant interest in developing statistical models that learn from networks for prediction or reasoning tasks.

The early work focused on designing good proximity (or similarity) measures between nodes, using features related to certain topological properties of a graph, such as common neighbors, Jaccard's coefficient, Adamic/Adar and Katz (see [1] for example). Inspired by the substantial success of deep learning, learning a good representation from networks has attracted increasing attention, though this was not the first attempt to learn latent features of networks (see, for example, the earlier attempts using Bayesian nonparametric techniques [2]).

One recent approach is Structure2Vec [3], which embeds the entities (i.e. nodes) into a Euclidean space by conjoining ideas from probabilistic graphical models and deep learning. The basic idea is illustrated in Figure 1, where each node $X_i$ is associated with a latent variable $H_i$. In general, the latent variables $H$ are random, and the whole set of $H$ is characterized by a joint distribution $p(H \mid X)$. To answer queries such as finding the marginal distribution of a single variable or a set of variables, we need to run message-passing algorithms (e.g. mean-field and belief propagation) on the graph, which can leverage the compact dependence structure.

[Figure 1. Learning representations on graphs.]

Structure2Vec draws ideas from message-passing algorithms. Instead of inferring probabilities, it learns a feature vector $\mu_i \in \mathbb{R}^d$ for each node, following a message-passing protocol with a local update rule. The algorithm is iterative, starting with some initial values of $\mu$. At iteration $t$, it updates the feature vector for each node $i$ by using the local 'messages' from its neighbors:

$$\mu_i^{(t)} \leftarrow f\bigl(W, X_i, \{\mu_j^{(t-1)}\}_{j \in \mathcal{N}_i}\bigr), \quad \forall i, \qquad (1)$$

where $f$ is a function mapping that can be defined by a deep neural network and $W$ are its weights. A simple one-layer network can be defined as

$$\mu_i^{(t)} \leftarrow \sigma\Bigl(W_1 X_i + W_2 \sum_{j \in \mathcal{N}_i} \mu_j^{(t-1)}\Bigr), \quad \forall i, \qquad (2)$$

where $\sigma$ is the sigmoid function.

With the feature vectors $\mu$, we can define a model for the prediction task. For example, if we want to classify each node, each single vector can be used as the input to a classifier. If the goal is link prediction, we can use a pair of vectors $\mu_i$ and $\mu_j$ to define a probabilistic model for the link $E_{ij}$ being present; and if the prediction is for the whole network, we can aggregate all vectors into a single vector and feed it to a classifier. Then, we can optimize the objective to find optimal feature vectors, similar in spirit to an expectation–maximization algorithm that alternately infers the unknown vectors and updates the parameters. This framework can be generalized to deal with dynamic networks for temporal reasoning [4], when temporal information is important.
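To make the update rule concrete, the following is a minimal NumPy sketch of the one-layer variant in Eq. (2), together with the kinds of prediction heads described above. The graph, dimensions, random weights W1 and W2, and the pairwise link score are illustrative assumptions only; in Structure2Vec the weights would be learned end-to-end from the task objective, in the EM-like alternation mentioned above, rather than drawn at random.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def structure2vec_embed(X, adj, W1, W2, n_iters=4):
    """Sketch of the one-layer update in Eq. (2).

    X   : (n, d_in) matrix of node attributes X_i
    adj : (n, n)    binary adjacency matrix of the graph
    W1  : (d, d_in) weights applied to node attributes
    W2  : (d, d)    weights applied to summed neighbour messages
    Returns mu : (n, d) feature vectors mu_i after n_iters iterations.
    """
    n = X.shape[0]
    d = W1.shape[0]
    mu = np.zeros((n, d))                    # mu^(0): initial values
    for _ in range(n_iters):
        neighbour_sum = adj @ mu             # sum_{j in N_i} mu_j^(t-1) for all i
        mu = sigmoid(X @ W1.T + neighbour_sum @ W2.T)   # Eq. (2)
    return mu

# Illustrative usage on a 6-node ring graph with random attributes and weights.
rng = np.random.default_rng(0)
n, d_in, d = 6, 3, 8
X = rng.normal(size=(n, d_in))
adj = np.zeros((n, n))
for i in range(n):                           # undirected ring: i -- i+1
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
W1 = rng.normal(scale=0.1, size=(d, d_in))
W2 = rng.normal(scale=0.1, size=(d, d))
mu = structure2vec_embed(X, adj, W1, W2)

# Prediction heads built on the embeddings, as described in the text:
#   node classification : feed mu[i] to any classifier;
#   link prediction     : score a pair, e.g. sigmoid(mu[i] @ mu[j]);
#   whole-graph tasks   : aggregate, e.g. mu.sum(axis=0), then classify.
link_score_01 = sigmoid(mu[0] @ mu[1])       # illustrative score for edge (0, 1)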
Jun Zhu
Department of Computer Science and Technology, Tsinghua University, China
E-mail: dcszj@mail.tsinghua.edu.cn

REFERENCES
1. Liben-Nowell D and Kleinberg J. In: ACM Conference on Information and Knowledge Management, New Orleans, USA, 2003, 556–9.
2. Zhu J. In: International Conference on Machine Learning, Edinburgh, Scotland, 2012, 1179–86.
3. Dai H, Dai B and Song L. In: International Conference on Machine Learning, New York City, USA, 2016, 2702–11.
4. Trivedi R, Dai H and Wang Y et al. In: International Conference on Machine Learning, Sydney, Australia, 2017, 3462–71.

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
ISSN
2095-5138
eISSN
2053-714X
DOI
10.1093/nsr/nwx147


Journal

National Science Review, Oxford University Press

Published: Jan 1, 2018

