Interpretable deep learning for nuclear deformation in heavy ion collisions
Pang, Long-Gang;Zhou, Kai;Wang, Xin-Nian
2019-06-14 00:00:00
Long-Gang Pang (1,2), Kai Zhou (4,5), and Xin-Nian Wang (1,2,3)

(1) Physics Department, University of California, Berkeley, CA 94720, USA
(2) Nuclear Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
(3) Key Laboratory of Quark & Lepton Physics (MOE) and Institute of Particle Physics, Central China Normal University, Wuhan 430079, China
(4) Frankfurt Institute for Advanced Studies, 60438 Frankfurt am Main, Germany
(5) Institute for Theoretical Physics, Goethe University, 60438 Frankfurt am Main, Germany

Email: lgpang.1984@berkeley.edu
arXiv:1906.06429v1 [nucl-th] 14 Jun 2019

The structure of heavy nuclei is difficult to disentangle in high-energy heavy-ion collisions. A deep convolution neural network (DCNN) might help to map the complex final states of heavy-ion collisions to the nuclear structure in the initial state. Using a DCNN for supervised regression, we successfully extracted the magnitude of the nuclear deformation from the event-by-event correlation between the momentum anisotropy or elliptic flow ($v_2$) and the total number of charged hadrons ($dN_{ch}/d\eta$) within a Monte Carlo model. Furthermore, a degeneracy is found in the correlation between collisions of prolate-prolate and oblate-oblate nuclei. Using the Regression Attention Mask algorithm, which is designed to interpret what has been learned by the DCNN, we discovered that the correlation in total-overlapped collisions is sensitive only to large nuclear deformation, while the correlation in semi-overlapped collisions is discriminative for all magnitudes of nuclear deformation. The method developed in this study can pave the way for exploring other aspects of nuclear structure in heavy-ion collisions.

I. INTRODUCTION

The nuclear structure plays an important role in explaining the experimental data of heavy-ion collisions [1–13]. For example, experimental data from collisions of deformed uranium nuclei [10] are found to favor one model of initial configurations, described by the semi-classical field of gluons from each nucleon [9, 14–16], whose initial entropy deposition does not have a distinctive linear dependence on the number of binary nucleon-nucleon collisions [17]. The mysterious enhancement of triangular flow in ultra-central heavy-ion collisions can be partially resolved by considering many-body quantum effects in the nuclear structure [6, 8, 11]. Despite these empirical observations, a quantitative study of nuclear structure from high-energy heavy-ion collisions is still difficult because of the complexity of the final states.

Nuclear shape deformation is one aspect of the nuclear structure that can have an observable influence on the hadron spectra and correlations in the final states of heavy-ion collisions. A well-established measurement of the nuclear shape deformation is low-energy Coulomb excitation [18, 19]. When deformed nuclei pass through a thin slice of lead (Pb), some of them are excited and deflected by the low-energy Coulomb interaction. These excited nuclei radiate low-energy gamma rays that can be used to determine the nuclear shape deformation. The shape deformation is used as input for the theoretical description of heavy-ion collisions [1, 2, 7]. It is interesting to know whether the output of heavy-ion collisions is sufficient to constrain the nuclear shape deformation or other parameters of the nuclear structure, despite the highly complex and dynamical nature of the collisions.

Mapping between two sets of data is always possible through a deep neural network as long as there is a continuous geometric transformation [20]. However, the power of such mapping is not yet fully explored in regression tasks that map high-dimensional scientific data to continuously changing control variables. If a brute-force mapping using deep learning succeeds in building the connection, it can discover knowledge that may evade observation through conventional approaches. Such a mapping can be made more efficient when the connection is already intuitively or evidently apparent. This motivated a recent study in which a deep learning system discovered a surprising connection between a person's gender and their retinal images [21].

In this study we use deep learning to map correlations between spectral observables to the initial nuclear deformation and explore, using a Monte Carlo model, whether the information on nuclear structure is encoded in the complex output of heavy-ion collisions. If the connection exists, we investigate whether deep learning can decode this information from the output of heavy-ion collisions using supervised regression, and we try to understand what has been learned by the deep neural network.

II. RESULTS

The nucleon density distribution of deformed nuclei can be described by the deformed Woods-Saxon distribution. The deformation is controlled by two parameters, $\beta_2$ and $\beta_4$ (see Eq. (1) in Section IV), as visualized in Fig. 1(a). As the deformation parameter $\beta_2$ changes smoothly from negative to positive values, the shapes of nuclei change from oblate (pumpkin-like) to prolate (egg-like). We expect different patterns both in the initial energy density distribution and in the final-state hadron spectra for collisions of different deformed nuclei.

[Figure 1 panels: (a) nuclear shape deformation (spherical, oblate, prolate, highly deformed); (b) regression performance of deep neural network; (c) final states of heavy ion collisions using different deformed nuclei (body-body, tip-tip); (d) attention maps learned by the deep neural network.]

Figure 1: (Color-online) Determining nuclear shape deformation using deep learning. (a) The three-dimensional nuclear shapes as a function of two parameters $\beta_2 \in [-0.5, 0.5]$ and $\beta_4 \in [-0.2, 0.2]$. (b) The regression performance of two deep convolution neural networks using the same architecture but different weights, learned by setting the labels to be $\beta_2$, $\beta_4$ (two left figures) and $|\beta_2|$, $|\beta_4|$ (two right figures). (c) The complex Monte Carlo output for collisions of different deformed nuclei. Deep learning uses these images as training and testing inputs. The x-axis represents the normalized total number of charged particles. The y-axis represents the normalized elliptic anisotropy of final-state particles in momentum space. There is a degeneracy in the correlation between prolate and oblate nuclei, as the network failed to predict the sign of the nuclear deformation. (d) The Regression Attention Mask helps to discover the most discriminative regions for nuclear deformation. While the “ankle” region (semi-central collisions and large $v_2$) is sensitive to nuclear deformation, the “toe” region (central collisions and small $v_2$) is only sensitive to large nuclear deformation $|\beta_2| > 0.17$.

To determine the nuclear shape deformation, we trained a deep convolution neural network (DCNN) to predict the two deformation parameters $\beta_2$ and $\beta_4$ from physical observables obtained through theoretical simulations of heavy-ion collisions, using supervised regression as shown in Fig. 1(b). The physical observable we choose is the correlation between the momentum anisotropy and the total number of charged hadrons in the final state, termed the charged multiplicity. The horizontal axis represents the ground truth of the deformation parameters and the vertical axis represents the predictions by the deep convolution neural network. We observe that the values predicted by deep learning span a wide range from $-\beta_2$ to $\beta_2$ and from $-\beta_4$ to $\beta_4$ in the left two sub-figures. This uncertainty range indicates that there is a degeneracy for the selected physical observable, $F(\beta_2) \approx F(-\beta_2)$. As a result, no inverse function $F^{-1}$ can map the selected physical observable to the sign of $\beta_2$. Knowing the degeneracy, we change our target to predict only the absolute values of the deformation parameters. This way, the DCNN successfully extracted $|\beta_2|$ with small uncertainty and $|\beta_4|$ with medium uncertainty, as shown in the right two sub-figures. The success in predicting the absolute values of the deformation parameters indicates that the nuclear deformation is encoded in the complex output of heavy-ion collisions. The failure in extracting the sign of the parameters, on the other hand, indicates a degeneracy in the physical observable of the final state between prolate and oblate nuclei, as discovered by the network. The statistical distributions of momentum anisotropy as a function of charged multiplicity verify this degeneracy, as shown in Fig. 1(c), where the training image for $\beta_2 = -0.2$ is visually indistinguishable from that for $\beta_2 = 0.2$. We refer to the region of large multiplicity and small momentum anisotropy as the “toe”, and to the region of medium multiplicity and large anisotropy as the “ankle”, in the statistical distributions of momentum anisotropy as a function of charged multiplicity in Fig. 1(c).

To understand what has been learned by the deep neural network, we use the “Regression Attention Mask” to highlight the most discriminative regions in the testing images, as shown in Fig. 1(d). We observed that the attention mask varies smoothly from spherical nuclei to highly deformed nuclei, indicating that the network has learned self-consistent features.

The most discriminative region is the “toe” at the bottom-right corner, which corresponds to the most central collisions. The “Regression Attention Mask” discovered this “toe” region, where the attention masks become higher and wider as $|\beta_2|$ goes from 0.17 to 0.5. The observation is consistent with the physical intuition that fully overlapped tip-tip and body-body collisions of highly deformed nuclei have large momentum anisotropy fluctuations.

The “toe” region discovered by deep learning has long been proposed to be sensitive to nuclear deformation. However, results from the DCNN show that the “toe” is less sensitive to small deformations with $|\beta_2| < 0.17$, where the attention mask is very small. The regression mask also finds that the “ankle” region for semi-overlapped collisions, where $dN_{ch}/dY|_{normed} \approx 0.5$, is sensitive to both small and large nuclear deformations.

III. DISCUSSION

Our results suggest that the nuclear shape deformation is encoded in the complex outcome of heavy-ion collisions. Supervised regression in deep learning can decode part of the information from the final-state outcome. The DCNN can predict the deformation parameter $|\beta_2|$ to high accuracy. We have designed the Regression Attention Mask algorithm to locate important regions in the input image. The attention of the artificial neural network varies smoothly as the value of $|\beta_2|$ increases. It not only verifies the old finding that fully overlapped collisions are sensitive to large nuclear deformation, but also discovers new features in the region of semi-overlapped collisions, which work well to determine both small and large nuclear deformations.

The Regression Attention Mask is an important step towards interpretable deep learning for scientific research. In the present study, the attention mask reveals interesting features that are also physically sound. For the most central collisions, the attention mask finds the “toe” region to be sensitive to large deformation, which corresponds to fully overlapped tip-tip and body-body collisions. For nearly spherical nuclei with small $|\beta_2|$, on the other hand, the spatial eccentricity is strongly correlated with the collision geometry, which gives a thin “toe”. The attention mask suggests a large discriminative “ankle” region for all values of $|\beta_2|$, because few events have extremely small or large $v_2$ in semi-overlapped collisions.

Much to our disappointment, deep learning fails to predict the sign of $\beta_2$ and $\beta_4$, indicating a degeneracy in the physical observable from collisions between prolate and oblate nuclei. Degeneracy can be observed directly in some nuclei, for example in Kr, whose ground-state wave function is a quantum superposition of prolate and oblate shapes [22]. The degeneracy we discover in the present study concerns observables in the final state of high-energy heavy-ion collisions. Data from heavy-ion experiments disfavor initial-state models whose entropy density deposition is linearly proportional to the number of binary collisions. As a result, tip-tip and body-body fully overlapped collisions produce similar numbers of charged particles and similar momentum anisotropy fluctuations for both prolate and oblate nuclei. This becomes clear when the three-dimensional deformed nuclei are projected onto two dimensions by the extremely strong Lorentz contraction along the beam direction. The failure to predict the sign of $\beta_2$ and $\beta_4$ using both shallow and deep neural networks indicates that the model is not over-fitting.

Such a degeneracy discovered by the network should not be surprising. If the physical process $F$ maps both $|\beta_2|$ and $-|\beta_2|$ to the same final-state observable $x$, it is impossible for the network to find the inverse function $\beta_2 = F^{-1}(x)$. However, deep learning helps to efficiently verify the existence of an inverse function for the absolute value of $\beta_2$. In the present study, the network helps us to discover the existence of both the degeneracy and the inverse function $|\beta_2| = F^{-1}(x)$. The sign of $\beta_2$ and $\beta_4$ might be determined using data from other experiments, such as low-energy collisions or electron-ion collisions, in which one might be able to study other interesting nuclear structures such as the neutron skin, the electric charge and weak charge distributions, the pair correlation and the alpha clustering structure.

Our input images to the deep learning are the statistical information of engineered features. This is different from common computer vision problems, where a DCNN learns correlations between different patches of the same image. For scientific problems, the statistical information of many input samples from the same category is used to distinguish one category from another. It is also feasible to learn features in each event and use the statistical distribution of automatically learned features for the classification or regression task.

In the present study, we only use complete and semi-overlapped collisions, where $v_2$ increases linearly as the initial-state spatial eccentricity increases. For peripheral collisions, where $dN_{ch}/dY|_{normed} < 0.5$, $v_2$ decreases as the spatial eccentricity continues to increase. The mapping function we used to get $v_2$ from the spatial eccentricity does not work for peripheral collisions. For the same reason we do not use higher-order momentum anisotropy as part of the training input.

A thorough study may require relativistic hydrodynamic simulations of heavy-ion collisions. The (3+1)D hydrodynamic simulations may provide useful information that helps to quantify the shape parameters, e.g., the event-plane twist along the longitudinal direction due to forward-backward asymmetry. This asymmetry arises not only in non-central collisions, but also in central (zero impact parameter) tip-body collisions. However, extending the present work to a full (3+1)D simulation is currently beyond our computational capability. It might be feasible by running recently developed GPU-parallelized hydrodynamic codes in (2+1)D mode, e.g., CLVisc [23] or GPU-VH [24]. In addition, one may improve the efficiency by selecting events with specific collision geometry, provided that some regions are more discriminative in determining the nuclear shape deformation.

In summary, Monte Carlo simulations of heavy-ion collisions with various deformed nuclei reveal clear patterns in the complex final state, from which one can retrieve information about the structure of the initial-state nuclei. A deep convolution neural network designed for classification is successfully used in a regression task to predict the magnitude of the nuclear deformation parameters from the correlation between momentum anisotropy and total hadron multiplicity. The network reveals that there is a degeneracy between the outputs of prolate (positive $\beta_2$) and oblate (negative $\beta_2$) heavy-ion collisions. The Regression Attention Mask algorithm helps to locate the most discriminative regions in the input image. It not only verifies that the DCNN learned the hidden structures which are sensitive to nuclear deformation, but also discovers a degeneracy in the sign of the nuclear deformation.

IV. METHOD

Not all nuclei have a perfect spherical shape. Many nuclei have large deformations that lead to complex structures in the final state of heavy-ion collisions. For example, the collisions of prolate-shaped uranium nuclei have tip-tip, tip-body and body-body crossing patterns. The fluid dynamic expansion transfers the initial geometric eccentricity to the momentum anisotropy of the final-state hadrons. As a result, the most central tip-tip collisions have high multiplicity and small anisotropic flow, while the body-body aligned collisions have similar multiplicity but large anisotropic flow for soft hadrons of low transverse momenta $p_T$. In this paper, we first train a 34-layer deep residual neural network [25] with squeeze-excitation blocks [26] to predict the shape deformation parameters of deformed nuclei using regression. Then we use the “Regression Attention Mask” to interpret what has been learned by the deep neural network.

A. Collisions of deformed nuclei

We use the Trento Monte Carlo model [27] to provide IP-Glasma-like fluctuating initial conditions of heavy-ion collisions. The shapes of deformed nuclei are given by the deformed Woods-Saxon distribution,

$$\rho(r, \theta, \phi) = \frac{\rho_0}{1 + e^{\left(r - R_0 (1 + \beta_2 Y_{20}(\theta) + \beta_4 Y_{40}(\theta))\right)/a}} \qquad (1)$$

where $\rho_0$ is the nucleon density in the nucleus, $R_0$ is the Woods-Saxon radius, $\beta_2$ and $\beta_4$ are the deformation parameters introduced via an expansion in spherical harmonics, $Y_{20} = \frac{\sqrt{5}}{4\sqrt{\pi}}(3\cos^2\theta - 1)$, $Y_{40} = \frac{3}{16\sqrt{\pi}}(35\cos^4\theta - 30\cos^2\theta + 3)$, and $a$ is the Woods-Saxon tail width.

The orientations of the colliding nuclei are given by Euler rotations with random angles $(\phi, \theta, \gamma)$,

$$R(\phi, \theta, \gamma) = R_z(\gamma) R_y(\theta) R_z(\phi) \qquad (2)$$

where $R_z(\phi)$ is the first rotation around the z-axis, $R_y(\theta)$ is the second rotation around the y-axis and $R_z(\gamma)$ is the third rotation around the original z-axis. Because the deformed nuclei are symmetric around the z-axis, the first rotation $R_z(\phi)$ can be ignored. To make sure the sampled rotations are isotropic, the tilt angle around the y-axis is sampled according to a uniform distribution $\cos(\theta) \in U[-1, 1)$, whereas the spin angle around the z-axis is sampled according to a uniform distribution $\gamma \in U[0, 2\pi)$.

We prepare $51 \times 51 = 2601$ groups of deformed uranium nuclei with 51 values of $\beta_2 \in [-0.5, 0.5]$ and 51 values of $\beta_4 \in [-0.2, 0.2]$. For each group, we simulate 100000 collisions with all possible collision geometries determined by the orientation of each nucleus and the impact parameter (the transverse distance between the centers of the two colliding nuclei). From these collisions we further select the half of the events with the highest total entropy, which corresponds to the centrality range 0–50%.

In experiments, the directly accessible information is the number of final-state charged hadrons at mid-rapidity, $dN_{ch}/dY|_{Y=0}$, and the momentum anisotropy $v_2$ of final-state hadrons.
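The deformed Woods-Saxon density of Eq. (1) and the isotropic orientation sampling described after Eq. (2) can be illustrated with a short standard-library sketch. It is not the Trento implementation: the function names are hypothetical, and the default radius and tail width are illustrative placeholders rather than values taken from this paper.

```python
import math
import random


def y20(theta):
    """Spherical harmonic Y_20 as written below Eq. (1)."""
    return math.sqrt(5.0) / (4.0 * math.sqrt(math.pi)) * (3.0 * math.cos(theta) ** 2 - 1.0)


def y40(theta):
    """Spherical harmonic Y_40 as written below Eq. (1)."""
    c2 = math.cos(theta) ** 2
    return 3.0 / (16.0 * math.sqrt(math.pi)) * (35.0 * c2 * c2 - 30.0 * c2 + 3.0)


def woods_saxon(r, theta, beta2, beta4, rho0=1.0, R0=6.8, a=0.55):
    """Deformed Woods-Saxon density, Eq. (1).

    rho0, R0 (fm) and a (fm) are placeholder values for illustration only.
    """
    surface = R0 * (1.0 + beta2 * y20(theta) + beta4 * y40(theta))
    return rho0 / (1.0 + math.exp((r - surface) / a))


def sample_orientation(rng):
    """Draw (theta, gamma) so that the symmetry axis is isotropic.

    cos(theta) is uniform in [-1, 1] and gamma is uniform in [0, 2*pi);
    the first Euler rotation is skipped because the nucleus is symmetric
    around its own z-axis.
    """
    theta = math.acos(rng.uniform(-1.0, 1.0))
    gamma = rng.uniform(0.0, 2.0 * math.pi)
    return theta, gamma


def symmetry_axis(theta, gamma):
    """Direction of the nuclear symmetry axis after R_z(gamma) R_y(theta)."""
    return (math.sin(theta) * math.cos(gamma),
            math.sin(theta) * math.sin(gamma),
            math.cos(theta))
```

For a prolate shape ($\beta_2 > 0$) the density at fixed $r$ is larger along the symmetry axis ($\theta = 0$) than in the equatorial plane ($\theta = \pi/2$), which is what produces the distinct tip-tip and body-body geometries.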
It is shown in many studies that $dN_{ch}/dY|_{Y=0}$ is proportional to the total entropy density $s_0$ of the initial state. The anisotropy $v_2$ can be approximately computed from the geometric eccentricity of the initial state, $\varepsilon_2 = \langle y^2 - x^2 \rangle / \langle y^2 + x^2 \rangle$, where $x$ and $y$ are the transverse coordinates in the overlap region of the collision and $\langle \cdot \rangle$ represents a weighted average with weights given by the local entropy density $s(x, y)$. The geometric eccentricity in the initial state transforms to momentum anisotropy in the final state through relativistic hydrodynamic expansion of the strongly coupled quark-gluon plasma. To make the current method directly applicable to experiment, we match $\varepsilon_2$ to $v_2$ through a heuristic equation [28, 29],

$$v_2 = k_2 \varepsilon_2 + k_2' \varepsilon_2^3 + \delta_2 \qquad (3)$$

where the coefficients are $k_2 = 0.2$ and $k_2' = 0.1$, and $\delta_2$ is the residual that introduces an additional 10% of uniformly distributed random fluctuations.

The total entropy is self-normalized with the mean entropy of the 0–1% most central collisions for each nuclear shape deformation. The self-normalization makes the method applicable to experimental data because

$$\left. \frac{dN_{ch}}{dY} \right|_{normed} = \frac{dN_{ch}/dY}{\langle dN_{ch}/dY \rangle_{0-1\%}} = \frac{s_0}{\langle s_0 \rangle_{0-1\%}}. \qquad (4)$$

We now have 2601 groups of $(dN_{ch}/dY|_{normed}, v_2)$ distributions. The data are divided into 3 groups: 80% for training, 10% for validation and 10% for testing. We use data augmentation to enlarge the training data set. For each distribution, we randomly sample 90% of the 50000 data points to create a new image. The data augmentation produces 160000 images for training, 16000 for validation and 16000 for testing.

B. Deep regression network

[Figure 2 diagram. Regression Network, from input to output: Input(56x56x1); Conv2D(3x3) -> 56x56x64; 3 Residual Block-I -> 56x56x64; 4 Residual Block-II -> 28x28x128; 6 Residual Block-II -> 14x14x256; 3 Residual Block-II -> 7x7x512; Global Average Pooling -> 512; Output Dense(2). One Residual Block-I and one Residual Block-II are each drawn as: x, Conv2D(3x3), Batch Norm, ReLU, Conv2D(3x3), Batch Norm, Squeeze Excite, add, ReLU. A "Conv 1x1" label also appears in the diagram.]

Figure 2: (Color-online) The architecture of the 34-layer regression neural network using residual and squeeze-excitation blocks.

Shown in Fig. 2 is the 34-layer deep convolution neural network for the regression task. The residual blocks make it possible to design deep convolution neural networks for image classification, and the squeeze-excitation operation additionally pushes image classification to the state of the art. We verify that the deep residual neural network designed for image classification also works well on the regression task. Our inputs are images of two-dimensional event-by-event distributions of $(dN_{ch}/dY|_{normed}, v_2/v_{2,max})$ in $56 \times 56$ bins. The input image is first processed using a two-dimensional convolution, then it is fed to a type-I residual box containing 3 blocks named Residual Block-I, where the output feature maps have the same transverse size as the input image. The resulting feature maps are fed consecutively to three type-II residual boxes (with 4, 6 and 3 blocks in Fig. 2). Each type-II residual box has 3 to 6 blocks named Residual Block-II. The first Residual Block-II in each box reduces the width and height of the input feature map by a factor of 2. Every residual block has one “add” operation, and the last “add” layer is named “add_16”. Each residual block has 2 Conv2D layers, and in total they contribute $16 \times 2 = 32$ convolution layers. We use a global average pooling layer [30] to get the mean of each feature map of size $7 \times 7$ for the 512 channels. These 512 neurons are connected to 2 neurons in the output layer to make predictions for the nuclear deformation parameters $|\beta_2|$ and $|\beta_4|$. One reason to use this deep residual neural network is to verify whether a deep network can learn the sign of the parameters where a shallow network has already failed. The residual neural network also has better interpretability than VGG-like networks, as shown in Ref. [31].

C. Regression Attention Mask for interpretable deep learning

Interpretability is the most indispensable consideration for deep neural networks when they are used in scientific research, as well as in self-driving cars, medical diagnosis and government policy making. Interpretability is defined as the ability to explain or to present in understandable terms to a human [32]. Visualization, verbal explanation and clustering of similar instances are all understandable representations of deep neural networks with interpretability.

There are many ways to visualize what has been learned by a network classifier. For reviews see the book “Interpretable Machine Learning” [33] and the surveys [32, 34–36]. There are global explanations that explain the network in the whole input space by visualizing what each feature map learns. There are also local explanations that explain local features in one specific image. We have designed the “Regression Attention Mask” algorithm, which provides a local interpretation of a given image.

For the global explanation, deconvolution is used to visualize each feature map of the deep convolution neural network [37–40]. For the local explanation, many different methods have been developed based on the assumption that one highly complex machine learning model can be locally approximated by a linear model around one given input image. One way to construct the importance map is to measure the probability changes with parts of the image occluded [41, 42] or similar pixels/super-pixels masked (LIME) [43]. Different from our method, those occlusion methods depend on manually constructed masks of input images.

The saliency map is another way to explain a pre-trained convolution neural network around one given image [30]. It assumes that the predicted class score can be approximated by a linear function $f(x) \approx w^T x + b$ around one given image $x$ in the input space, where $f$ is the function learned by the network. The gradient $w = \partial f / \partial x$ represents the importance of each pixel. However, the original saliency map is noisy [30] due to negative gradients and non-linear dependences on $x$. The improved saliency map uses guided back-propagation [44] to maximize the class score of one given class by dropping negative influences. These gradient-based methods, as well as many alternatives [45–47], are sensitive to a constant shift [48], except the pattern net [49, 50]. The interpretability of all different saliency maps can be quantified using our “Regression Attention Mask” method.

Closely related to our method is the class activation map, where locations in the feature maps of the last convolution layer are matched to the input image [51, 52]. We have discarded the ReLU activation function in the gradient-weighted class activation map to get our specific activation map for regression tasks. ReLU picks positive influences to enlarge the class score, while regression needs both positive and negative components in the activation map to reproduce the regression value. The regression activation map is used to create the “Regression Attention Mask”.

Based on the class activation map, the class activation mask was invented to quantify the interpretability of different neural networks. The class activation mask is a two-dimensional image that has the same size as the input image. It has only one channel and its pixel values are initialized with 0. Pixels are set to 1 if the corresponding regions in the class activation map have values larger than some threshold. The interpretability of one classifier is quantified by the intersection-over-union score between the class activation mask and human-understandable concept segmentation, e.g., human-labeled masks for an object, part, scene, material, texture and color [31]. The interpretability has the order ResNet > VGG > GoogLeNet > AlexNet regarding different network architectures. Different from that method, we propose to use the prediction difference of the masked image to quantify the interpretability of the regression network.

To disentangle hidden representations of the learned feature maps, the studies in Refs. [53] and [54] use graphs, decision trees and local part templates. Recently a deep neural network has been trained to jointly classify images into categories and provide its reasoning [55]. Our framework provides an explanation of its decisions in the regression task and helps us to understand the features of the correlation that determine the nuclear deformation.

For a classification task, the importance of each pixel to the classification can be computed using the gradient-weighted class activation map (Grad-CAM),

$$\mathrm{gradcam}(x) = \sum_{n=1}^{c} A^{n} \, \frac{1}{k \times k} \sum_{i,j=1}^{k} \frac{\partial f}{\partial A^{n}_{ij}} \qquad (5)$$

where $x$ is the input image, $c$ is the number of channels, $k$ is the size of the activation map, $f$ is the class score, and $A^{n}_{ij}$ is the pixel value of the $n$th activation map $A^{n}$ in layer “add_16” at site $(i, j)$. The class activation map is scaled up to the same size as the input image by upsampling. In the original Grad-CAM paper, the weighted class activation map is forwarded to a ReLU activation function to remove negative contributions. Otherwise the positive influence on one class might be equally negative on the other and cancel the important regions when the prediction probabilities are close for the top-2 classes. However, both positive and negative contributions are required to reproduce the regression value. Different from the original Grad-CAM algorithm, the ReLU activation function is removed in our algorithm to adapt it to the regression task.

The attention mask for input image $x_i$ is defined as $m_i = \mathrm{gradcam}(x_i) > T$, where $T$ is the threshold. In the present study, the threshold $T$ is set to the mean value of the given activation map. Since the input images have similar structure for the same $\beta_2$ and $\beta_4$, we compute the averaged attention mask $m = \sum_i w_i m_i$ for all events in the range $[|\beta_2|, |\beta_2| + 0.02]$, weighted by $w_i$,

$$w_i = \frac{\exp[-\Delta_i]}{\sum_j \exp[-\Delta_j]}, \qquad \Delta_i = \| f(m_i \odot x_i) - f(x_i) \|, \qquad (6)$$

where $x_i$ is the $i$th input image and $m_i$ is the attention mask of the trained regression network. The product $m_i \odot x_i$ is the pixel-wise multiplication between the attention mask and the input image, which occludes unimportant regions. Feeding the original image $x_i$ and the occluded image $m_i \odot x_i$ to the regression network $f$ gives the prediction difference $\Delta_i$. A smaller prediction difference indicates a better attention mask, which leads to a higher weight $w_i$.
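Returning to Section IV A, the chain from the initial-state eccentricity to the network input (Eqs. (3) and (4), the $56 \times 56$ binning and the 90% subsampling) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, the residual $\delta_2$ is assumed to be a relative fluctuation of up to ±10% of the deterministic response, transverse coordinates are assumed to be measured from the center of the overlap region, and the image axes are assumed to run from 0 to the largest value in the sample.

```python
import random

K2, K2_PRIME = 0.2, 0.1  # coefficients quoted with Eq. (3)


def eccentricity(points):
    """Entropy-weighted eps_2 = <y^2 - x^2> / <y^2 + x^2>.

    points: iterable of (x, y, s), with s the local entropy density.
    """
    points = list(points)
    num = sum(s * (y * y - x * x) for x, y, s in points)
    den = sum(s * (y * y + x * x) for x, y, s in points)
    return num / den


def v2_from_eccentricity(eps2, rng, rel_noise=0.1):
    """Eq. (3) with an ASSUMED form of the residual delta_2."""
    response = K2 * eps2 + K2_PRIME * eps2 ** 3
    delta2 = response * rng.uniform(-rel_noise, rel_noise)
    return response + delta2


def normalize_multiplicity(entropies, top_fraction=0.01):
    """Eq. (4): divide by the mean entropy of the most central events."""
    ranked = sorted(entropies, reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    mean_top = sum(ranked[:n_top]) / n_top
    return [s / mean_top for s in entropies]


def make_image(mult_normed, v2, bins=56):
    """Bin (multiplicity, v2 / v2_max) pairs into a bins x bins count image."""
    x_max, v2_max = max(mult_normed), max(v2)
    image = [[0] * bins for _ in range(bins)]
    for m, v in zip(mult_normed, v2):
        ix = min(max(int(m / x_max * bins), 0), bins - 1)
        iy = min(max(int(v / v2_max * bins), 0), bins - 1)
        image[iy][ix] += 1
    return image


def augment(events, rng, keep=0.9):
    """Draw a random 90% subsample of the events to build one new image."""
    return rng.sample(events, round(keep * len(events)))
```

Calling `augment` repeatedly on the same group of events and passing each subsample to `make_image` yields many slightly different images with the same label, which is the data augmentation described in Section IV A.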
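The Regression Attention Mask of Eqs. (5) and (6) can be sketched with nested Python lists standing in for feature maps. The sketch assumes that the activations and the gradients of the network output with respect to them have already been extracted from the trained network; the function names, the nearest-neighbour upsampling and the `predict` callable (a stand-in for the regression network $f$) are illustrative assumptions rather than details given in the text.

```python
import math


def regression_cam(activations, gradients):
    """Eq. (5) without the final ReLU.

    activations, gradients: lists indexed as [channel][i][j] on a k x k grid.
    Each channel is weighted by its spatially averaged gradient; negative
    contributions are kept, as needed for regression.
    """
    k = len(activations[0])
    cam = [[0.0] * k for _ in range(k)]
    for a_n, g_n in zip(activations, gradients):
        alpha = sum(sum(row) for row in g_n) / (k * k)
        for i in range(k):
            for j in range(k):
                cam[i][j] += alpha * a_n[i][j]
    return cam


def upsample(cam, factor):
    """Nearest-neighbour upsampling (assumed method), e.g. 7x7 -> 56x56 for factor 8."""
    size = len(cam) * factor
    return [[cam[i // factor][j // factor] for j in range(size)] for i in range(size)]


def attention_mask(cam):
    """m = cam > T, with the threshold T set to the mean of the map."""
    flat = [v for row in cam for v in row]
    threshold = sum(flat) / len(flat)
    return [[1 if v > threshold else 0 for v in row] for row in cam]


def prediction_difference(predict, image, mask):
    """Delta = || f(m * x) - f(x) || for a callable predict(image) -> list of floats."""
    occluded = [[m * x for m, x in zip(mr, xr)] for mr, xr in zip(mask, image)]
    diff = [a - b for a, b in zip(predict(occluded), predict(image))]
    return math.sqrt(sum(d * d for d in diff))


def mask_weights(deltas):
    """Eq. (6): softmax of -Delta_i (shifted by min(deltas) for numerical stability)."""
    shift = min(deltas)
    expo = [math.exp(-(d - shift)) for d in deltas]
    total = sum(expo)
    return [e / total for e in expo]


def averaged_mask(masks, weights):
    """Weighted average m = sum_i w_i m_i over masks of equal shape."""
    rows, cols = len(masks[0]), len(masks[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for mask, w in zip(masks, weights):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += w * mask[i][j]
    return out
```

With a single channel whose averaged gradient is negative, the mask selects the pixels with the smallest activations; a ReLU applied to the map would have set the whole map to zero and lost that information.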
ACKNOWLEDGMENTS

We thank Volker Koch, Jorgen Randrup, Feng Yuan and Xin Dong for helpful discussions. This work is supported by DOE under Contract No. DE-AC02-05CH11231, by NSF under Grant No. ACI-1550228 within the JETSCAPE Collaboration, by NSFC under Grant No. 11861131009 and No. 11890714, by BMBF under the ErUM-Data project and the AI research grant of SAMSON AG, Frankfurt. Computations are performed on GPU workstations at CCNU and DOE NERSC.

[1] Ulrich W. Heinz and Anthony Kuhlman. Anisotropic flow and jet quenching in ultrarelativistic U + U collisions. Phys. Rev. Lett., 94:132301, 2005.
[2] Anthony J. Kuhlman and Ulrich W. Heinz. Multiplicity distribution and source deformation in full-overlap U+U collisions. Phys. Rev., C72:037901, 2005.
[3] Peter Filip, Richard Lednicky, Hiroshi Masui, and Nu Xu. Initial eccentricity in deformed Au-197 + Au-197 and U-238 + U-238 collisions at $\sqrt{s_{NN}}$ = 200 GeV at the BNL Relativistic Heavy Ion Collider. Phys. Rev., C80:054903, 2009.
[4] Sergei A. Voloshin. Testing the Chiral Magnetic Effect with Central U+U collisions. Phys. Rev. Lett., 105:172301, 2010.
[5] Andy Goldschmidt, Zhi Qiu, Chun Shen, and Ulrich Heinz. Collision geometry and flow in uranium + uranium collisions. Phys. Rev., C92(4):044903, 2015.
[6] M. Alvioli, H. Holopainen, K. J. Eskola, and M. Strikman. Initial state anisotropies and their uncertainties in ultrarelativistic heavy-ion collisions from the Monte Carlo Glauber model. Phys. Rev., C85:034902, 2012.
[7] P. Filip. Ground-State Properties Of Nuclei And Initial State In Relativistic Heavy Ion Collisions. In Proceedings, 11th International Workshop Relativistic Nuclear Physics: from Hundreds of MeV to TeV: Stara Lesna, Slovak Republik, June 17-23, 2012, page 111, 2013.
[8] G. S. Denicol, C. Gale, S. Jeon, J. F. Paquet, and B. Schenke. Effect of initial-state nucleon-nucleon correlations on collective flow in ultra-central heavy-ion collisions. 2014.
[9] Bjoern Schenke, Prithwish Tribedy, and Raju Venugopalan. Initial-state geometry and fluctuations in Au+Au, Cu+Au, and U+U collisions at energies available at the BNL Relativistic Heavy Ion Collider. Phys. Rev., C89(6):064908, 2014.
[10] L. Adamczyk et al. Azimuthal anisotropy in U+U and Au+Au collisions at RHIC. Phys. Rev. Lett., 115(22):222301, 2015.
[11] M. Alvioli and M. Strikman. Neutron skin effect in W and W production in high-energy proton-lead collisions. 2018.
[12] Maciej Rybczyński, Milena Piotrowska, and Wojciech Broniowski. Signatures of clustering in ultrarelativistic collisions with light nuclei. Phys. Rev., C97(3):034912, 2018.
[13] J. Noronha-Hostler, N. Paladino, S. Rao, Matthew D. Sievert, and Douglas E. Wertepny. Ultracentral Collisions of Small and Deformed Systems at RHIC: UU, dAu, $^{9}$BeAu, $^{9}$Be$^{9}$Be, $^{3}$He$^{3}$He, and $^{3}$HeAu Collisions. 2019.
[14] Larry D. McLerran and Raju Venugopalan. Computing quark and gluon distribution functions for very large nuclei. Phys. Rev., D49:2233–2241, 1994.
[15] Larry D. McLerran and Raju Venugopalan. Gluon distribution functions for very large nuclei at small transverse momentum. Phys. Rev., D49:3352–3355, 1994.
[16] Bjoern Schenke, Prithwish Tribedy, and Raju Venugopalan. Event-by-event gluon multiplicity, energy density, and eccentricities in ultrarelativistic heavy-ion collisions. Phys. Rev., C86:034908, 2012.
[17] Michael L. Miller, Klaus Reygers, Stephen J. Sanders, and Peter Steinberg. Glauber modeling in high energy nuclear collisions. Ann. Rev. Nucl. Part. Sci., 57:205–243, 2007.
[18] H. Morinaga and P.C. Gugelot. Gamma rays following ($\alpha$, xn) reactions. Nuclear Physics, 46:210–224, 1963.
[19] D Cline. Nuclear shapes studied by coulomb excitation. Annual Review of Nuclear and Particle Science, 36(1):683–716, 1986.
[20] Francois Chollet. Deep Learning with Python. Manning Publications Co., Greenwich, CT, USA, 1st edition, 2017.
[21] Ryan Poplin, Avinash V. Varadarajan, Katy Blumer, Yun Liu, Michael V. McConnell, Greg S. Corrado, Lily Peng, and Dale R. Webster. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nature Biomedical Engineering, 2(3):158–164, 2018.
[22] E. Clement et al. Shape coexistence in neutron-deficient krypton isotopes. Phys. Rev., C75:054313, 2007.
[23] Long-Gang Pang, Hannah Petersen, and Xin-Nian Wang. Pseudorapidity distribution and decorrelation of anisotropic flow within the open-computing-language implementation CLVisc hydrodynamics. Phys. Rev., C97(6):064918, 2018.
[24] Dennis Bazow, Ulrich W. Heinz, and Michael Strickland. Massively parallel simulations of relativistic fluid dynamics on graphics processing units with CUDA. Comput. Phys. Commun., 225:92–113, 2018.
[25] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[26] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. CoRR, abs/1709.01507, 2017.
[27] J. Scott Moreland, Jonah E. Bernhard, and Steffen A. Bass. Alternative ansatz to wounded nucleon and binary collision scaling in high-energy nuclear collisions. Phys. Rev., C92(1):011901, 2015.
[28] Fernando G. Gardim, Frederique Grassi, Matthew Luzum, and Jean-Yves Ollitrault. Mapping the hydrodynamic response to the initial geometry in heavy-ion collisions. Phys. Rev., C85:024908, 2012.
[29] Jacquelyn Noronha-Hostler, Li Yan, Fernando G. Gardim, and Jean-Yves Ollitrault. Linear and cubic response to the initial eccentricity in heavy-ion collisions. Phys. Rev., C93(1):014909, 2016.
[30] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2013.
[31] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Computer Vision and Pattern Recognition, 2017.
[32] Finale Doshi-Velez and Been Kim. Towards A Rigorous Science of Interpretable Machine Learning. arXiv e-prints, page arXiv:1702.08608, Feb 2017.
[33] Christoph Molnar. Interpretable Machine Learning, A Guide for Making Black Box Models Explainable. 2019.
[34] Quan shi Zhang and Song chun Zhu. Visual interpretability for deep learning: a survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39, Jan 2018.
[35] Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti. A survey of methods for explaining black box models. CoRR, abs/1802.01933, 2018.
[...] and Karol Zieba. Visualbackprop: visualizing cnns for autonomous driving. CoRR, abs/1611.05418, 2016.
[47] K. Fu, W. Dai, Y. Zhang, Z. Wang, M. Yan, and X. Sun. [36] A. Adadi and M. Berrada. Peeking inside the black-box: Multicam: Multiple class activation mapping for air- A survey on explainable artificial intelligence (xai). IEEE craft recognition in remote sensing images. Remote Sens, Access, 6:52138–52160, 2018. 11(544), 2019. [37] Dumitru Erhan, Y Bengio, Aaron Courville, and Pascal [48] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Vincent. Visualizing higher-layer features of a deep net- Maximilian Alber, Kristof T. Schütt, Sven Dähne, Du- work. Technical Report, UniveristÃ
c de MontrÃ
c al, 01 mitru Erhan, and Been Kim. The (Un)reliability of 2009. saliency methods. arXiv e-prints, page arXiv:1711.00867, [38] Matthew D. Zeiler and Rob Fergus. Visualizing Nov 2017. and understanding convolutional networks. CoRR, [49] Grégoire Montavon, Sebastian Bach, Alexander Binder, abs/1311.2901, 2013. Wojciech Samek, and Klaus-Robert Müller. Explaining [39] Chris Olah, Alexander Mordvintsev, and Ludwig nonlinear classification decisions with deep taylor decom- Schubert. Feature visualization. Distill, 2017. position. CoRR, abs/1512.02479, 2015. https://distill.pub/2017/feature-visualization. [50] Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian [40] Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, Carter, Ludwig Schubert, Katherine Ye, and Alexander and Sven Dähne. Learning how to explain neural net- Mordvintsev. The building blocks of interpretability. Dis- works: PatternNet and PatternAttribution. arXiv e- till, 2018. https://distill.pub/2018/building-blocks. prints, page arXiv:1705.05598, May 2017. [41] Marko Robnik-Šikonja and Igor Kononenko. Explaining [51] B. Zhou, A. Khosla, Lapedriza. A., A. Oliva, and A. Tor- classifications for individual instances. IEEE Trans. on ralba. Learning Deep Features for Discriminative Local- Knowl. and Data Eng., 20(5):589–600, May 2008. ization. CVPR, 2016. [42] Luisa M. Zintgraf, Taco S. Cohen, Tameem Adel, and [52] Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Max Welling. Visualizing deep neural network decisions: Vedantam, Michael Cogswell, Devi Parikh, and Dhruv Prediction difference analysis. CoRR, abs/1702.04595, Batra. Grad-cam: Why did you say that? visual explana- 2017. tions from deep networks via gradient-based localization. [43] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. CoRR, abs/1610.02391, 2016. "why should I trust you?": Explaining the predictions of [53] Feng Shi Ying-Nian Wu Quan-Shi Zhang, Rui-Ming Cao any classifier. 
In Proceedings of the 22nd ACM SIGKDD and Song-Chun Zhu. Interpreting cnn knowledge via International Conference on Knowledge Discovery and an explanatory graph. In TThe Thirty-Second AAAI Data Mining, San Francisco, CA, USA, August 13-17, Conference on Artificial Intelligence (AAAI-18), Febrary 2016, pages 1135–1144, 2016. 2018. [44] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas [54] Quan-Shi Zhang, Ying-Nian Wu, and Song-Chun Zhu. Brox, and Martin A. Riedmiller. Striving for simplicity: Interpretable convolutional neural networks. In The The all convolutional net. CoRR, abs/1412.6806, 2014. IEEE Conference on Computer Vision and Pattern [45] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Recognition (CVPR), June 2018. Axiomatic attribution for deep networks. CoRR, [55] Atsushi Kanehira and Tatsuya Harada. Learning abs/1703.01365, 2017. to explain with complemental examples. CoRR, [46] Mariusz Bojarski, Anna Choromanska, Krzysztof Choro- abs/1812.01280, 2018. manski, Bernhard Firner, Larry D. Jackel, Urs Muller,