TY - JOUR
AU1 - Yang, Xuejun
AB - Parallel computing is the main technical approach to achieving very high performance computing. The history of parallel computing comprises three phases: moderate parallelism, described by Amdahl's law [1]; large-scale parallelism, described by Gustafson's law [2]; and high-productivity parallelism, described by the productivity evaluation model [3]. In April 2010, IBM, in its report 'Some Challenges on Road from Petascale to Exascale', presented five challenges for an exascale system, stemming from power consumption, memory access, communication, reliability, and programming [4]; these are respectively referred to as the energy wall, memory wall, communication wall, reliability wall, and programming wall.

Faced with the challenges of these 'walls', we investigate wall measurement models at the scientific level. For example, existing reliability theories, such as probability theory, do not consider the effect of reliability on performance, while the classic speedup model does not reflect the relation between performance and reliability. To incorporate reliability and performance into a unified measurement model, we measure reliability based on fault-tolerant overhead. Because current fault-tolerant techniques incur a time overhead, we created a reliability speedup model with fault tolerance to measure the effect of fault-tolerant overhead on speedup:
\begin{eqnarray*}
S^R &=& \frac{S_P}{1 + \frac{\textit{mean fault-tolerance overhead per failure}}{\textit{mean time between failures}}} \nonumber\\
&=& \frac{S_P}{1 + R(P)} = \frac{PU}{1 + R(P)} = PU^R .
\end{eqnarray*}
Here $S^R$ is the reliability speedup, $R(P)$ reflects the relation between fault-tolerant overhead and the number of nodes $P$, $U$ is the traditional efficiency of the system (so that $S_P = PU$), and $U^R = U/(1 + R(P))$ is the reliability efficiency after introducing the reliability factor. Thus, the reliability wall is defined as the supremum of the reliability speedup [5]; it expresses the trade-off between reliability and computing performance as the system size is scaled up, as shown in Fig. 1. Similarly, the memory wall, communication wall, and other walls are defined as the suprema of the corresponding speedups.

Figure 1. Reliability speedup and reliability wall [5].
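To make the wall concrete, the following is a minimal numerical sketch of the model. It assumes a hypothetical checkpoint/restart scheme in which each of the P nodes fails independently at a rate lam per hour (so the system MTBF shrinks as 1/P) and every failure costs a fixed recovery time c_rec; these parameter names and values, and the resulting linear R(P), are illustrative assumptions for this sketch, not figures taken from [5].

def reliability_speedup(p, efficiency=0.8, lam=1e-5, c_rec=600.0):
    """Reliability speedup S^R = S_P / (1 + R(P)) for a system of p nodes.

    Assumed overhead model (not from [5]): per-node failure rate `lam`
    (failures per node-hour) and a fixed recovery cost of `c_rec` seconds
    per failure, i.e. a simple checkpoint/restart scheme.
    """
    s_p = p * efficiency                    # classic speedup S_P = P * U
    mtbf = 1.0 / (lam * p)                  # system MTBF in hours, shrinks as 1/P
    overhead_per_failure = c_rec / 3600.0   # recovery cost per failure, in hours
    r_p = overhead_per_failure / mtbf       # R(P): grows linearly in P here
    return s_p / (1.0 + r_p)                # S^R = S_P / (1 + R(P))

if __name__ == "__main__":
    for p in [10 ** k for k in range(2, 8)]:
        print(f"P = {p:>9,d}   S^R = {reliability_speedup(p):>9,.0f}")

Running the sketch shows $S^R$ rising with $P$ at first and then flattening: under the assumed linear overhead $R(P) = cP$, we have $S^R = PU/(1 + cP) \le U/c$ for every $P$, and the finite supremum $U/c$ (480 000 for the parameters above) is exactly the reliability wall of Fig. 1.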
Faced with the challenges of these 'walls', we have made breakthroughs in both the architecture and the enabling technologies. With respect to the architecture, we highlight (1) a design that balances computing with communication and memory access, coordinates the application, software, and hardware, and integrates the computer system and execution environment; (2) a chip architecture that integrates a general-purpose processor with a special-purpose processor; (3) a support framework for applications; (4) state-of-the-art cooling technology, such as the ICEOTOPE Corp. cooling product; (5) a system-on-chip architecture based on neural networks; (6) a new programming language together with its compiler; (7) large-scale parallel algorithms that can be scaled up to tens of thousands or even millions of nodes; (8) a domain-oriented support environment for high-performance computing; (9) a support environment for the design and execution of parallel programs at the instruction, thread, multi-core, and multi-node levels; (10) the optimization of memory access at the architecture, operating-system, compiler, and algorithm levels; (11) power optimization at the chip, architecture, operating-system, and compiler levels; (12) fault-tolerance technology that integrates both software and hardware; and (13) service-oriented cloud computing, among others. With respect to the enabling technologies, given the rapid development of nanomaterials, quantum computing, and bioinformatics, we focus on (1) quantum-walk boson sampling, (2) programmable nanometer circuits, (3) memristors, (4) holographic optical storage, (5) on-chip optical interconnects, (6) system-wide optical interconnects, and so on.

Information science and information technology complement one another. In the late 20th century, information technology developed rapidly; however, there has been no major breakthrough in its foundation, i.e. information science, in the last 40 years. We therefore need to strengthen research on fundamental theory by refining scientific problems from engineering, making breakthroughs in theory, and then applying these back to engineering. We also emphasize interdisciplinary studies by promoting the intersection and merging of computer science with several disciplines, including physics, mathematics, materials science, chemistry, and microelectronics, in order to enhance the capacity for sustainable development.

REFERENCES
1 Amdahl GM. Validity of the single processor approach to achieving large scale computing capabilities. In: Proc AFIPS Spring Joint Computer Conference, Atlantic City, NJ. Washington, DC: Thompson, 1967, 483-5.
2 Gustafson JL. Reevaluating Amdahl's law. Commun ACM 1988; 31: 532-3.
3 Yang XJ, Du J, Wang ZY. J Supercomput 2011; 56: 164-81.
4 Steinmacher-Burow B, Gara A. Some Challenges on Road from Petascale to Exascale. http://www.physik.uni-regensburg.de/forschung/wettig/workshops/APQ_April2010/talks/20100414%20lQCD%20RegensburgSteinmacher-Burowv07.pdf (6 March 2014, date last accessed).
5 Yang XJ, Wang ZY, Xue JL et al. The reliability wall for exascale supercomputing. IEEE T Comput 2012; 61: 767-79.

© The Author(s) 2014. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. All rights reserved. For Permissions, please email: journals.permissions@oup.com. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
TI - Thoughts on high-performance computing
JF - National Science Review
DO - 10.1093/nsr/nwu002
DA - 2014-09-01
UR - https://www.deepdyve.com/lp/oxford-university-press/thoughts-on-high-performance-computing-tFEQWvuzXc
SP - 332
EP - 333
VL - 1
IS - 3
DP - DeepDyve
ER -