MICRO-21
by Wen-mei Hwu, Program Chair

I found the following presentations of particular interest:

• "Modeling the Effects of Instruction Queue Loading on a Static Instruction Stream MicroArchitecture," J. Jacobs, A. Uht and R. Ord, U. C. San Diego. The instruction queue is a critical component of the proposed microarchitecture, where executable instructions are detected and delivered to the execution unit. This paper clarifies the issue of loading instructions into the instruction queue and evaluates the performance that results from different loading schemes.
• "Trace Selection for Compiling Large C Application Programs to Microcode," P. Chang and W. Hwu, U. of Illinois. Important execution paths are identified in complicated UNIX programs so that trace scheduling can be effectively applied. Experimental results are provided for ten UNIX system and CAD programs, all of which exhibit complicated control structure. This is the first paper to address the issue of applying trace scheduling to complicated programs. The work is critical to adapting trace scheduling to RISCs and other upcoming pipelined, parallel microarchitectures.
• "Control Store Implementation of a High Performance VLSI CISC," J. Chang, H. Chao, K. Lewis, and M. Holland, IBM T.J. Watson Research Center. The CMOS 370 has some Control Store on chip and some off. A small on-chip Control Store holds the first two microwords of each microsequence (the targets of conditional branches). A close look reveals that the two-level Control Store structure can be viewed as a programmer-managed target instruction buffer. This structure makes it possible to access one microinstruction from a large (mostly off-chip) Control Store every cycle while achieving a short cycle time.
• "Efficient Micro-Code Emulation [in] Hardwired Pipelined Processors," J. Mulder, R. Portier, A. Srivastava, and R. in't Velt, Delft University of Technology, The Netherlands. Efficient trapping is proposed to support efficient instruction emulation in processors with hardwired control. This makes the issue of instruction set design relatively independent of the implementation (hardwired or microprogrammed).
• "Multiple Instruction Issue and Single-Chip Processors," A. Pleszkun and G. Sohi, U. of Wisconsin-Madison. Sometimes issuing multiple instructions is not a win. It would be interesting to experiment on the effect of compilation support (trace scheduling, register allocation, etc.) on the instruction issue rate. Comparing the results presented in this paper with those presented by the VLIW team, compilation support seems to be critical for issuing multiple instructions per cycle.
• "Lazy Data Routing and Greedy Scheduling for Application-Specific Signal Processors," K. Rimey and P. Hilfinger, U. C. Berkeley. The paper discusses the dilemma caused by the interdependence between data routing and code scheduling in ASIC code generation. This issue corresponds closely to the one regarding code scheduling and register allocation for pipelined and/or wide instruction architectures. The trend is to consider both factors together during code generation.
• "Flexible Processors: A Promising Application-specific Processor Design Approach," A. Wolfe and J. Shen, Carnegie Mellon University. Dynamic reconfigurability is a very interesting feature of the proposed ASIC paradigm. However, the slow prototype makes one wonder if a simple microprocessor can be programmed to achieve the same performance for the target applications.
• "Implementing a Prolog Machine with Multiple Functional Units," A. Singhal and Y. Patt, U. C. Berkeley. Parallel unification and execution result in a factor of 4 speedup over the Berkeley PLM. Some interesting design tradeoffs are evaluated in terms of their impact on the projected execution speed. These tradeoffs include memory system, prefetching, backtracking, and unification.
• "Hardware Support for Large Atomic Units in Dynamically Scheduled Machines," S. Melvin, M. Shebanow, and Y. Patt, U. C. Berkeley. The issue raised here regards the separation of atomic units at different levels of abstraction. The authors argue that instead of having instructions as the atomic units at all levels of abstraction, the compiler, the architecture, and the microarchitecture should have different atomic units tailored to their individual needs. This is a nice follow-up to the Static-Dynamic Interface work by the authors.
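
The two-level Control Store idea in the CMOS 370 item above can be illustrated with a toy timing model. This is only a sketch under assumptions that are not in the report: the off-chip Control Store is assumed to be pipelined with a fixed two-cycle latency, a microsequence is treated as a straight run of microwords, and the names `ON_CHIP_WORDS`, `OFF_CHIP_LATENCY`, `delivery_cycles`, and `stall_cycles` are hypothetical. It is not the paper's design, only a way to see why holding the first two microwords on chip can hide the off-chip latency at a branch target.

```python
# Toy timing model of a two-level control store (illustrative assumptions only).

ON_CHIP_WORDS = 2     # from the report: first two microwords of each microsequence
OFF_CHIP_LATENCY = 2  # ASSUMPTION: cycles from off-chip request to delivery (pipelined)


def delivery_cycles(length, on_chip_words=ON_CHIP_WORDS, latency=OFF_CHIP_LATENCY):
    """Return the cycle in which each microword of one microsequence is delivered.

    Cycle 0 is the cycle in which the branch to the microsequence resolves.
    """
    cycles = []
    for i in range(length):
        if i < on_chip_words:
            # On-chip words are available at once, one per cycle.
            ready = i
        else:
            # Off-chip requests start at cycle 0 and are issued one per cycle.
            issued = i - on_chip_words
            ready = issued + latency
        # Words are delivered in order, at most one per cycle.
        earliest = cycles[-1] + 1 if cycles else 0
        cycles.append(max(ready, earliest))
    return cycles


def stall_cycles(length, **kwargs):
    """Cycles lost relative to the ideal of one microword per cycle from cycle 0."""
    cycles = delivery_cycles(length, **kwargs)
    return cycles[-1] - (length - 1) if cycles else 0


if __name__ == "__main__":
    print(delivery_cycles(5))                   # two words held on chip
    print(delivery_cycles(5, on_chip_words=0))  # everything fetched off chip
```

Under these assumptions, the on-chip words cover exactly the cycles the first off-chip request needs, so the sequence streams at one microword per cycle; with no on-chip words, every branch target pays the full off-chip latency before the first microword arrives.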