TY - CONF
AU - Bai, Yang
AU - Zhao, Wenqian
AU - Yin, Shuo
AU - Wang, Zixiao
AU - Yu, Bei
AD - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR
TI - ATFormer: A Learned Performance Model with Transfer Learning Across Devices for Deep Learning Tensor Programs
AB - The training and inference efficiency of ever-larger deep neural networks relies heavily on the performance of tensor operators on specific hardware platforms. Therefore, a compilation-based optimization flow with automatic tensor generation and parameter tuning is necessary for efficient model deployment. While compilation-based methods with performance models can provide dynamic and suitable code optimization, they suffer from large design space exploration with rough measurement accuracy and poor transferability among different hardware platforms. This paper presents ATFormer, a simple yet efficient design with attention-inspired modules to accurately predict the performance of optimized operators by capturing global and long-range dependencies.
N1 - Figure 1: The overview of a search-based framework with computation graph, cost model, and search space.
JF - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
DO - 10.18653/v1/2023.emnlp-main.250
DA - 2023-01-01
UR - https://www.deepdyve.com/lp/unpaywall/atformer-a-learned-performance-model-with-transfer-learning-across-PA3YA94SVc
DP - DeepDyve
ER -