TY - CONF
AU - Bai, Yang
AU - Zhao, Wenqian
AU - Yin, Shuo
AU - Wang, Zixiao
AU - Yu, Bei
AD - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR
TI - ATFormer: A Learned Performance Model with Transfer Learning Across Devices for Deep Learning Tensor Programs
AB - The training and inference efficiency of ever-larger deep neural networks relies heavily on the performance of tensor operators on specific hardware platforms. Therefore, a compilation-based optimization flow with automatic tensor generation and parameter tuning is necessary for efficient model deployment. While compilation-based methods with performance models can provide dynamic and suitable code optimization, they suffer from large design space exploration with rough measurement accuracy and poor transferability among different hardware platforms. This paper presents ATFormer, a simple yet efficient design with attention-inspired modules to accurately predict the performance of optimized operators by capturing global and long-range dependencies.
N1 - Figure 1: The overview of a search-based framework with computation graph, cost model, and search space.
JF - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
DO - 10.18653/v1/2023.emnlp-main.250
DA - 2023-01-01
UR - https://www.deepdyve.com/lp/unpaywall/atformer-a-learned-performance-model-with-transfer-learning-across-PA3YA94SVc
DP - DeepDyve
ER -