Access the full text.
Sign up today, get DeepDyve free for 14 days.
J Szymanski (2014)
Comparative analysis of text representation methods using classificationCybern. Syst., 45
P Czarnul (2012)
A model, design, and implementation of an efficient multithreaded workflow execution engine with data streaming, caching, and storage constraintsJ. Supercomput., 63
P Rosciszewski, P Czarnul, R Lewandowski, M Schally-Kacprzak (2016)
Kernelhive: a new workflow-based framework for multilevel high performance computing using clusters and workstations with CPUs and GPUsConcurr. Comput. Pract. Exp., 28
The paper deals with parallelization of computing similarity measures between large vectors. Such computations are important components within many applications and consequently are of high importance. Rather than focusing on optimization of the algorithm itself, assuming specific measures, the paper assumes a general scheme for finding similarity measures for all pairs of vectors and investigates optimizations for scalability in a hybrid Intel Xeon/Xeon Phi system. Hybrid systems including multicore CPUs and many-core compute devices such as Intel Xeon Phi allow parallelization of such computations using vectorization but require proper load balancing and optimization techniques. The proposed implementation uses C/OpenMP with the offload mode to Xeon Phi cards. Several results are presented: execution times for various partitioning parameters such as batch sizes of vectors being compared, impact of dynamic adjustment of batch size, overlapping computations and communication. Execution times for comparison of all pairs of vectors are presented as well as those for which similarity measures account for a predefined threshold. The latter makes load balancing more difficult and is used as a benchmark for the proposed optimizations. Results are presented for the native mode on an Intel Xeon Phi, CPU only and the CPU $$+$$ + offload mode for a hybrid system with 2 Intel Xeons with 20 physical cores and 40 logical processors and 2 Intel Xeon Phis with a total of 120 physical cores and 480 logical processors.
International Journal of Parallel Programming – Springer Journals
Published: Sep 29, 2016
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.