Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Mining block correlations to improve storage performance

Mining block correlations to improve storage performance Block correlations are common semantic patterns in storage systems. They can be exploited for improving the effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information about block correlations is unavailable at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough for discovering block correlations in storage systems.In this article, we propose two algorithms, C-Miner and C-Miner *, that use a data mining technique called frequent sequence mining to discover block correlations in storage systems. Both algorithms run reasonably fast with feasible space requirement, indicating that they are practical for dynamically inferring correlations in a storage system. C-Miner is a direct application of a frequent-sequence mining algorithm with a few modifications; compared with C-Miner , C-Miner * is redesigned for mining block correlations by making concessions for the specific problem of long sequences in storage system traces. Therefore, C-Miner * can discover 7--109% more correlation rules within 2--15 times shorter time than C-Miner . Moreover, we have also evaluated the benefits of block correlation-directed prefetching and data layout through experiments. Our results using real system workloads show that correlation-directed prefetching and data layout can reduce average I/O response time by 12--30% compared to the base case, and 7--25% compared to the commonly used sequential prefetching scheme for most workloads. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png ACM Transactions on Storage (TOS) Association for Computing Machinery

Mining block correlations to improve storage performance

Loading next page...
 
/lp/association-for-computing-machinery/mining-block-correlations-to-improve-storage-performance-k3CU0vWuwc

References (75)

Publisher
Association for Computing Machinery
Copyright
Copyright © 2005 by ACM Inc.
ISSN
1553-3077
DOI
10.1145/1063786.1063790
Publisher site
See Article on Publisher Site

Abstract

Block correlations are common semantic patterns in storage systems. They can be exploited for improving the effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information about block correlations is unavailable at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough for discovering block correlations in storage systems.In this article, we propose two algorithms, C-Miner and C-Miner *, that use a data mining technique called frequent sequence mining to discover block correlations in storage systems. Both algorithms run reasonably fast with feasible space requirement, indicating that they are practical for dynamically inferring correlations in a storage system. C-Miner is a direct application of a frequent-sequence mining algorithm with a few modifications; compared with C-Miner , C-Miner * is redesigned for mining block correlations by making concessions for the specific problem of long sequences in storage system traces. Therefore, C-Miner * can discover 7--109% more correlation rules within 2--15 times shorter time than C-Miner . Moreover, we have also evaluated the benefits of block correlation-directed prefetching and data layout through experiments. Our results using real system workloads show that correlation-directed prefetching and data layout can reduce average I/O response time by 12--30% compared to the base case, and 7--25% compared to the commonly used sequential prefetching scheme for most workloads.

Journal

ACM Transactions on Storage (TOS)Association for Computing Machinery

Published: May 1, 2005

There are no references for this article.