ACM Transactions on Architecture and Code Optimization (TACO)

journal article

Introduction

Calder, Brad; Tullsen, Dean

2006 ACM Transactions on Architecture and Code Optimization (TACO)

doi: 10.1145/1132462.1132463

journal article

Bit-split string-matching engines for intrusion detection and prevention

Tan, Lin; Brotherton, Brett; Sherwood, Timothy

2006 ACM Transactions on Architecture and Code Optimization (TACO)

doi: 10.1145/1132462.1132464

Network Intrusion Detection and Prevention Systems have emerged as one of the most effective ways of providing security to those connected to the network, and at the heart of almost every modern intrusion detection system is a string-matching algorithm. String matching is one of the most critical elements because it allows the system to make decisions based not just on the headers, but on the actual content flowing through the network. Unfortunately, checking every byte of every packet to see if it matches one of a set of thousands of strings becomes a computationally intensive task as network speeds grow into the tens, and eventually hundreds, of gigabits per second. To keep up with these speeds, a specialized device is required: one that can maintain tight bounds on worst-case performance, that can be updated with new rules without interrupting operation, and that is efficient enough to be included on-chip with existing network chips or even in wireless devices. We have developed an approach that relies on a special-purpose architecture that executes novel string-matching algorithms specially optimized for implementation in our design. We show how the problem can be solved by converting the large database of strings into many tiny state machines, each of which searches for a portion of the rules and a portion of the bits of each rule. Through the careful codesign and optimization of our architecture with a new string-matching algorithm, we show that it is possible to build a system that is 10 times more efficient than the best currently known approaches.
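
The idea of splitting a rule set into small state machines that each watch only some of the bits can be illustrated with a minimal software sketch. This is not the authors' hardware design or their exact construction: it assumes one binary Aho-Corasick automaton per bit position of a byte, built directly from the bit-projected patterns, and it does not partition the rules into groups. The names `build_bit_machine` and `search` are hypothetical. Each machine keeps a partial match vector per state, and a rule is reported only when all eight machines agree.

```python
from collections import deque

def build_bit_machine(patterns, bit):
    """Binary Aho-Corasick DFA that sees only bit `bit` of each byte."""
    goto = [[-1, -1]]  # goto[state][symbol], symbol is 0 or 1
    out = [0]          # partial match vector: bit i set => pattern i may end here
    for idx, pat in enumerate(patterns):
        s = 0
        for byte in pat:
            sym = (byte >> bit) & 1
            if goto[s][sym] == -1:
                goto.append([-1, -1])
                out.append(0)
                goto[s][sym] = len(goto) - 1
            s = goto[s][sym]
        out[s] |= 1 << idx
    # Breadth-first pass: add failure links and complete the transition table.
    fail = [0] * len(goto)
    queue = deque()
    for sym in (0, 1):
        t = goto[0][sym]
        if t == -1:
            goto[0][sym] = 0
        else:
            queue.append(t)
    while queue:
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit matches that end at a proper suffix
        for sym in (0, 1):
            t = goto[s][sym]
            if t == -1:
                goto[s][sym] = goto[fail[s]][sym]
            else:
                fail[t] = goto[fail[s]][sym]
                queue.append(t)
    return goto, out

def search(patterns, text):
    """Return (start_offset, pattern_index) for every match in `text`."""
    machines = [build_bit_machine(patterns, b) for b in range(8)]
    states = [0] * 8
    hits = []
    for pos, byte in enumerate(text):
        vec = (1 << len(patterns)) - 1
        for b, (goto, out) in enumerate(machines):
            states[b] = goto[states[b]][(byte >> b) & 1]
            vec &= out[states[b]]  # a rule survives only if every bit agrees
        idx = 0
        while vec:
            if vec & 1:
                hits.append((pos - len(patterns[idx]) + 1, idx))
            vec >>= 1
            idx += 1
    return hits
```

For example, `search([b"he", b"she", b"hers"], b"ushers")` reports "she" at offset 1 and both "he" and "hers" at offset 2. Each individual machine may flag rules that do not really match; the bitwise AND across the eight partial match vectors removes those false positives, because a rule of a given length matches exactly when every one of its bit columns matches.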

journal article

Efficient remote profiling for resource-constrained devices

Nagpurkar, Priya; Mousa, Hussam; Krintz, Chandra; Sherwood, Timothy

2006 ACM Transactions on Architecture and Code Optimization (TACO)

doi: 10.1145/1132462.1132465

The widespread use of ubiquitous, mobile, and continuously connected computing agents has inspired software developers to change the way they test, debug, and optimize software. Users now play an active role in the software evolution cycle by dynamically providing valuable feedback about the execution of a program to developers. Software developers can use this information to isolate bugs in, maintain, and improve the performance of a wide range of diverse and complex embedded device applications. The collection of such feedback poses a major challenge to systems researchers, since it must be performed without degrading a user's experience with, or consuming the severely restricted resources of, the mobile device. At the same time, the resource constraints of embedded devices prohibit the use of extant software profiling solutions. To achieve efficient remote profiling of embedded devices, we couple two efficient hardware/software program monitoring techniques: Hybrid Profiling Support (HPS) and Phase-Aware Sampling. HPS efficiently inserts profiling instructions into an executing program using a novel extension to Dynamic Instruction Stream Editing (DISE). Phase-aware sampling exploits the recurring behavior of programs to identify key opportunities during execution in order to collect profile information (i.e., sample). Our prior work on phase-aware sampling required code duplication to toggle sampling. By guiding low-overhead, hardware-supported sampling according to program phase behavior via HPS, our system is able to collect highly accurate profiles transparently. We evaluate our system assuming a general-purpose configuration as well as a popular handheld device configuration. We measure the accuracy and overhead of our techniques and quantify the overhead in terms of computation, communication, and power consumption. We compare our system to random and periodic sampling for a number of widely used performance profile types. Our results indicate that our system significantly reduces the overhead of sampling while maintaining high accuracy.

journal article

Recovery code generation for general speculative optimizations

Lin, Jin; Hsu, Wei-Chung; Yew, Pen-Chung; Ju, Roy Dz-Ching; Ngai, Tin-Fook

2006 ACM Transactions on Architecture and Code Optimization (TACO)

doi: 10.1145/1132462.1132466

A general framework that integrates both control and data speculation using alias profiling and/or compiler heuristic rules has been shown to improve CPU2000 performance on Itanium systems. However, speculative optimizations require check instructions and recovery code to ensure correct execution when speculation fails at runtime. How to generate check instructions and their associated recovery code efficiently and effectively is an issue yet to be well studied. It is also very important that the recovery code generated in the earlier phases integrate gracefully into the later optimization phases. At the very least, it should not hinder later optimizations, thus ensuring overall performance improvement. This paper proposes a framework that uses an if-block structure to facilitate check instructions and recovery code generation for general speculative optimizations. It allows speculative instructions and their recovery code generated in the early compiler optimization phases to be integrated effectively with the subsequent optimization phases. It also allows multilevel speculation for multilevel pointers and multilevel expression trees to be handled with no additional complexity. The proposed recovery code generation framework has been implemented and evaluated in the Open Research Compiler (ORC).

journal article

Optimal register reassignment for register stack overflow minimization

Choi, Yoonseo; Han, Hwansoo

2006 ACM Transactions on Architecture and Code Optimization (TACO)

doi: 10.1145/1132462.1132467

Architectures with a register stack can implement efficient calling conventions. Using the overlapping of callers' and callees' registers, callers are able to pass parameters to callees without a memory stack. The most recent instance of a register stack can be found in the Intel Itanium architecture. A hardware component called the register stack engine (RSE) provides an illusion of an infinite-length register stack using a memory-backed process to handle overflow and underflow for a physically limited number of registers. Despite such hardware support, some applications suffer from the overhead required to handle register stack overflow and underflow. The memory latency associated with the overflow and underflow of a register stack can be reduced by generating multiple register allocation instructions within a procedure (Settle et al. 2003). Live analysis is utilized to find a set of registers that are not required to keep their values across procedure boundaries. However, among those dead registers, only the registers that are consecutively located in a certain part of the register stack frame can be removed. We propose a compiler-supported register reassignment technique that reduces RSE overflow/underflow further. By reassigning registers based on live analysis, our technique forces as many dead registers to be removed as possible. We define the problem of optimal register reassignment, which minimizes interprocedural register stack heights considering multiple call sites within a procedure. We present how this problem is related to a path-finding problem in a graph called a sequence graph. We also propose an efficient heuristic algorithm for the problem. Finally, we present measurements of the effects of the proposed techniques on the SPEC CINT2000 benchmark suite and an analysis of the results. The results show that our approach reduces RSE cycles by 6.4% and total CPU cycles by 1.7% on average.
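
The observation that only consecutively located dead registers can be removed, and that reassignment can expose more of them, can be shown with a toy sketch. It rests on simplifying assumptions that are not from the paper: a single call site, a frame modeled as slots `0..frame_size-1`, and removal allowed only for dead registers at the top of the frame. It does not attempt the multi-call-site optimization or the sequence-graph formulation. The function names `trailing_dead` and `reassign` are hypothetical.

```python
def trailing_dead(frame_size, live):
    """Count dead registers at the top of the frame (the removable run here)."""
    top = frame_size
    while top > 0 and (top - 1) not in live:
        top -= 1
    return frame_size - top

def reassign(frame_size, live):
    """Map live registers to the lowest slots and dead registers above them."""
    order = sorted(live) + [r for r in range(frame_size) if r not in live]
    return {old: new for new, old in enumerate(order)}
```

With an 8-register frame and live registers `{0, 3, 6}`, `trailing_dead` finds only one removable register (slot 7). After `reassign`, the live registers occupy slots 0 to 2, so all five dead registers sit at the top and can be removed together. With several call sites in one procedure, a single assignment has to serve different live sets, which is what makes the full problem harder than this sketch.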
