Adaptive Packet Sampling for Flow Volume Measurement

Beak-Young Choi, Jaesung Park, Zhi-Li Zhang
http://www.cs.umn.edu/networking/sampling/samplingProject.htm
Department of Computer Science and Engineering, University of Minnesota
E-mail: {choiby, jpark, zhzhang}@cs.umn.edu

In this study, we address the problem of flow volume measurement using a packet sampling approach. Traffic measurement and monitoring serve as the basis for a wide range of IP network operations, management and engineering tasks. In particular, flow-level measurements are required for applications such as traffic profiling, usage-based accounting, traffic engineering, traffic matrix estimation and QoS monitoring. With today's high-speed (e.g., Gigabit or Terabit) links, however, capturing every packet is no longer feasible because of the excessive overhead it incurs on linecards or routers. As a scalable alternative, the IETF working groups IPFIX (IP Flow Information Export) and PSAMP (Packet Sampling) have both recommended the use of packet sampling.

The foremost and fundamental question regarding sampling is its accuracy. This is especially pertinent in the Internet, where traffic is known to fluctuate dynamically and unpredictably. Inaccurate packet sampling not only defeats the purpose of traffic measurement and monitoring but, worse, can lead network operators to wrong decisions. Excessive oversampling should also be avoided if the measurement solution is to be scalable. It is therefore important to control the accuracy of estimation so as to balance the trade-off between the utility and the overhead of measurements. Given the dynamic nature of network traffic, static sampling does not always ensure the accuracy of estimation, and it tends to oversample at peak periods, when economy and timeliness are most critical.

Packet sampling for flow-level measurements is a particularly challenging problem. One issue is the diversity of flows: flows can vary drastically in their volume.
The dynamics of flows are another issue: flows arrive at random times and stay active for random durations, and the rate of a flow (i.e., the number of packets generated by a flow per unit of time) may also vary over time, further complicating packet sampling. How can we ensure accurate measurement of dynamic flows? How many packets must be sampled to produce flow measurements within a pre-specified error bound? How should the sampling rate be chosen to avoid excessive oversampling while ensuring accuracy? How should the sampling procedure be carried out and flow volume estimated? How easily can it be implemented at line speed?

To answer these questions, in this paper we advance a theoretical framework and develop an adaptive packet sampling technique using stratified random sampling. The technique targets accurate estimation (i.e., with bounded sampling errors in both packet and byte count) of large, or elephant, flows, since a small percentage of flows are observed to account for a large percentage of the total traffic. We classify flows by the proportion of their packet count to the total packet count over a time interval that encompasses the flow duration. A flow is referred to as an elephant flow in our study if its packet count proportion is larger than a pre-specified threshold. Because the proportion is defined over the time interval enclosing a flow, it removes the impact of rate fluctuations. This definition of elephant flows captures the flow characteristics of packet count, byte count and burstiness.

We tackle the problem of flow dynamics using stratified random sampling. The proposed technique uses predetermined, non-overlapping time blocks called strata. Within each stratum, packets are sampled at a single rate. The sampling rate is set to bound the estimation error of the smallest elephant flow. At the end of each stratum, flow statistics are estimated and updated.
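As a rough illustration of how a sampling rate could be derived from an error bound on the smallest elephant flow, the sketch below uses the textbook normal approximation to the binomial for estimating a proportion. This is an assumption made for illustration; the abstract does not state the formula the authors use. The function names `required_samples` and `block_sampling_rate`, and the parameters `p_threshold`, `rel_error`, `confidence` and `expected_block_packets`, are hypothetical.

```python
import math
from statistics import NormalDist


def required_samples(p_threshold: float, rel_error: float, confidence: float) -> int:
    """Number of sampled packets needed so that a flow whose packet-count
    proportion equals p_threshold is estimated within a relative error of
    rel_error with the given confidence.

    Assumed model (not taken from the source): the sampled proportion is
    approximately normal with variance p(1-p)/n, which gives
    n >= z^2 * (1 - p) / (rel_error^2 * p).
    """
    if not 0.0 < p_threshold < 1.0:
        raise ValueError("p_threshold must lie strictly between 0 and 1")
    if rel_error <= 0.0 or not 0.0 < confidence < 1.0:
        raise ValueError("rel_error must be > 0 and confidence in (0, 1)")
    # Two-sided quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n = (z ** 2) * (1.0 - p_threshold) / (rel_error ** 2 * p_threshold)
    return math.ceil(n)


def block_sampling_rate(n_required: int, expected_block_packets: int) -> float:
    """Sampling probability for one time block, capped at 1.0 (sample everything)."""
    if expected_block_packets <= 0:
        raise ValueError("expected_block_packets must be positive")
    return min(1.0, n_required / expected_block_packets)
```

Under these assumptions, a 1% elephant threshold with a 10% relative error bound at 95% confidence needs roughly 38,000 sampled packets, whatever the block's total traffic; the sampling rate then falls as the traffic in a block rises, which is why a fixed rate tends to oversample at peak periods.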
The predetermined time blocks enable us to estimate flow volume without knowing flow arrival times and durations. From each flow's point of view, the flow is stratified into fixed time blocks. At the end of the last time block enclosing the flow, its statistics are summarized into a single estimation record. A time block is the minimum time scale over which an elephant flow (by packet count proportion) is determined. It is also the minimum time scale over which the sampling rate can be adjusted.

To achieve a desired accuracy, a certain minimum number of packets must be sampled during the sampling frame that encompasses the duration of an elephant flow. Because an elephant flow can last for an arbitrary length of time, the sampling frame may be one block or several consecutive blocks in the stratified sampling. To ensure that we collect enough samples of each elephant flow, we sample the minimum required number of packets in every block. The minimum required number of samples is computed at the beginning of each block to accommodate dynamic changes in the total traffic.

Using real network traffic traces, we demonstrate that the proposed technique indeed produces the desired accuracy of flow volume estimation, while at the same time significantly reducing the number of packet samples as well as the flow cache size.

ACM SIGCOMM Computer Communications Review, Volume 32, Number 3: July 2002
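The per-block procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each packet in a block is kept independently with that block's sampling probability, sampled counts are scaled by the inverse of the rate, and each block's packet count is taken as known (a deployed system would have to predict it at the start of the block). The packet representation `(flow_id, size_bytes)` and the names `FlowRecord`, `sample_block` and `estimate_flows` are hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Packet = Tuple[str, int]  # (flow_id, size_bytes); hypothetical representation


@dataclass
class FlowRecord:
    """Single estimation record accumulated over all blocks enclosing a flow."""
    packet_count: float = 0.0
    byte_count: float = 0.0
    blocks_seen: int = 0


def sample_block(block: Sequence[Packet], rate: float, rng: random.Random) -> List[Packet]:
    """Keep each packet of the block independently with probability `rate`."""
    return [pkt for pkt in block if rng.random() < rate]


def estimate_flows(
    blocks: Sequence[Sequence[Packet]], n_required: int, rng: random.Random
) -> Dict[str, FlowRecord]:
    """Estimate per-flow packet and byte counts across fixed time blocks."""
    records: Dict[str, FlowRecord] = defaultdict(FlowRecord)
    for block in blocks:
        if not block:
            continue
        # Assumption: the block size is known; the rate is chosen so that about
        # n_required packets are sampled in this block.
        rate = min(1.0, n_required / len(block))
        flows_in_block = set()
        for flow_id, size in sample_block(block, rate, rng):
            rec = records[flow_id]
            # Each sampled packet stands for 1/rate packets of the block.
            rec.packet_count += 1.0 / rate
            rec.byte_count += size / rate
            if flow_id not in flows_in_block:
                flows_in_block.add(flow_id)
                rec.blocks_seen += 1
    return dict(records)
```

Because the rate is recomputed per block, a flow spanning several blocks accumulates estimates made at different rates, and the record is updated without any knowledge of when the flow started or ended.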