Streaming Aggregation using Reconfigurable Hardware

Abstract: High-throughput, low-latency streaming aggregation is essential for many applications that analyze massive volumes of data in real time. In many cases, high-speed stream aggregation can be achieved incrementally by computing partial results for multiple windows. For some problems, however, temporarily storing all incoming raw data of a single window before processing it is more efficient, or even the only option. This thesis presents the first FPGA-based single-window stream aggregation designs for tuple-based and time-based windowing policies. The proposed approach supports the challenging queries required in realistic stream processing problems: holistic, distributive, and algebraic aggregation functions, as well as custom ones. Our designs support aggregation over a large number of concurrently active keys and handle large window sizes and frequent aggregations. The designs are implemented on Maxeler's dataflow engines (DFEs), which are well suited to the characteristics of stream processing. DFEs receive incoming data directly from the network and have direct access to off-chip DRAM. The tuple-based single-window DFE processes up to 8 million tuples per second (1.1 Gbps), offering 1-2 orders of magnitude higher throughput than a state-of-the-art stream processing software system. Its processing latency is below 4 µs, 4 orders of magnitude lower than that of software. The time-based single-window stream aggregation DFE offers high processing throughput, up to 150 million tuples per second, similar to related GPU systems, which, however, do not support both time-based and single windows. It also offers an ultra-low processing latency of 1-10 µs, at least 4 orders of magnitude lower than that of software-based solutions.
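To make the single-window idea concrete, the following is a minimal software sketch, not the DFE design described in this thesis. It buffers every raw value of a window per key and applies the aggregation function only when the window closes, which is what allows holistic functions such as the median to be computed. The class names, the per-key tuple-count window, and the tumbling time window with non-decreasing timestamps are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


# Example aggregation functions, one per class named in the abstract.
def agg_sum(values):   # distributive: can be combined from partial sums
    return sum(values)


def agg_mean(values):  # algebraic: derivable from (sum, count) partials
    return sum(values) / len(values)


def agg_median(values):  # holistic: needs every raw value in the window
    return median(values)


class TupleSingleWindow:
    """Hypothetical tuple-based single-window aggregator.

    Assumption: the window is counted per key. Raw values are buffered
    until `size` tuples have arrived for a key; then `agg` is applied to
    the whole buffer, the result is emitted, and the key starts over.
    """

    def __init__(self, size, agg):
        self.size = size
        self.agg = agg
        self.buffers = defaultdict(list)

    def push(self, key, value):
        buf = self.buffers[key]
        buf.append(value)
        if len(buf) == self.size:
            result = (key, self.agg(buf))
            self.buffers[key] = []  # open a fresh window for this key
            return result
        return None


class TimeSingleWindow:
    """Hypothetical time-based (tumbling) single-window aggregator.

    Assumption: tuples arrive with non-decreasing timestamps. When a
    tuple falls past the current window's end, every buffered key is
    aggregated and emitted, and the window jumps to the one holding ts.
    """

    def __init__(self, length, agg):
        self.length = length
        self.agg = agg
        self.window_start = None
        self.buffers = defaultdict(list)

    def push(self, ts, key, value):
        emitted = []
        if self.window_start is None:
            self.window_start = ts - ts % self.length
        if ts >= self.window_start + self.length:
            # Close the window: aggregate all raw values of each key.
            emitted = [(k, self.agg(v)) for k, v in sorted(self.buffers.items())]
            self.buffers = defaultdict(list)
            self.window_start = ts - ts % self.length
        self.buffers[key].append(value)
        return emitted


if __name__ == "__main__":
    w = TupleSingleWindow(size=3, agg=agg_median)
    print([w.push("a", v) for v in (5, 1, 9, 2)])

    t = TimeSingleWindow(length=10, agg=agg_sum)
    for ts, key, value in [(0, "a", 1), (3, "b", 2), (7, "a", 4), (12, "a", 8)]:
        print(t.push(ts, key, value))
```

Because the whole window is kept, any function of the buffered values can be plugged in as `agg`, including custom ones; the cost is memory proportional to the window size times the number of active keys, which is one reason a hardware design with direct access to off-chip DRAM is attractive.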