Realizing Low-Latency Packet Processing on Multi-Hundred-Gigabit-Per-Second Commodity Hardware: Exploit Caching to Improve Performance

Abstract: Recent technological developments in cloud computing have led to more applications being deployed in the cloud. Among these modern cloud-based applications, many societal applications require bounded and predictable low-latency responses. However, current cloud infrastructure is unsuitable for these applications, because limitations in both its hardware and its software prevent it from meeting these requirements.

This doctoral dissertation describes our attempts to reduce the latency of Internet services by carefully studying multi-hundred-gigabit-per-second commodity hardware, optimizing it, and improving its performance. The main focus is on improving the performance of the packet processing performed by network functions deployed on commodity hardware, an approach known as network functions virtualization (NFV). This processing is one of the significant sources of latency for Internet services.

The first contribution of this dissertation takes a step toward optimizing the cache performance of time-critical NFV service chains. By doing so, we reduce the tail latencies of such systems running at 100 Gbps. This is an important achievement, as it increases the probability of realizing bounded and predictable latency for Internet services.

The second contribution performs whole-stack optimizations on software-based network functions deployed on top of modular packet processing frameworks to further enhance the effectiveness of cache memories. We build a system that efficiently handles metadata and produces a customized binary of NFV service chains. Our system improves both the throughput and the latency of per-core hundred-gigabit-per-second packet processing on commodity hardware.

The third contribution studies the efficiency of the I/O security solutions provided by commodity hardware at multi-hundred-gigabit-per-second rates. We characterize the performance of the IOMMU and the IOTLB (i.e., the I/O virtual address translation cache) at 200 Gbps and explore opportunities to mitigate their performance overheads in the Linux kernel.
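To put the 100 Gbps and 200 Gbps rates above in perspective, the following minimal sketch computes the time budget a single core has per packet when traffic arrives at line rate. It assumes standard Ethernet framing (a 20-byte per-frame wire overhead: 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap) and a hypothetical 3 GHz core clock; the function names are illustrative and are not taken from the dissertation.

```python
"""Per-packet time budget at a given Ethernet line rate (illustrative sketch)."""

# Assumption: standard Ethernet framing. Besides the frame itself, each frame
# occupies 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire.
WIRE_OVERHEAD_BYTES = 8 + 12

# Assumption: a hypothetical 3 GHz core, used only to express time as cycles.
ASSUMED_CLOCK_GHZ = 3.0


def packets_per_second(line_rate_gbps: float, frame_bytes: int) -> float:
    """Maximum rate of back-to-back frames of one size (frame size includes FCS)."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_gbps * 1e9 / bits_on_wire


def ns_per_packet(line_rate_gbps: float, frame_bytes: int) -> float:
    """Time available to process one frame at line rate, in nanoseconds."""
    return 1e9 / packets_per_second(line_rate_gbps, frame_bytes)


def cycles_per_packet(line_rate_gbps: float, frame_bytes: int,
                      clock_ghz: float = ASSUMED_CLOCK_GHZ) -> float:
    """The same budget expressed in CPU cycles for a single core."""
    return ns_per_packet(line_rate_gbps, frame_bytes) * clock_ghz


if __name__ == "__main__":
    for rate in (100, 200):
        for size in (64, 1518):  # minimum and maximum standard frame sizes
            print(f"{rate} Gbps, {size}-byte frames: "
                  f"{packets_per_second(rate, size) / 1e6:.2f} Mpps, "
                  f"{ns_per_packet(rate, size):.2f} ns/packet, "
                  f"~{cycles_per_packet(rate, size):.0f} cycles")
```

For minimum-size 64-byte frames at 100 Gbps, this yields about 148.8 Mpps, i.e., 6.72 ns (roughly 20 cycles at the assumed 3 GHz) per packet, and half that budget at 200 Gbps. Since a single access to main memory typically costs tens of nanoseconds or more, budgets this tight suggest why cache behavior matters so much for packet processing at these rates.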
