Enabling Internet-Scale Publish/Subscribe In Overlay Networks

University dissertation from Stockholm : KTH Royal Institute of Technology

Abstract: As the amount of data in today's Internet grows, users are exposed to too much information, which becomes increasingly difficult to comprehend. Publish/subscribe systems address this problem by providing loosely coupled communication between producers and consumers of data in a network. Data consumers, i.e., subscribers, are provided with a subscription mechanism to express their interest in a subset of the data, so that they are notified only when the producers, i.e., publishers, generate data that matches their subscription. Most publish/subscribe systems today are based on the client/server architectural model. However, to provide the publish/subscribe service at large scale, companies either have to invest huge amounts of money in over-provisioning resources, or are prone to frequent service failures. Peer-to-peer overlay networks are an attractive alternative for building Internet-scale publish/subscribe systems. However, scalability comes at a cost: a published message often needs to traverse a large number of uninterested (unsubscribed) nodes before reaching all its subscribers. We refer to this undesirable traffic as relay overhead. Without careful consideration, the relay overhead might sharply increase resource consumption for the relay nodes (in terms of bandwidth, transmission cost, CPU, etc.) and could ultimately lead to rapid deterioration of the system’s performance once the relay nodes start dropping messages or choose to permanently abandon the system. To mitigate this problem, some solutions use an unbounded number of connections per node, while others limit the expressiveness of the subscription scheme.

In this thesis work, we introduce two systems called Vitis and Vinifera, for the topic-based and content-based publish/subscribe models, respectively. Both systems are gossip-based and significantly decrease the relay overhead. We utilize novel techniques to cluster together nodes that exhibit similar subscriptions. In the topic-based model, distinct clusters are constructed for each topic, while clusters in the content-based model are fuzzy and do not have explicit boundaries. We augment these clustered overlays with links that facilitate routing in the network. We construct a hybrid system by injecting structure into an otherwise unstructured network. The resulting structures resemble navigable small-world networks, which span clusters of nodes that have similar subscriptions. The properties of such overlays make them an ideal platform for efficient data dissemination in large-scale systems. The systems require only a bounded node degree and, as we show through simulations, they scale well with the number of nodes and subscriptions and remain efficient under highly complex subscription patterns, high publication rates, and even in the presence of failures in the network. We also compare both systems against some state-of-the-art publish/subscribe systems. Our measurements show that both Vitis and Vinifera significantly outperform their counterparts in various subscription and churn scenarios, under both synthetic workloads and real-world traces.
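
The notion of relay overhead described in the abstract can be illustrated with a small Python sketch. The abstract does not give the exact metric used in the dissertation, so the definition below (the fraction of nodes handling a message that are not subscribed to it) is an assumption made for illustration, and the function name and node identifiers are hypothetical.

```python
def relay_overhead(visited, subscribers):
    """Fraction of nodes that handled a message without being subscribed to it.

    This is an illustrative definition, not necessarily the one used
    in the dissertation.
    """
    visited = set(visited)
    if not visited:
        return 0.0
    # Relay nodes: nodes that forwarded the message but had no interest in it.
    relays = visited - set(subscribers)
    return len(relays) / len(visited)


# Hypothetical dissemination: the message touched five nodes,
# three of which were actual subscribers.
visited = ["n1", "n2", "n3", "n4", "n5"]
subscribers = {"n1", "n3", "n5"}
print(relay_overhead(visited, subscribers))  # 0.4
```

Under this definition, an overlay that clusters nodes with similar subscriptions would aim to keep the value close to zero, since messages would mostly travel between interested nodes.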
