On social interaction metrics: social network crawling based on interestingness

Abstract: With the widespread use of online social networks, we are entering the era of big data. With limited resources, it is important to evaluate and prioritize interesting data. This thesis addresses the following aspects of social network analysis: efficient data collection, social interaction evaluation, and user privacy concerns. It is possible to collect data from online social networks via their open APIs. However, systematic and efficient collection of online social network data is still challenging. To improve the quality of the data collection process, prioritization methods are statistically evaluated. Results suggest that the collection time can be reduced by up to 48% by prioritizing the collection of posts. Evaluation of social interactions also requires data that covers all the interactions in a given domain. This has previously been hard to do, but the proposed crawler is capable of extracting all social interactions from a given page. With the extracted data it is, for instance, possible to illustrate indirect interactions between different users who do not necessarily have to be connected. Methods using the same data to identify and cluster different opinions in online communities have been developed. These methods are evaluated with the tool Linguistic Inquiry and Word Count. The privacy of the content produced and of the users’ private information provided on social networks is important to protect. Users must be aware of the consequences of posting in online social networks in terms of privacy. Methods to protect user privacy are presented. The proposed crawler in this thesis has, over a period of 20 months, collected over 38 million posts from public pages on Facebook, covering 4 billion likes and 340 million comments from over 280 million users. The performed data collection yielded one of the largest research datasets of social interactions on Facebook today, enabling qualitative research in the form of social network analysis.
