Towards Multiple Embeddings for Multivariate Network Analysis

Abstract: The study of multivariate networks (MVNs, i.e., large data sets where data points have relations to other data points and both these relations and the points themselves can have attributed data) is an important task in many different fields, such as social networks for the humanities, citation networks for bibliometrics, and biochemical networks for life sciences. Furthermore, when dealing with visualization and analysis of MVNs, many open challenges still exist regarding both computational aspects (i.e., the challenge of computing different metrics of a large-scale MVN) and visual aspects (i.e., the challenge of displaying all the information of a large-scale MVN in a way that is comprehensible to the user). In the search for efficient and scalable visual analytics methods, especially for exploratory data analysis, this thesis explores a novel approach of aspect-driven MVN embedding and the use of ensembles of embeddings for multi-level similarity calculations. Starting from the observation that there already exist several different embedding techniques for data types that are common for real-world MVNs, the main question that we will try to answer is: “Could the use of multiple embeddings provide for new and better solutions for visual analytics on multivariate networks?” This main question then inspires the formulation of four more specific research goals regarding: (1) methods for combining embeddings, (2) the development of a general methodology framework, (3) new visualization methods, and (4) proof-of-concept applications for real-world scenarios. The focus of our work lies on similarity-based analysis within the domains of bibliometrics and scientometrics, and our first major step is to develop a methodology for combining several different embeddings (for the same underlying data) to augment the quality of similarity calculations.
This step includes an adaptation of some of the key ideas from ensemble methods to the field of embeddings, and also an interactive optimization process for finding the best-performing ensembles. Upon this foundation, we develop an aspect-driven approach which seeks to divide an underlying MVN into separately embeddable aspects, which in turn allows for the resulting embedding vectors to be used in flexible analysis scenarios with a high level of interaction. We then proceed to show how the concept of similarity-based analysis can be used to obtain valuable insights into, and a better understanding of, a large set of scientific publications. For this, we introduce the abstract concept of similarity patterns, which we use to express how a specific set of similarity criteria is distributed over a data set. Furthermore, we present proof-of-concept applications which are designed to allow the user to exploit these similarity patterns at different levels of detail. We also show that our proposed methodology is generalizable beyond the scope of MVNs, and therefore could be applied to other fields as well.
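
As a rough illustration of combining several embeddings of the same underlying data into one similarity measure, the following Python sketch averages per-embedding cosine similarities with user-chosen weights. The abstract does not specify the combination rule, so the weighted average, the function names (`cosine_similarity_matrix`, `ensemble_similarity`), and the two toy aspects (`text_emb`, `cite_emb`) are assumptions made purely for illustration and should not be read as the methodology developed in the thesis.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def ensemble_similarity(embeddings, weights=None) -> np.ndarray:
    """Combine several embeddings of the same items into one similarity matrix.

    ASSUMPTION: the combination is a weighted average of the per-embedding
    cosine similarities; this is one plausible choice, not the thesis's rule.
    Each array in `embeddings` must have one row per item, in the same order,
    but the embeddings may differ in dimensionality.
    """
    if weights is None:
        weights = np.ones(len(embeddings))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    sims = [cosine_similarity_matrix(e) for e in embeddings]
    return sum(w * s for w, s in zip(weights, sims))


# Hypothetical toy data: three publications embedded by two separate aspects.
text_emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # e.g. a text aspect
cite_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # e.g. a citation aspect

equal = ensemble_similarity([text_emb, cite_emb])
text_heavy = ensemble_similarity([text_emb, cite_emb], weights=[3, 1])
```

In this toy setting, changing the weights shifts which pairs of items appear similar, which loosely mirrors the idea of letting the analyst emphasize different aspects during interactive exploration.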