From Numerical Sensor Data to Semantic Representations: A Data-driven Approach for Generating Linguistic Descriptions
Abstract: In our daily lives, sensor recordings are becoming increasingly ubiquitous. With the increased availability of data comes an increased need for systems that can represent the data in human-interpretable concepts. In order to describe unknown observations in natural language, an artificial intelligence system must deal with several issues involving perception, concept formation, and linguistic description. These issues span various subfields of artificial intelligence, such as machine learning, cognitive science, and natural language generation.

The aim of this thesis is to address the problem of semantically modelling and describing numerical observations from sensor data. This thesis introduces data-driven approaches that mine numerical data and create semantic representations of the derived information in order to describe unseen but interesting observations in natural language.

The research constructs semantic representations using the theory of conceptual spaces. In particular, the central contribution of this thesis is a data-driven approach that automatically constructs conceptual spaces from labelled numerical data sets. Semantic inference techniques are then applied to the constructed conceptual space to derive linguistic interpretations of novel, unknown observations. Another contribution of this thesis is an exploration of how the proposed approach can be instantiated in a real-world application. Specifically, this research investigates a case study in which the proposed approach is used to describe unknown time series patterns that emerge from physiological sensor data.
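The abstract does not give the construction or inference procedure itself. One common reading of Gärdenfors' theory of conceptual spaces represents each concept by a prototype, with concept regions given by the Voronoi tessellation around the prototypes. The Python sketch below assumes exactly that: the quality dimensions are the standardised numeric features, each prototype is the mean of its labelled examples, and a new observation is described by its nearest prototype plus a distance-based hedge. The function names, the exponential typicality measure, its thresholds and the toy data are illustrative assumptions, not the thesis's method.

```python
import math
from collections import defaultdict


def build_conceptual_space(points, labels):
    """Build a prototype-based conceptual space from labelled numerical data.

    Each column of `points` is treated as one quality dimension. Dimensions are
    z-score standardised so that distances along them are comparable, and each
    concept is represented by its prototype: the mean of its standardised examples.
    """
    n = len(points)
    n_dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(n_dims)]
    stds = []
    for d in range(n_dims):
        var = sum((p[d] - means[d]) ** 2 for p in points) / n
        stds.append(math.sqrt(var) or 1.0)  # constant dimension: avoid dividing by zero

    def scale(p):
        return [(p[d] - means[d]) / stds[d] for d in range(n_dims)]

    members = defaultdict(list)
    for p, label in zip(points, labels):
        members[label].append(scale(p))
    prototypes = {
        label: [sum(v[d] for v in vs) / len(vs) for d in range(n_dims)]
        for label, vs in members.items()
    }
    return scale, prototypes


def describe(observation, scale, prototypes, sensitivity=1.0):
    """Place an observation in the space, assign it to the nearest prototype
    (i.e. its Voronoi cell) and turn the distance into a hedged phrase."""
    x = scale(observation)
    dists = {label: math.dist(x, proto) for label, proto in prototypes.items()}
    label = min(dists, key=dists.get)
    typicality = math.exp(-sensitivity * dists[label])  # 1.0 exactly at the prototype
    if typicality > 0.6:
        hedge = "a typical"
    elif typicality > 0.3:
        hedge = "a somewhat atypical"
    else:
        hedge = "an unusual"
    return label, typicality, f"This observation is {hedge} case of '{label}'."


if __name__ == "__main__":
    # Toy, made-up data: (heart rate in bpm, breathing rate per minute).
    points = [[60, 12], [65, 14], [62, 13], [110, 22], [120, 25], [115, 24]]
    labels = ["resting"] * 3 + ["active"] * 3
    scale, prototypes = build_conceptual_space(points, labels)
    print(describe([64, 13], scale, prototypes)[2])
    print(describe([90, 30], scale, prototypes)[2])
```

Standardising first keeps a dimension with large numeric values (heart rate) from dominating the distance; a learned metric or per-dimension salience weights, as discussed in the conceptual-spaces literature, would be a natural refinement.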
This instantiation first presents automatic data analysis methods that extract time series patterns and temporal rules from multiple channels of physiological sensor data, and then applies various linguistic description approaches (including the proposed semantic representation based on conceptual spaces) to generate human-readable natural language descriptions of these time series patterns and temporal rules.

The main outcome of this thesis is the use of data-driven strategies that enable the system to reveal and explain aspects of sensor data that may otherwise be difficult to capture with knowledge-driven techniques alone. Briefly put, the thesis aims to automate the process whereby unknown observations of data can be 1) numerically analysed, 2) semantically represented, and eventually 3) linguistically described.
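The abstract does not say how the temporal rules are mined or phrased. As a minimal sketch, assume each channel has already been reduced to a list of event times (for example, detected pattern occurrences), that a temporal rule has the form "an event in channel A is followed by an event in channel B within w seconds", and that its confidence is the fraction of A events for which this holds. The confidence is then mapped to a linguistic quantifier using fixed thresholds. The function names, thresholds and event data below are hypothetical.

```python
from bisect import bisect_right


def rule_confidence(a_times, b_times, window):
    """Fraction of channel-A events followed by at least one channel-B event
    that occurs strictly after the A event and no more than `window` later."""
    if not a_times:
        return 0.0
    b_sorted = sorted(b_times)
    hits = 0
    for t in a_times:
        i = bisect_right(b_sorted, t)  # index of the first B event strictly after t
        if i < len(b_sorted) and b_sorted[i] - t <= window:
            hits += 1
    return hits / len(a_times)


def quantifier(confidence):
    """Map a confidence in [0, 1] to a linguistic quantifier (crisp thresholds)."""
    if confidence >= 0.95:
        return "always"
    if confidence >= 0.7:
        return "usually"
    if confidence >= 0.3:
        return "sometimes"
    if confidence > 0:
        return "rarely"
    return "never"


def describe_rule(name_a, a_times, name_b, b_times, window):
    """Produce a one-sentence description of the rule 'A is followed by B'."""
    confidence = rule_confidence(a_times, b_times, window)
    sentence = (
        f"{name_a.capitalize()} is {quantifier(confidence)} "
        f"followed by {name_b} within {window} s."
    )
    return confidence, sentence


if __name__ == "__main__":
    # Made-up event times (seconds) detected in two physiological channels.
    hr_rises = [10, 50, 90, 130]
    scl_peaks = [13, 55, 96, 180]
    print(describe_rule("a rise in heart rate", hr_rises,
                        "a skin conductance peak", scl_peaks, window=10)[1])
```

Crisp thresholds keep the sketch short; fuzzy linguistic quantifiers, as used in linguistic summarisation of data, would give graded rather than abrupt transitions between "usually" and "sometimes".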