Approaches to Interactive Online Machine Learning

Abstract: With the Internet of Things paradigm, the data generated by the rapidly increasing number of connected devices leads to new possibilities, such as using machine learning for activity recognition in smart environments. However, this also introduces several challenges. The sensors of different devices might be of different types, making the fusion of data non-trivial. Moreover, the devices are often mobile, so data from a particular sensor is not always available, i.e. there is a need to handle data from a dynamic set of sensors. From a machine learning perspective, the data from the sensors arrives in a streaming fashion, which calls for online learning, in contrast to many learning problems where a static dataset is assumed. Machine learning is in many cases a good approach for classification problems, but its performance is often linked to the quality of the data. Obtaining a good dataset to train a model on can be an issue in general, due to the often costly process of annotating the data. With dynamic and heterogeneous data, annotation can be even more problematic because of the ever-changing environment. This means that there might be no annotated data, or only a very small amount, to train the model on at the start of learning, often referred to as the cold start problem.

To handle these issues, adaptive systems are needed. By adaptive we mean that the model is not static over time, but is updated when, for instance, the environment changes. By including a human in the loop during the learning process, which we refer to as interactive machine learning, input from users can be utilized to build the model. The input is typically annotations of the data, i.e. user input in the form of correctly labelled data points. Generally, it is assumed that the user always provides correct labels in accordance with the chosen interactive learning strategy. In many real-world applications, however, these assumptions are not realistic, as users might provide incorrect labels or fail to provide the labels that the chosen strategy expects.

In this thesis we explore which interactive learning strategies are possible in the given scenario and how they affect performance, as well as how the choice of machine learning algorithm affects performance. We also study how a user who is not always reliable, i.e. who does not always provide a correct label when expected to, can affect performance. We propose a taxonomy of interactive online machine learning strategies and test how the different strategies affect performance through experiments on multiple datasets. The findings show that the overall best performing interactive learning strategy is one where the user provides labels when previous estimations have been incorrect, but that the best performing machine learning algorithm depends on the problem scenario. The experiments also show that decreased user reliability leads to decreased performance, especially when there is a limited amount of labelled data.
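
As an illustration only, the strategy highlighted in the findings (the user supplies a label when the previous estimation was incorrect) can be sketched as a small online learning loop. Nothing below is taken from the thesis: the class OnlineCentroidClassifier, the functions make_stream and run_stream, the synthetic two-class data, and the reliability parameter are all hypothetical names and assumptions. The sketch also models an unreliable user as one who sometimes gives no label at all, which is only one of the failure modes described above.

```python
import random
from collections import defaultdict


class OnlineCentroidClassifier:
    """Toy incremental nearest-centroid classifier (an illustrative stand-in,
    not an algorithm evaluated in the thesis)."""

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sums
        self.counts = defaultdict(int)  # label -> number of labelled points

    def predict(self, x):
        if not self.sums:
            return None  # cold start: no labelled data seen yet

        def squared_distance(label):
            n = self.counts[label]
            return sum((xi - si / n) ** 2 for xi, si in zip(x, self.sums[label]))

        return min(self.sums, key=squared_distance)

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + xi for s, xi in zip(self.sums[y], x)]
        self.counts[y] += 1


def make_stream(n=200, seed=1):
    """Synthetic two-class stream; class names and centres are made up."""
    rng = random.Random(seed)
    centres = {"cycling": (3.0, 3.0), "walking": (0.0, 0.0)}
    stream = []
    for _ in range(n):
        label = rng.choice(sorted(centres))
        cx, cy = centres[label]
        stream.append(((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)), label))
    return stream


def run_stream(stream, reliability=1.0, seed=0):
    """Estimate first, then ask for a label only after an incorrect estimate.

    `reliability` is the assumed probability that the user actually answers
    when asked; an unanswered request leaves the model unchanged.
    """
    rng = random.Random(seed)
    model = OnlineCentroidClassifier()
    correct = 0
    labels_given = 0
    for x, true_label in stream:
        estimate = model.predict(x)
        if estimate == true_label:
            correct += 1
        elif rng.random() < reliability:
            model.learn(x, true_label)  # user corrects the wrong estimate
            labels_given += 1
    return correct / len(stream), labels_given
```

Calling run_stream(make_stream(), reliability=r) for several values of r gives a rough feel for how missing feedback limits what the model can learn from the stream. The sketch does not reproduce the experiments or results reported in the thesis.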
