Neural Network Approaches To Survival Analysis

University dissertation from Department of Astronomy and Theoretical Physics, Lund University

Abstract: Predicting a patient's probable survival is very challenging for many diseases. In many forms of cancer, the choice of treatment can be directly affected by the patient's estimated risk. This thesis explores different methods of predicting a patient's chances of survival using artificial neural networks (ANNs). An ANN is a machine learning technique inspired by how neurons in the brain function. It can learn to recognize patterns from labeled examples, an approach known as supervised learning. Certain characteristics of medical data make ANN methods difficult to apply, and the articles in this thesis investigate different ways of overcoming those difficulties. One of the most prominent is a form of missing data known as censoring. Survival data usually originate from medical studies, which are conducted only during a limited period, for example five years. During this time, some patients leave the study for various reasons, such as death from unrelated causes. Other patients survive the study period without experiencing cancer recurrence or death. These patients provide partial information about the survival characteristics of the disease but are challenging to include in statistical models. Articles 1-3 and 5 use a genetic algorithm to train ANN models to maximize (or minimize) non-differentiable functions, which cannot be combined with traditional ANN training techniques because those rely on gradient information. One of these functions is the concordance index, which compares survival predictions in a pairwise fashion. This function is often used to compare prognostic models in survival analysis, and is maximized directly using the genetic algorithm approach. In contrast, Article 5 tries to produce the best grouping of patients into low, intermediate, or high risk by maximizing or minimizing the area under the survival curve.
Article 4 does not use a genetic algorithm but instead modifies the underlying data. Regular gradient methods are used to train ANNs on survival data in which the censored times are estimated in a maximum likelihood framework.
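
The pairwise comparison behind the concordance index mentioned in the abstract can be illustrated with a short sketch. This is not the implementation used in the thesis articles. It assumes the commonly used Harrell-style definition for right-censored data, in which a pair of patients is comparable only if the patient with the shorter observed time actually experienced the event. The function name, the convention that a higher score means higher risk, and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style concordance index for right-censored data (illustrative sketch).

    times  : observed time per patient (event time or censoring time)
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score; assumed here that higher means shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Pairs with identical observed times are skipped for simplicity.
        if times[i] == times[j]:
            continue
        # Order the pair so that patient `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # If the shorter time is a censoring time, the true ordering of the
        # two survival times is unknown, so the pair is not comparable.
        if not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # the prediction orders the pair correctly
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data (invented): the second patient is censored at time 4.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risks = [0.9, 0.3, 0.1, 0.6]
c_index = concordance_index(times, events, risks)
```

In this toy data set four pairs are comparable and three of them are ordered correctly, so `c_index` is 0.75. Because the value changes only when the ordering of two predictions flips, it is piecewise constant in the model weights and provides no useful gradient, which is consistent with the abstract's motivation for using a genetic algorithm to maximize it.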