Classification and Computational Methods in Gene Expression Data Analysis

University dissertation from the Department of Theoretical Physics, Lund University

Abstract: cDNA microarray technology makes it possible to monitor the state of cells by measuring the activity of thousands of genes simultaneously. In cancer research, this high-throughput technique has allowed exploratory studies of the molecular mechanisms behind, for example, metastasis and response to therapy. This increased knowledge can hopefully result in new therapies and improved prognostic and predictive tools. These tools, however, have to be properly validated in large cohorts and subjected to large-scale trials before use in the clinic. One aim of this thesis is to evaluate the performance of gene-expression-based classifiers of clinical outcome in breast cancer compared with conventional clinical markers. Additionally, we develop computational methods for analysis and classification using gene expression data. Our results suggest that clinical markers and molecular profiling have similar power in breast cancer prognosis. Further studies using larger cohorts are thus needed to validate and refine molecular prognostic profiles. We have also performed multicategory classification of leukemia into genetic subtypes and have predicted response to therapy in a subgroup. The main contribution to the computational analysis is a method for improving missing value imputation of 2-dye cDNA microarray data. Recognizing that some categories of missing values are over- or underestimated by a kNN-based imputation method, we suggest a linear model that corrects for this bias and improves imputation of these spots.
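
To make the kNN-based imputation mentioned in the abstract concrete, the following is a minimal sketch of a generic nearest-neighbour imputation on a genes-by-arrays matrix. It is an illustration under stated assumptions, not the thesis's implementation: the function name `knn_impute`, the distance measure, the unweighted average, and the toy data are all hypothetical choices, and the linear bias-correction model proposed in the thesis is not reproduced here because the abstract does not specify it.

```python
import numpy as np


def knn_impute(X, k=2):
    """Fill NaNs in a genes x arrays matrix from the k most similar genes.

    Hypothetical helper for illustration only; similarity is a
    root-mean-square distance over arrays observed in both genes.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        # Arrays observed both in gene i and in each other gene.
        shared = ~missing[i] & ~missing
        diff = np.where(shared, X - X[i], 0.0)
        n_shared = shared.sum(axis=1)
        with np.errstate(invalid="ignore", divide="ignore"):
            dist = np.sqrt((diff ** 2).sum(axis=1) / n_shared)
        dist[n_shared == 0] = np.inf  # nothing in common: not comparable
        dist[i] = np.inf              # a gene is never its own neighbour
        for j in np.where(missing[i])[0]:
            # Only genes observed in array j can donate a value.
            candidates = np.where(~missing[:, j] & np.isfinite(dist))[0]
            if candidates.size == 0:
                continue  # no donor available; leave the value missing
            nearest = candidates[np.argsort(dist[candidates])[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled


# Toy expression matrix (made-up numbers): gene 1 lacks its third value.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, np.nan],
    [1.1, 2.1, 3.2],
    [5.0, 6.0, 7.0],
])
filled = knn_impute(X, k=2)
print(filled[1, 2])
```

In this toy case the two closest genes (rows 0 and 2) donate their third-array values, so the imputed entry is their average. A bias correction of the kind the abstract describes would then adjust such estimates for the categories of spots where they are systematically too high or too low.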
