Mining Clinical Text in Cancer Care

Abstract: Health care and clinical practice generate large amounts of text detailing symptoms, test results, diagnoses, treatments, and outcomes for patients. This clinical text, documented in health records, is a potential source of knowledge and an underused resource for improved health care. The focus of this work has been text mining of clinical text in the domain of cancer care, with the aim of developing and evaluating methods for extracting relevant information from such texts. Two different types of clinical documentation have been included: clinical notes from electronic health records in Swedish, and Norwegian pathology reports.

Free text, and clinical text in particular, is considered a kind of unstructured information, which is difficult to process automatically. Information extraction can therefore be applied to create a more structured representation of a text, making its content more accessible for machine learning and statistics. To this end, this thesis describes the development of an efficient and accurate tool for information extraction from pathology reports.

Another application of clinical text mining is risk prediction and diagnosis prediction. The goal of such prediction is to create a machine learning model capable of identifying patients at risk of a specific disease or some other adverse outcome. The motivation for cancer diagnosis prediction is that an early diagnosis can be beneficial for the outcome of treatment. Here, a disease prediction model was developed and evaluated for prediction of cervical cancer. To create this model, health records of patients diagnosed with cervical cancer were processed in two steps. First, clinical events were extracted from free-text clinical notes through the use of named entity recognition. Next, the extracted events were combined with other event types, such as diagnosis codes and drug codes, from the same health records. Finally, machine learning models were trained for predicting cervical cancer, and evaluation showed that events extracted from the free-text records were the most informative event type for the diagnosis prediction.
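
The two processing steps described above, extracting events from free text and combining them with coded events, can be illustrated with a minimal sketch. This is not the thesis implementation: the lexicon lookup is a hypothetical stand-in for a trained named entity recognizer, and the function names, the notes, and the example codes are all invented for illustration. The resulting per-patient event counts are the kind of representation a classifier could be trained on.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a trained named entity recognizer.
TOY_LEXICON = {
    "bleeding": "FINDING",
    "pain": "FINDING",
    "biopsy": "PROCEDURE",
}


def extract_text_events(note):
    """Return 'TYPE:term' events for every lexicon term found in a note."""
    tokens = note.lower().replace(".", " ").replace(",", " ").split()
    return [f"{TOY_LEXICON[t]}:{t}" for t in tokens if t in TOY_LEXICON]


def patient_features(notes, diagnosis_codes, drug_codes):
    """Combine text events with coded events into one bag-of-events count."""
    events = []
    for note in notes:
        events.extend(extract_text_events(note))
    # Prefix coded events so the event types remain distinguishable.
    events.extend(f"DIAG:{code}" for code in diagnosis_codes)
    events.extend(f"DRUG:{code}" for code in drug_codes)
    return Counter(events)


# Invented example record (assumption, not data from the thesis).
features = patient_features(
    notes=["Patient reports pain and irregular bleeding.", "Biopsy scheduled."],
    diagnosis_codes=["N93.9"],
    drug_codes=["N02BE01"],
)
```

Keeping a type prefix on each event makes it possible to train and compare models on one event type at a time, which is one way the informativeness of free-text events versus coded events could be assessed.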
