Computer-based support for knowledge extraction from clinical databases

University dissertation from Vimmerby : VTT Grafiska

Abstract: This thesis is devoted to the analysis and interpretation of medical data, with the aim of enabling knowledge extraction from medical databases. The difficulties that traditional approaches to data analysis and interpretation pose for people less proficient in statistics are well known. An attempt has been made to identify the stages in the processes of data analysis and interpretation where this category of users might need help, and to identify how this help could best be provided (studies I, II, III). In this work, artificial intelligence approaches have been used to improve the user-friendliness as well as the power and effectiveness of statistical software with respect to data analysis strategies. Prototype implementations based on these approaches are presented and discussed. Issues pertaining to the evaluation of decision support systems in medicine have been identified and discussed in detail (study IV).

Knowledge in a knowledge-based system for decision support is generally acquired from experts and the literature. Knowledge can also be effectively extracted from a database of patient observations and from interpretation of those observations. The resulting system would be more accurate in the latter case, especially if it is intended to provide decision support in the same clinical setting.

Studies V and VI were conducted to show how retrospectively collected data could be utilized for the purpose of knowledge extraction. More traditional data analysis approaches (study V) were used to analyze a database on liver diseases. The data used in the study were collected in the HELP system as a routine part of patient care. The main issues involved were the detection of outliers and the treatment of missing values, in order to facilitate the use of this kind of database for eventual knowledge extraction.
In study V, statistical techniques including discriminant analysis, and artificial intelligence approaches such as inductive learning, were used. The 'K nearest neighbor' technique was found to be an easy and acceptable method for estimating missing values when the database contained only a few missing values for each object. Discriminant analysis was found to be a good method for classifying a patient, based on a set of variables, into two or more disease classes. The results show that when discriminant analysis was applied to two groups using a relatively large number (19) of variables, only a few (3) of the variables accounted for a high percentage of correct classifications.

The knowledge-based approach to data analysis and interpretation used in study III was applied to a large database (study VI). The main emphasis was on studying the feasibility of the approach in exploring a large patient record system. The data were taken from Kronan Health Center, a primary health care center in suburban Stockholm with a patient database consisting of about 14,000 medical records. The analysis was carried out to test the hypothesis of a possible causal relationship between hypertension and diabetes. The results of this study support the assumption that there is a relationship between diabetes and hypertension, but the question of the direction of this relationship remained unresolved, as did the question of direct causality. On the other hand, the results of this study are in accordance with the hypothesis of a common metabolic syndrome. The results arrived at by the analysis method utilized by the system (multivariate tabular analysis) are, moreover, in accordance with those of another statistical method (log-linear analysis). This also supports the approach taken in the knowledge-based system.
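
To illustrate the 'K nearest neighbor' estimation of missing values mentioned for study V, the following is a minimal sketch and not the implementation used in the thesis. The function name knn_impute, the choice of Euclidean distance over the observed variables, the default k, and the use of the neighbors' mean are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries in each row with the mean of that variable
    over the k nearest complete rows (hypothetical helper)."""
    # Only rows without missing values can serve as neighbors.
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("at least one complete row is required")

    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue

        # Indices of the variables that were actually observed in this row.
        observed = [i for i, v in enumerate(row) if v is not None]

        def distance(other, row=row, observed=observed):
            # Assumed metric: Euclidean distance on observed variables only.
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbors = sorted(complete, key=distance)[:k]

        new_row = list(row)
        for i, v in enumerate(row):
            if v is None:
                new_row[i] = sum(n[i] for n in neighbors) / len(neighbors)
        filled.append(new_row)
    return filled
```

In this sketch a record with one missing laboratory value would receive the average of that value over its k most similar complete records, which matches the setting described above of only a few missing values per object. In practice the variables would usually be standardized first so that no single variable dominates the distance.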
