Dec 8
Text Mining in Clinical Records
Filed Under Clinical Informatics, IT-related, Observations
Clinical records here means any type of electronic record that contains clinical data, whether in granular or free-text form, structured or unstructured.
Text mining refers to the process of deriving high-quality information from text and involves the process of structuring the input text (usually by parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities). [Wikipedia - http://en.wikipedia.org/wiki/Text_mining]
Conventional data mining techniques would suffice if clinical data were fully structured, but it is frequently entered as free text. Interfaces that force clinical care providers to enter data only in structured form, at an adequate degree of granularity, tend to be complex and user-unfriendly, which is a strong disincentive to using them.
To strike a balance, allowing free-text entries is, as things currently stand, the most prudent option. The trade-off is that text mining becomes necessary to cull the required data from the free-text content before proper data analysis is possible.
While text mining provides the tools to populate the data warehouses, having to search for the right words makes the process inefficient. Using a standard clinical terminology (e.g., SNOMED-CT), with its terms and their corresponding codes, to cull the appropriate words from these fields improves it. With more useful data extracted faster, both the quality and the quantity of data available for mining increase.
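A minimal sketch of this kind of terminology-driven matching is shown here in Python. The term table, the `CODE-…` identifiers and the function names are all hypothetical placeholders for illustration; they are not real SNOMED-CT codes or part of any particular tool.

```python
import re

# Hypothetical term-to-code table. The codes are placeholders,
# NOT real SNOMED-CT identifiers.
TERM_CODES = {
    "myocardial infarction": "CODE-0001",
    "heart attack": "CODE-0001",   # synonym mapped to the same concept
    "hypertension": "CODE-0002",
    "type 2 diabetes": "CODE-0003",
}

def build_pattern(terms):
    # Try longer terms first so multi-word terms win over shorter overlaps.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

PATTERN = build_pattern(TERM_CODES)

def extract_codes(note):
    """Return (matched term, code) pairs in order of appearance in the note."""
    results = []
    for match in PATTERN.finditer(note):
        term = match.group(1).lower()
        results.append((term, TERM_CODES[term]))
    return results

note = "Pt with known Hypertension, admitted after a heart attack."
print(extract_codes(note))
```

This sketch only does case-insensitive dictionary lookup. It makes no attempt to handle negation ("no history of hypertension"), misspellings or abbreviations, all of which matter in real clinical free text.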
However, this use of codes introduces an intermediary stage: the clinical databases are first text-mined using the terms associated with the codes. The corresponding codes are then extracted and passed through the ETVL process to populate the data warehouses.
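The second half of that stage might look roughly like the sketch below, which takes already extracted codes, validates them, and loads them into a warehouse table. The post does not expand ETVL, so this assumes only that it includes a validation step before loading. The table name, the record identifiers and the `CODE-…` values are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical set of codes the warehouse accepts (placeholders, not real codes).
VALID_CODES = {"CODE-0001", "CODE-0002"}

def validate(rows):
    """Split (record_id, code) rows into accepted and rejected lists."""
    good = [r for r in rows if r[1] in VALID_CODES]
    bad = [r for r in rows if r[1] not in VALID_CODES]
    return good, bad

def load(conn, rows):
    """Insert validated rows into a simple findings table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS coded_findings (record_id TEXT, code TEXT)"
    )
    conn.executemany("INSERT INTO coded_findings VALUES (?, ?)", rows)
    conn.commit()

# Rows as an earlier text-mining stage might produce them: (record id, code).
extracted = [
    ("rec-1", "CODE-0001"),
    ("rec-1", "CODE-0002"),
    ("rec-2", "CODE-9999"),  # unknown code, should be rejected
]

good, bad = validate(extracted)
conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, good)

counts = conn.execute(
    "SELECT code, COUNT(*) FROM coded_findings GROUP BY code ORDER BY code"
).fetchall()
print(counts)
print("rejected:", bad)
```

Keeping rejected rows in a separate list, rather than silently dropping them, makes it easier to review terms that matched but did not map to a code the warehouse accepts.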
Once the data warehouses and, where appropriate, data marts are populated, the rest of the process follows the usual methods of data mining and knowledge discovery in clinical databases.
