Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts
Document Type
Conference Proceeding
Publication Date
3-19-2009
Abstract
Background: We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort.
Methods: We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples except for a single ambiguous word. The system uses a novel and efficient graph-based algorithm to cluster words into groups that have the same meaning. Our algorithm follows the principle of finding a maximum margin between clusters, determining a split of the data that maximizes the minimum distance between pairs of data points belonging to two different clusters.
Results: On a test set of 21 ambiguous keywords from PubMed abstracts, our system has an average accuracy of 78%, outperforming a state-of-the-art unsupervised system by 2% and a baseline technique by 23%. On a standard data set from the National Library of Medicine, our system outperforms the baseline by 6% and comes within 5% of the accuracy of a supervised system.
Conclusion: Our system is a novel, state-of-the-art technique for efficiently finding word sense clusters, and does not require training data or human effort for each new word to be disambiguated.
© 2009 Duan et al; licensee BioMed Central Ltd.
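The max-margin criterion described in the abstract (choose the split whose smallest distance between points in different clusters is as large as possible) can be illustrated with a small sketch. The Python below is not the authors' algorithm or implementation. It assumes a two-way split, Euclidean distance, and toy 2-D points, and it relies on the standard result that stopping Kruskal's minimum-spanning-tree construction one merge early (equivalently, dropping the heaviest tree edge) yields the two-cluster partition with the largest minimum inter-cluster distance. The names `max_margin_split` and `points` are hypothetical.

```python
import math
from itertools import combinations


def max_margin_split(points):
    """Split points into two clusters, maximizing the minimum cross-cluster distance.

    Illustrative sketch only (assumed Euclidean distance, two clusters);
    not the method from the paper.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by distance (ties broken by index).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    # Kruskal's algorithm, stopped after n - 2 merges so that exactly
    # two connected components remain.
    merges = 0
    margin = None
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        if merges == n - 2:
            # Shortest edge joining the two remaining components:
            # this length is the margin of the split.
            margin = d
            break
        parent[ri] = rj
        merges += 1

    root = find(0)
    labels = [0 if find(i) == root else 1 for i in range(n)]
    return labels, margin


# Toy data (assumed): three points near the origin, two far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
labels, margin = max_margin_split(points)
print(labels, margin)
```

Sorting all pairs costs O(n² log n), which is fine for a demonstration but says nothing about the efficiency claims made for the system in the abstract.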
Identifier
63449126014 (Scopus)
Publication Title
BMC Bioinformatics
External Full Text Location
https://doi.org/10.1186/1471-2105-10-S3-S4
e-ISSN
1471-2105
PubMed ID
19344480
Issue
SUPPL. 3
Volume
10
Grant
DUE-0434581
Fund Ref
Temple University
Recommended Citation
Duan, Weisi; Song, Min; and Yates, Alexander, "Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts" (2009). Faculty Publications. 12129.
https://digitalcommons.njit.edu/fac_pubs/12129