Faculty Publications

Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts

Weisi Duan, Department of Computer and Information Sciences
Min Song, New Jersey Institute of Technology
Alexander Yates, Department of Computer and Information Sciences

Document Type

Conference Proceeding

Publication Date

3-19-2009

Abstract

Background: We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort.

Methods: We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples except for a single ambiguous word. The system uses a novel and efficient graph-based algorithm to cluster words into groups that have the same meaning. Our algorithm follows the principle of finding a maximum margin between clusters, determining a split of the data that maximizes the minimum distance between pairs of data points belonging to two different clusters.

Results: On a test set of 21 ambiguous keywords from PubMed abstracts, our system has an average accuracy of 78%, outperforming a state-of-the-art unsupervised system by 2% and a baseline technique by 23%. On a standard data set from the National Library of Medicine, our system outperforms the baseline by 6% and comes within 5% of the accuracy of a supervised system.

Conclusion: Our system is a novel, state-of-the-art technique for efficiently finding word sense clusters, and does not require training data or human effort for each new word to be disambiguated.

© 2009 Duan et al; licensee BioMed Central Ltd.
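The max-margin criterion stated in the abstract (choose the split that maximizes the minimum distance between points in different clusters) can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It relies on a standard fact: for a two-way split, the criterion is met by building a minimum spanning tree over the points and cutting its longest edge. The function name `max_margin_split`, the use of Euclidean distance, and the 2-D toy points are all assumptions made for illustration; a real system would use feature vectors derived from word contexts.

```python
from itertools import combinations
from math import dist


def max_margin_split(points):
    """Split points into two clusters so that the smallest distance
    between any cross-cluster pair (the margin) is as large as possible.

    Illustrative only: runs Kruskal's algorithm and stops when two
    components remain, which is the same as cutting the longest edge
    of the minimum spanning tree.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # All pairwise edges, shortest first.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    margin = None
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # both endpoints already in the same component
        if components == 2:
            # Shortest edge joining the last two components:
            # its length is the margin of the split.
            margin = d
            break
        parent[ri] = rj
        components -= 1

    root = find(0)
    cluster_a = [i for i in range(n) if find(i) == root]
    cluster_b = [i for i in range(n) if find(i) != root]
    return cluster_a, cluster_b, margin


# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
cluster_a, cluster_b, margin = max_margin_split(points)
```

Sorting all pairs costs O(n² log n) time, so this sketch is meant to make the criterion concrete rather than to reproduce the efficiency the abstract claims for the authors' graph-based method.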

Identifier

63449126014 (Scopus)

Publication Title

BMC Bioinformatics

External Full Text Location

https://doi.org/10.1186/1471-2105-10-S3-S4

e-ISSN

14712105

PubMed ID

19344480

Issue

SUPPL. 3

Volume

Grant

DUE-0434581

Fund Ref

Temple University

Recommended Citation

Duan, Weisi; Song, Min; and Yates, Alexander, "Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts" (2009). Faculty Publications. 12129.
https://digitalcommons.njit.edu/fac_pubs/12129

DOI

10.1186/1471-2105-10-S3-S4
