Biomedical text categorization with concept graph representations using a controlled vocabulary
Document Type
Conference Proceeding
Publication Date
9-28-2012
Abstract
Recent work using graph representations for text categorization has shown promising performance over the conventional bag-of-words representation of text documents. In this paper we investigate a graph representation of texts for the task of text categorization. In our representation we identify high-level concepts extracted from a database of controlled biomedical terms and build a rich graph structure that contains important concepts and relationships. This procedure ensures that graphs are described with a regular vocabulary, which makes them easier to compare. We then classify document graphs by applying a set-based graph kernel that is intuitively sensible and able to deal with the disconnectedness of the constructed concept graphs. We compare this approach to standard approaches using non-graph, text-based features. We also compare several candidate kernels to determine which performs best. Copyright 2012 ACM.
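The abstract does not define the set-based kernel, so the following is only a minimal sketch of one way such a kernel could look, not the authors' method. It assumes each document graph is reduced to a set of concept nodes and a set of undirected relation edges, and it scores two graphs by a weighted sum of Jaccard similarities over those sets. Because only set overlap is used, disconnected graphs need no special handling. The class name `ConceptGraph`, the function `set_kernel`, the `edge_weight` parameter, and the concept labels in the usage note are all hypothetical.

```python
from typing import Iterable, Tuple


class ConceptGraph:
    """Hypothetical container: a document as sets of concepts and relations."""

    def __init__(self, concepts: Iterable[str], relations: Iterable[Tuple[str, str]]):
        self.nodes = frozenset(concepts)
        # Treat relations as undirected by storing each pair in sorted order.
        self.edges = frozenset(tuple(sorted(pair)) for pair in relations)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def set_kernel(g1: ConceptGraph, g2: ConceptGraph, edge_weight: float = 0.5) -> float:
    """Assumed kernel: convex combination of node-set and edge-set Jaccard scores.

    No path or connectivity information is used, so graphs made of several
    disconnected components are compared exactly like connected ones.
    """
    node_score = jaccard(g1.nodes, g2.nodes)
    edge_score = jaccard(g1.edges, g2.edges)
    return (1.0 - edge_weight) * node_score + edge_weight * edge_score
```

As a usage illustration with made-up concept labels, `set_kernel(ConceptGraph({"gene", "protein", "neoplasm"}, [("gene", "protein")]), ConceptGraph({"gene", "protein", "cell"}, [("protein", "gene"), ("cell", "protein")]))` combines a node overlap of 2 out of 4 with an edge overlap of 1 out of 2. A Gram matrix of such scores could then be passed to any kernel classifier that accepts precomputed kernels.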
Identifier
84866635017 (Scopus)
ISBN
9781450315524
Publication Title
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
External Full Text Location
https://doi.org/10.1145/2350176.2350181
First Page
26
Last Page
32
Recommended Citation
Mishra, Meenakshi; Huan, Jun; Bleik, Said; and Song, Min, "Biomedical text categorization with concept graph representations using a controlled vocabulary" (2012). Faculty Publications. 18090.
https://digitalcommons.njit.edu/fac_pubs/18090
