Fast retrieval of electronic messages that contain mistyped words or spelling errors
Document Type
Article
Publication Date
12-1-1997
Abstract
This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the messages. Our approach is to store the messages sequentially in a database and hash their "fingerprints" into a number of "fingerprint files." When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages. We derive a lower bound, based on which one can prune a large number of nonqualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound. © 1997 IEEE.
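The voting-and-pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a "fingerprint" is an overlapping q-gram, replaces the hashed fingerprint files with an in-memory dictionary, and substitutes the classical q-gram lemma for the paper's lower bound, which the abstract does not state. All function names and the parameter `Q` are hypothetical.

```python
from collections import Counter, defaultdict

Q = 2  # assumed fingerprint length (q-gram size); not taken from the paper


def fingerprints(text, q=Q):
    """Return the multiset of overlapping q-grams of text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def build_index(messages, q=Q):
    """Map each q-gram to {message id: occurrences in that message}.

    This dictionary stands in for the paper's hashed fingerprint files.
    """
    index = defaultdict(dict)
    for mid, msg in enumerate(messages):
        for gram, count in fingerprints(msg.lower(), q).items():
            index[gram][mid] = count
    return index


def candidates(index, num_messages, query, k, q=Q):
    """Vote on messages and prune those below the lower bound.

    Assumed bound (q-gram lemma): a message containing the query with at
    most k insertions, deletions, or mismatches shares at least
    len(query) - q + 1 - k * q q-grams with the query.
    Returns (sorted candidate ids, bound). Candidates still need to be
    verified with an exact approximate-matching step.
    """
    bound = len(query) - q + 1 - k * q
    if bound <= 0:
        # The bound cannot prune anything; every message stays a candidate.
        return list(range(num_messages)), bound
    votes = Counter()  # histogram of votes per message
    for gram, qcount in fingerprints(query.lower(), q).items():
        for mid, mcount in index.get(gram, {}).items():
            votes[mid] += min(qcount, mcount)
    return sorted(mid for mid, v in votes.items() if v >= bound), bound


messages = [
    "Please recieve the attached report",  # misspelled "receive"
    "Meeting moved to Friday",
    "Did you receive my message",
]
index = build_index(messages)
print(candidates(index, len(messages), "receive", k=2))  # ([0, 2], 2)
```

In this sketch the second message collects a single vote (the shared q-gram "ve") and is pruned, while the misspelled first message collects three votes and survives as a candidate.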
Identifier
0031164072 (Scopus)
Publication Title
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
External Full Text Location
https://doi.org/10.1109/3477.584951
ISSN
1083-4419
First Page
441
Last Page
451
Issue
3
Volume
27
Grant
SBR-421280
Fund Ref
National Science Foundation
Recommended Citation
Wang, Jason Tsong Li and Chang, Chia Yo, "Fast retrieval of electronic messages that contain mistyped words or spelling errors" (1997). Faculty Publications. 16670.
https://digitalcommons.njit.edu/fac_pubs/16670
