Document Type

Dissertation

Date of Award

12-31-2020

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

Dimitri Theodoratos

Second Advisor

Yi Chen

Third Advisor

James Geller

Fourth Advisor

Senjuti Basu Roy

Fifth Advisor

Vincent Oria

Abstract

Keyword search has been seen in recent years as an attractive way for querying data with some form of structure. Indeed, it allows simple users to extract information from databases without mastering a complex structured query language and without having knowledge of the schema of the data. It also allows for integrated search of heterogeneous data sources. However, as keyword queries are ambiguous and not expressive enough, keyword search cannot scale satisfactorily on big datasets and the answers are, in general, of low accuracy. Therefore, flat keyword search alone cannot efficiently return high quality results on large data with structure. In this dissertation, the algorithm improve keyword search over databases by exploiting semantic information of the data and by extending flat keyword queries with semantic information.

First, it develop an algorithm for keyword search over graph databases which exploits tree canonical forms and techniques developed for mining tree patterns. The algorithm substantially reduces the number of redundant intermediate results generated, which is the bottleneck of query evaluation algorithms. Our experiments show that it outperforms previous algorithms by one to two orders of magnitude in terms of efficiency and memory consumption and displays smooth scalability.

Furthermore, the algorithm leverages semantic information of the data to address the aforementioned problems. The method follows a schema-based approach for evaluating keyword queries on relational databases, which computes patterns mapped onto the schema graph of the database. Pattern graphs are representatives for clusters of query results. As such, they are much less numerous than the actual query results and can be translated into SQL queries on the relational database which can produce the results in the cluster. Our pattern graphs allow keywords to match schema elements and capture key-foreign key relationships and inclusion relationships. The the system employ information-retrieval-based and semantics-based techniques for scoring query pattern graphs and design an efficient top-k algorithm for computing the patterns graphs of a keyword query.

Finally, the dissertation study employing keyword queries enhanced with cohesiveness constraints (cohesive keyword queries) to query relational databases. Cohesive keyword queries bridge the gap between flat keyword queries and structured queries. The dissertation formally define semantics for cohesive queries on relational databases and design an efficient evaluation algorithm. The experimental results show that cohesive keyword queries substantially improve the quality of the results of flat keyword queries and the performance of their evaluation. Most importantly, these improvements are attained without compromising the simplicity and convenience of traditional keyword search.

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.