Date of Award

Spring 1999

Document Type


Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)


Computer and Information Science

First Advisor

James Geller

Second Advisor

Yehoshua Perl

Third Advisor

Richard B. Scherl

Fourth Advisor

James J. Cimino

Fifth Advisor

Michael Halper

Sixth Advisor

Waldemar G. Johanson


A controlled medical terminology (CMT) is a collection of concepts (or terms) that are used in the medical domain. Typically, a CMT also contains attributes of those concepts and/or relationships between those concepts. Electronic CMTs are extremely useful and important for communication among, and integration of, independent information systems in healthcare, because data in this area is highly fragmented. A single query in this area might involve several databases, e.g., a clinical database, a pharmacy database, a radiology database, and a lab test database.

Unfortunately, the extensive sizes of CMTs, which often contain tens of thousands of concepts and hundreds of thousands of relationships between pairs of those concepts, impose steep learning curves on new users. In this dissertation, we address the problem of helping a user to orient himself in an existing large CMT. To help a user comprehend a large, complex CMT, we need to provide abstract views of the CMT. However, at this time, no tools exist for providing a user with such abstract views. One reason for the lack of tools is the absence of a good theory on how to partition an overwhelming CMT into manageable pieces.

In this dissertation, we try to overcome the described problem by using a three-pronged approach. (1) We use the power of Object-Oriented Databases to design a schema extraction process for large, complex CMTs. The schema resulting from this process provides an excellent, compact representation of the CMT. (2) We develop a theory and a methodology for partitioning a large OODB schema, modeled as a graph, into small meaningful units. The methodology relies on the interaction between a human and a computer, making optimal use of the human's semantic knowledge and the computer's speed. Furthermore, the theory and methodology developed for the schema-level partitioning are also adapted to the object-level of a CMT. (3) We use purely structural similarities for partitioning CMTs, eliminating the need for a human expert in the partitioning methodology mentioned above.
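
As a rough illustration of what partitioning by "purely structural similarities" could look like, the sketch below groups concepts that carry exactly the same set of property names (attributes and relationships). This grouping rule is an assumption chosen for illustration and is not taken from the dissertation; the function `partition_by_structure`, the concept names, and the property names are all hypothetical.

```python
from collections import defaultdict


def partition_by_structure(concepts):
    """Group concept names by their exact set of property names.

    `concepts` maps a concept name to an iterable of property names.
    ASSUMPTION: "structurally similar" is taken to mean "has an identical
    property set"; the dissertation's actual criterion may differ.
    """
    groups = defaultdict(list)
    for name, properties in concepts.items():
        # frozenset makes the property set hashable and order-independent
        groups[frozenset(properties)].append(name)
    # Sort members so the result does not depend on insertion order
    return {key: sorted(members) for key, members in groups.items()}


# Hypothetical toy terminology; none of these entries come from MED or UMLS.
toy_terminology = {
    "Aspirin": {"dose-form", "treats"},
    "Ibuprofen": {"dose-form", "treats"},
    "Glucose Test": {"specimen", "measures"},
    "Sodium Test": {"specimen", "measures"},
    "Event": set(),
}

partition = partition_by_structure(toy_terminology)
for property_set, members in partition.items():
    print(sorted(property_set), members)
```

Each resulting group could then serve as one unit of an abstract view, with no human judgment involved in forming it. A real terminology would likely need a looser similarity measure than exact equality of property sets.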

Two large medical terminologies are used as our test beds: the Medical Entities Dictionary (MED) and the Unified Medical Language System (UMLS), which itself contains a number of terminologies.