Principles of ontology

John F. Sowa (sowa@west.poly.edu)
Wed, 3 Dec 1997 23:34:44 -0500

At the ontology meeting at Stanford, Nov. 6 & 7, I distributed a list
of definitions and principles that were based on the discussions that
have been continuing during the 4 meetings that were held in 1996 & 1997.
The following definitions are based on my interpretation of the discussions
that we held, and I'm circulating them for further comments and discussion.

John Sowa
_____________________________________________________________________________

The subject of _ontology_ is the study of the _categories_ of things
that exist or may exist in some domain. The product of such a study,
called _an ontology_, is a catalog of the types of things that are
assumed to exist in a domain of interest D from the perspective of a
person who uses a language L for the purpose of talking about D. The
types in the ontology represent the _predicates_, _word senses_, or
_concept and relation types_ of the language L when used to discuss
topics in the domain D. An uninterpreted logic, such as predicate
calculus, conceptual graphs, or KIF, is _ontologically neutral_. It
imposes no constraints on the subject matter or the way the subject may
be characterized. By itself, logic says nothing about anything, but
the combination of logic with an ontology provides a language that can
express relationships about the entities in the domain of interest.

An informal ontology may be specified by a catalog of types that are
either undefined or defined only by statements in a natural language.
A formal ontology is specified by a collection of names for concept and
relation types organized in a partial ordering by the type-subtype
relation. Formal ontologies are further classified by the way the
subtypes are distinguished from their supertypes: an _axiomatized
ontology_ distinguishes subtypes by axioms and definitions stated in a
formal language, such as logic or some computer-oriented notation that
can be translated to logic; a _prototype-based ontology_ distinguishes
subtypes by a comparison with a typical member or _prototype_ for each
subtype. Large ontologies often use a mixture of definitional methods:
formal axioms and definitions are used for the terms in mathematics,
physics, and engineering; and prototypes are used for plants, animals,
and common household items.

ALIGNMENT. A mapping of concepts and relations between two ontologies A
and B that preserves the partial ordering by subtypes in both
A and B. If an alignment maps a concept or relation x in
ontology A to a concept or relation y in ontology B, then x
and y are said to be _equivalent_. The mapping may be
partial: there could be many concepts in A or B that have no
equivalents in the other ontology. Before two ontologies A
and B can be aligned, it may be necessary to introduce new
subtypes or supertypes of concepts or relations in either A
or B in order to provide suitable targets for alignment.
No other changes to the axioms, definitions, proofs, or
computations in either A or B are made during the process of
alignment. Alignment does not depend on the choice of names
in either ontology. For example, an alignment of a Japanese
ontology to an English ontology might map the Japanese concept
Go to the English concept Five. Meanwhile, the English
concept for the verb _go_ would not have any association with
the Japanese concept Go.
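
The order-preservation condition can be checked mechanically. The
following Python sketch is one reading of the definition, not part of
the original list: it treats an alignment as a partial mapping in which
x1 is a subtype of x2 in A exactly when their images stand in the same
relation in B. The toy ontologies, the type names Kazu, Number, and
Motion, and all function names are hypothetical.

```python
# Each toy ontology maps a type name to the set of its direct supertypes.
# All names here are hypothetical and chosen only for illustration.
japanese = {"Go": {"Kazu"}, "Kazu": set()}
english = {"Five": {"Number"}, "Number": set(),
           "Go": {"Motion"}, "Motion": set()}

def ancestors(supers, t):
    """Return all proper supertypes of t by following supertype links."""
    seen = set()
    stack = list(supers.get(t, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(supers.get(u, ()))
    return seen

def is_subtype(supers, s, t):
    """True if s is t or t is a proper supertype of s."""
    return s == t or t in ancestors(supers, s)

def is_alignment(mapping, supers_a, supers_b):
    """Assumed reading: the mapping preserves subtype order both ways."""
    for x1, y1 in mapping.items():
        for x2, y2 in mapping.items():
            if is_subtype(supers_a, x1, x2) != is_subtype(supers_b, y1, y2):
                return False
    return True

# Mapping by meaning: Japanese Go (five) goes to English Five.
by_meaning = {"Go": "Five", "Kazu": "Number"}
# Mapping by name: Japanese Go goes to the English verb concept Go.
by_name = {"Go": "Go", "Kazu": "Number"}

print(is_alignment(by_meaning, japanese, english))  # True
print(is_alignment(by_name, japanese, english))     # False
```

The mapping by name fails because Go is a subtype of Kazu in the
Japanese ontology, while the English verb concept Go is not a subtype
of Number.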

AXIOMATIZED ONTOLOGY. A terminological ontology whose concept and
relation types are distinguished by axioms and definitions
that are stated in logic or in some computer-oriented language
that could be automatically translated to logic. There is no
restriction on the complexity of the logic that may be used to
state the axioms and definitions. The distinction between
terminological and axiomatized ontologies is one of degree
rather than kind. Axiomatized ontologies tend to be smaller
than terminological ontologies, but their axioms and
definitions can support more complex inferences and
computations. Examples of axiomatized ontologies include
formal theories in science and mathematics, the collections of
rules and frames in an expert system, and specifications of
conceptual schemas in languages like SQL.

DIFFERENTIAE. The properties that distinguish a subtype from other
types that have a common supertype. The term comes from
Aristotle's method of defining new types by stating the
_genus_ or supertype and stating the properties that
distinguish the new type from its supertype. Aristotle's
method of definition has become the de facto standard for
natural language dictionaries, and it is also widely used for
AI knowledge bases and object-oriented programming languages.
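
As an illustration of definition by genus and differentiae, the
following Python sketch records each type as a genus plus the
properties that distinguish it, and derives the full property set by
inheritance. The class TypeDef, its fields, and the geometric example
are hypothetical; they are not taken from any particular knowledge
base or programming language.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TypeDef:
    """A type defined by its genus (supertype) and its differentiae."""
    name: str
    genus: Optional["TypeDef"] = None
    differentiae: frozenset = frozenset()

    def properties(self):
        """Properties inherited from the genus plus the differentiae."""
        inherited = self.genus.properties() if self.genus else frozenset()
        return inherited | self.differentiae

# Hypothetical example types.
polygon = TypeDef("Polygon", None, frozenset({"closed", "straight-sided"}))
triangle = TypeDef("Triangle", polygon, frozenset({"three-sided"}))
equilateral = TypeDef("EquilateralTriangle", triangle,
                      frozenset({"equal-sided"}))

print(sorted(equilateral.properties()))
```

Each subtype states only what sets it apart; everything else follows
from the chain of genera above it.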

INTEGRATION. The process of finding commonalities between two different
ontologies A and B and deriving a new ontology C that
facilitates interoperability between computer systems that are
based on the A and B ontologies. The new ontology C may
replace A or B, or it may be used only as an intermediary
between a system based on A and a system based on B.
Depending on the amount of change necessary to derive C from A
and B, different levels of integration can be distinguished:
alignment, partial compatibility, and unification. Alignment
is the weakest form of integration: it requires minimal
change, but it can only support limited kinds of
interoperability. It is useful for classification and
information retrieval, but it does not support deep inferences
and computations. Partial compatibility requires more changes
in order to support more extensive interoperability, even
though there may be some concepts or relations in one system
or the other that could create obstacles to full
interoperability. Unification or total compatibility may
require extensive changes or major reorganizations of A and B,
but it can result in the most complete interoperability:
everything that can be done with one can be done in an exactly
equivalent way with the other.

KNOWLEDGE BASE. An informal term for a collection of information that
includes an ontology as one component. Besides an ontology,
a knowledge base may contain information specified in a
declarative language such as logic or expert-system rules, but
it may also include unstructured or unformalized information
expressed in natural language or procedural code.

LEXICON. A knowledge base about some subset of words in the vocabulary
of a natural language. One component of a lexicon is a
terminological ontology whose concept types represent the word
senses in the lexicon. The lexicon may also contain
additional information about the syntax, spelling,
pronunciation, and usage of the words. Besides conventional
dictionaries, lexicons include large collections of words and
word senses, such as WordNet from Princeton University and EDR
from the Japan Electronic Dictionary Research Institute, Ltd.
Other examples include classification schemes, such as the
Library of Congress subject headings, the Medical Subject
Headings (MeSH), and the Defense Technical Information Center
Thesaurus (DTIC).

MIXED ONTOLOGY. An ontology in which some subtypes are distinguished by
axioms and definitions, but other subtypes are distinguished
by prototypes. The top levels of a mixed ontology would
normally be distinguished by formal definitions, but some of
the lower branches, such as plants, animals, and common
household objects, might be distinguished by prototypes.

PARTIAL COMPATIBILITY. An alignment of two ontologies A and B that
supports equivalent inferences and computations on all
equivalent concepts and relations. If A and B are partially
compatible, then any inference or computation that can be
expressed in one ontology using only the aligned concepts and
relations can be translated to an equivalent inference or
computation in the other ontology.

PROTOTYPE-BASED ONTOLOGY. A terminological ontology whose types and
subtypes are distinguished by prototypes rather than
definitions and axioms in a formal language. Before a
prototype-based ontology can be considered formal, there must
be some method for measuring the similarity of any two
entities that can be classified according to the types of the
ontology. Given such a measure, every type t in the ontology
must be assigned a _prototype_ or typical instance p. Then an
entity x can be classified by the following recursive procedure:

o Suppose that x has been classified as an instance of some
type t, which has subtypes s_1, ..., s_n.

o Measure the similarity of x to the prototypes
p_1, ..., p_n for each subtype of t.

o Classify x as an instance of that subtype s_i whose
prototype p_i is most similar to x by the measure
used for the ontology.

For any entity x, this procedure is invoked with x compared to
the immediate subtypes of the universal type ⊤. After x has
been classified as an instance of any type t, the procedure is
invoked recursively to classify x further as some subtype of
t. The procedure stops when x is classified as an instance of
a type whose only proper subtype is the absurd type ⊥.
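
The recursive procedure can be sketched in Python as follows. This is
a minimal illustration under stated assumptions, not a definitive
implementation: the hierarchy, the feature vectors used as prototypes,
and the choice of negated Euclidean distance as the similarity measure
are all hypothetical. A type with no listed subtypes stands for a type
whose only proper subtype is the absurd type.

```python
import math

# Hypothetical toy hierarchy: each type lists its immediate subtypes.
# "Entity" plays the role of the universal type.
SUBTYPES = {
    "Entity": ["Animal", "Plant"],
    "Animal": ["Bird", "Dog"],
}

# Hypothetical prototypes as feature vectors (mobile, legs, wings).
PROTOTYPES = {
    "Animal": (1, 4, 0),
    "Plant": (0, 0, 0),
    "Bird": (1, 2, 2),
    "Dog": (1, 4, 0),
}

def similarity(x, p):
    """Assumed measure: negated Euclidean distance (larger is closer)."""
    return -math.dist(x, p)

def classify(x, t="Entity"):
    """Classify x, already known to be a t, as the closest subtype of t."""
    subtypes = SUBTYPES.get(t, [])
    if not subtypes:
        # Only the absurd type lies below t, so the procedure stops.
        return t
    best = max(subtypes, key=lambda s: similarity(x, PROTOTYPES[s]))
    return classify(x, best)

print(classify((1, 2, 2)))  # Bird
print(classify((0, 0, 0)))  # Plant
```

Because each step commits to the single most similar subtype, an early
choice near the top of the hierarchy is never revisited. That follows
the procedure as stated, but it means the result depends on how well
the upper-level prototypes are chosen.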

QUINE'S CRITERION. A tool for determining the implicit ontology that
underlies any language L, natural or artificial. The
philosopher Willard Van Orman Quine proposed a criterion that
has become famous: "To be is to be the value of a quantified
variable." That criterion makes no assumptions about what
actually exists in the world. Its purpose is to determine the
implicit assumptions made by the speakers who use L to talk
about the world. As stated, Quine's criterion applies
directly to languages like predicate calculus that have
explicit variables and quantifiers. But Quine extended the
criterion to languages of any form, including natural
languages, in which the quantifiers and variables are not
stated as explicitly as they are in predicate calculus. For
English, Quine's criterion means that the implicit ontological
categories are the concept types expressed by the basic
content words in the language: nouns, verbs, adjectives, and
adverbs.

TERMINOLOGICAL ONTOLOGY. An ontology whose concepts and relations need
not be fully specified by axioms and definitions that
determine the necessary and sufficient conditions for their
use. The concepts may be partially specified by relations
such as subtype-supertype or part-whole, which determine the
relative positions of the concepts with respect to one
another, but which do not completely define them. Although
a terminological ontology may be expressed in logic, the
versions of logic required are usually simpler, less
expressive, and more easily computable than full first-order
predicate calculus.

UNIFICATION. A partial compatibility of two ontologies A and B that has
been extended to a total compatibility that includes all
concepts and relations in both A and B. If the ontologies of
A and B have been unified, then any inference or computation
that can be expressed in either one can be mapped to an
equivalent inference or computation in the other.