Top 40 concepts

piek vossen (piek.vossen@let.uva.nl)
Mon, 4 May 1998 02:13:44 +0200

Previous message: Josiah Lee Auspitz: "Re: Hayes' scrutiny of Top 40"

Dear all,

Sorry if you get this twice.

This is to (hopefully) clarify some of the pending issues around the 40
concepts. First of all, my apologies that the list was published without
proper explanation. The list was originally distributed only for discussion
among the organizers. Since then there has hardly been any discussion, and
we certainly have not reached any agreement on whether this is a useful
list. It was not my intention to put it out on the Web like this. Since it
now is, I will try to explain more:

1. there are more than 40 concepts.

2. they all are Wordnet1.5 synsets (representing concepts shared by a pool
of synonyms) with information extracted from Wordnet1.5

3. they are translations of synsets in wordnets in Dutch, Spanish and
Italian or directly taken from Wordnet.

4. their selection is based on the fact that they play an important role in
at least 3 of these wordnets, where importance is measured in terms of:
- high number of relations (mostly hyponyms)
- high position in the hierarchy (top-3)

The list is thus not a well-designed ontology or set of semantic
distinctions. If it reflects anything at all, it is that these synsets
either capture a semantic component which is strongly lexicalized in the
languages or are included in the top level. The former may be
interesting since a top-ontology should be capable of dealing with this.
The latter is rather arbitrary since none of these wordnets has been
designed top-down, and they all use limited single-branched hierarchies
(with a few exceptions). See Fritz's comment on the Wordnet hierarchy.

5. the set has not been minimized. This means that it is quite possible
that there are concepts which are too specific for our purpose.

6. the set has not been maximized either. This means that there may be
distinctions which are important for our purposes but not present.

7. Any classification information given (glosses, hyperonym) is taken from
Wordnet. These classifications are there to help to understand what the
Wordnet people intended with the particular synset.
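
To make the selection criterion in point 4 concrete, here is a minimal
sketch in Python. It is only an illustration, not the procedure actually
used: the names (SynsetStats, is_important, select_common), the relation
threshold of 50, the reading of "top-3" as hierarchy levels 1 to 3, the
choice to combine the two measures with "or", and the toy data are all
assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetStats:
    relation_count: int  # number of relations (mostly hyponyms)
    level: int           # position in the hierarchy, 1 = top


def is_important(stats, min_relations=50, max_level=3):
    # Assumption: either measure alone is enough. The threshold of 50
    # relations is hypothetical; "top-3" is read as levels 1 to 3.
    return stats.relation_count >= min_relations or stats.level <= max_level


def select_common(wordnets, min_wordnets=3):
    # Count, per concept, in how many wordnets it is important, and keep
    # the concepts that reach min_wordnets.
    votes = Counter()
    for synsets in wordnets.values():
        for concept, stats in synsets.items():
            if is_important(stats):
                votes[concept] += 1
    return sorted(c for c, n in votes.items() if n >= min_wordnets)


# Invented toy data, keyed by a shared concept label.
toy = {
    "wn15": {"object": SynsetStats(200, 2), "vehicle": SynsetStats(80, 6),
             "poodle": SynsetStats(0, 9)},
    "nl": {"object": SynsetStats(150, 1), "vehicle": SynsetStats(60, 5),
           "poodle": SynsetStats(1, 8)},
    "es": {"object": SynsetStats(90, 2), "vehicle": SynsetStats(20, 7)},
    "it": {"object": SynsetStats(10, 3), "vehicle": SynsetStats(55, 6)},
}

print(select_common(toy))  # ['object', 'vehicle']
```

In this toy data "vehicle" is selected only because of its number of
relations, and "object" would be selected on hierarchy position alone,
which is the distinction between "the former" and "the latter" made under
point 4.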

What, then, is the use of having this list?

a. to have a set of examples to pick from in Heidelberg
b. to see what components play a role in the lexicalization of languages.
Any reference ontology should be able to deal with this set.
c. to trigger a discussion on different types of ontologies and their
different purposes

It may very well be that the selection is not good at all for our purposes.
As I said, it was distributed as a premature first proposal, not having the
status of a well-designed minimal set or anything close to that. I
therefore asked for other lists and selections to compare it with, but none
have been suggested.

I will try to distribute a neater version when I get access to the data
(university is closed until Wednesday), although it can never be perfect
given the reasons set out above. Besides that, if it were perfect there
would be no reason for the Heidelberg workshop: we would already have the
top-ontology. The fact that a wordnet is not the same as an ontology, and
the fact that we want to clarify these differences, are why we will meet.

However, if you think it is a worthless enterprise, let me know. There are
other things I can do.

Piek
