The modeling of taxonomic data in a relational database is an
interesting challenge, and one I have faced several times in my
professional career. Like many aspects of museum computerization,
it's one of those topics that is relatively simple in theory,
but becomes complex and idiosyncratic in implementation.
These, then, are some rules of thumb based not on database
theory but on twenty-odd years of museum experience. (And very
odd years some of them were, too...)
1) Decide up front whether the database is a collections catalog
or a representation of a taxonomic hierarchy. If it's the
former, the strict representation of the hierarchy is not
as critical, but you should make sure you have good substring
search capability. Specimens *will* be reidentified, and taxa
*will* be revised; you won't have the resources to recatalog
all the _Prionospio_ just because someone has assigned half
the species to _Paraprionospio_, for instance. And maybe
half of your curators don't agree with that revision, and
one of the post-docs is writing one of her own. Specimen
cataloging should concentrate on retrievability of the
specimens, not on up-to-the-minute taxonomic correctitude.
2) Never, *ever* try to pack information into the unique record
designator. Even something as straightforward as "accession
year+accession order" will turn around and bite you when you
discover you need to split a mixed lot, or someone finds
the backlog from 1957 tucked away at the back of the shelf.
Pick a meaningless unique designator to maintain relational
structure and use secondary sort keys to order the database.
3) The taxonomic binomen, though supposedly unique, is not always
so. Don't create any data structure that depends on nomenclatural
uniqueness unless you know your group sufficiently well to
guarantee that there will never, *ever* be a case of homonymy.
4) By all means keep separate data elements (e.g. generic and
specific names) in separate fields. The reasons for this have
been made abundantly clear by other posters. However....
5) although you can generate a proper taxonomic name with all
subtaxa, parentheses, etc. through program logic, sometimes it's
worth the inelegance of creating an extra field to hold the
name you want to appear on reports: _Xus_ (_Yus_) _albus_
(Linnaeus, 1758 _sensu_ Eschmeyer, 1990) for instance.
This is more important if your database is modeling a taxonomic
hierarchy and less important for collections data.
6) Make some provision for "see" and "see also" records to point to
the taxonomic synonyms your classification recognizes. Your
visiting investigators will appreciate it.
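
Rules 2 and 6 can be sketched together. This is only a rough
illustration using Python's built-in sqlite3 module, not code from any
of the systems mentioned here. Every table, column, and function name
in it (specimen, synonym, lot_id, catalog_order, find_lots) is invented
for the example, and _Xus_ and _Yus_ are placeholder genera.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- lot_id carries no meaning; it only holds the relations together.
    CREATE TABLE specimen (
        lot_id    INTEGER PRIMARY KEY,  -- meaningless unique designator
        acc_year  INTEGER,              -- secondary sort key
        acc_order INTEGER,              -- secondary sort key
        genus     TEXT,
        species   TEXT
    );
    -- "see" records: a name used elsewhere, pointing at the name
    -- this catalog files its specimens under.
    CREATE TABLE synonym (
        other_genus   TEXT,
        other_species TEXT,
        see_genus     TEXT,
        see_species   TEXT
    );
""")

# The 1957 lot is entered last (the backlog found at the back of the
# shelf); the key stays meaningless, so nothing needs renumbering.
conn.executemany(
    "INSERT INTO specimen (acc_year, acc_order, genus, species) "
    "VALUES (?, ?, ?, ?)",
    [(1990, 1, "Xus", "albus"),
     (1990, 2, "Xus", "niger"),
     (1957, 1, "Xus", "albus")],
)
conn.execute("INSERT INTO synonym VALUES ('Yus', 'albus', 'Xus', 'albus')")


def catalog_order(conn):
    """Return all lots ordered by the secondary sort keys, not by lot_id."""
    return conn.execute(
        "SELECT lot_id, acc_year, acc_order, genus, species "
        "FROM specimen ORDER BY acc_year, acc_order"
    ).fetchall()


def find_lots(conn, fragment):
    """Substring search on names, following 'see' records as well."""
    pattern = f"%{fragment}%"
    rows = conn.execute(
        """
        SELECT s.lot_id FROM specimen AS s
        WHERE s.genus || ' ' || s.species LIKE ?
           OR EXISTS (SELECT 1 FROM synonym AS y
                      WHERE y.see_genus = s.genus
                        AND y.see_species = s.species
                        AND y.other_genus || ' ' || y.other_species LIKE ?)
        ORDER BY s.acc_year, s.acc_order
        """,
        (pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]
```

With this in place, find_lots(conn, "Yus") returns the lots cataloged
as _Xus albus_, so a visitor searching under the other name still finds
the specimens without anyone recataloging them.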
If you really *do* want to model a taxonomic hierarchy with a
relational database, you can do so if you have a moderately flexible
DBMS and a moderately good programmer. The only information you
need in order to generate a taxonomic hierarchy is:
    What is this record's parent record?
    What is the order of this record among records with the
        same parent?
    What taxonomic level is this record?
A few simple algorithms then let you generate each record's taxonomy,
or a complete taxonomic report on a group. I implemented such a
system using dBASE III Plus for the "Mammal Species of the World"
project at the NMNH. The important part of the data structure is:
------------------------------------------------------------------
Fieldname   Data type   Purpose
------------------------------------------------------------------
ME          Numeric     The unique number of the taxon; assigned
                        automatically by the application.
MYPARENT    Numeric     The unique number of the parent of the
                        taxon; assigned automatically by the
                        application.
RORDER      Numeric     The relative order of this taxon within
                        its sibling group. RORDER can be
                        directly edited by the user to set taxon
                        order.
NAME        Text        The name of the taxon. The NAME of a
                        species-level taxon does not include the
                        genus. Use the field PRINTNAME for
                        reports where the full name is required.
TAXON       Text        The taxonomic level (Class, Order, Family,
                        Subfamily, Genus, Subgenus, Species,
                        Subspecies, etc.)
PRINTNAME   Text        The name that appears on reports. All
                        appropriate names (e.g. genus, subgenus)
                        are included.
------------------------------------------------------------------
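
For anyone who wants the flavor of the algorithms, here is a rough
sketch in Python rather than dBASE. It is not the original source
code. It uses the ME, MYPARENT, RORDER, NAME and TAXON fields from the
table above; the sample rows, the function names lineage and report,
and the convention that the top record has MYPARENT = 0 are all
assumptions made for the illustration.

```python
from collections import defaultdict

# Illustrative rows only: (ME, MYPARENT, RORDER, NAME, TAXON).
# MYPARENT = 0 for the top record is an assumption of this sketch.
ROWS = [
    (1, 0, 1, "Mammalia", "Class"),
    (2, 1, 1, "Carnivora", "Order"),
    (3, 2, 2, "Felidae", "Family"),
    (4, 2, 1, "Canidae", "Family"),
    (5, 4, 1, "Canis", "Genus"),
    (6, 5, 1, "lupus", "Species"),
    (7, 3, 1, "Felis", "Genus"),
    (8, 7, 1, "catus", "Species"),
]

BY_ID = {row[0]: row for row in ROWS}

# Group records under their parent, then order each sibling group
# by RORDER.
CHILDREN = defaultdict(list)
for row in ROWS:
    CHILDREN[row[1]].append(row)
for siblings in CHILDREN.values():
    siblings.sort(key=lambda r: r[2])


def lineage(me):
    """Follow MYPARENT links upward; return (TAXON, NAME) pairs, root first."""
    path = []
    while me in BY_ID:          # stops when the parent number is unknown
        _, parent, _, name, taxon = BY_ID[me]
        path.append((taxon, name))
        me = parent
    return path[::-1]


def report(me, depth=0):
    """Yield indented lines for a taxon and everything below it.

    Traversal is depth first, with siblings in RORDER order.
    """
    _, _, _, name, taxon = BY_ID[me]
    yield "  " * depth + f"{taxon}: {name}"
    for child in CHILDREN.get(me, []):
        yield from report(child[0], depth + 1)
```

Calling lineage(8) gives the full taxonomy of one record, and
print("\n".join(report(2))) prints the report for a whole group.
Changing a record's RORDER and re-sorting is all it takes to reorder
siblings. The sketch assumes the parent links contain no loops; a real
application would want to check for that.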
If anyone would like the algorithms or source code, please contact
me directly. I'm not saying we had the best tool in the world
for managing a hierarchical database; the point is that it *can* be
done without exotic software to manage a hierarchical thesaurus.
+------------------------------+------------------------+
| Barbara Weitbrecht | [log in to unmask] |
| National Air & Space Museum | [log in to unmask] |
| Smithsonian Institution | (202) 357-4162 |
+------------------------------+------------------------+