MUSEUM-L Archives

Museum discussion list

MUSEUM-L@HOME.EASE.LSOFT.COM

From: "Barbara Weitbrecht, Smithsonian" <[log in to unmask]>
Reply To: Museum discussion list <[log in to unmask]>
Date: Mon, 9 Jan 1995 11:14:04 EST

The modeling of taxonomic data in a relational database is an
interesting challenge, and one I have faced several times in my
professional career.  Like many aspects of museum computerization,
it's one of those topics that is relatively simple in theory,
but becomes complex and idiosyncratic in implementation.
 
These, then, are some rules of thumb based not on database
theory but on twenty-odd years of museum experience.  (And very
odd years some of them were, too.....)
 
1) Decide up front whether the database is a collections catalog
   or a representation of a taxonomic hierarchy.  If it's the
   former, the strict representation of the hierarchy is not
   as critical, but you should make sure you have good substring
   search capability.  Specimens *will* be reidentified, and taxa
   *will* be revised; you won't have the resources to recatalog
   all the _Prionospio_ just because someone has assigned half
   the species to _Paraprionospio_, for instance.  And maybe
   half of your curators don't agree with that revision, and
   one of the post-docs is writing one of her own.  Specimen
   cataloging should concentrate on retrievability of the
   specimens, not on up-to-the-minute taxonomic correctitude.
 
2) Never, *ever* try to pack information into the unique record
   designator.  Even something as straightforward as "accession
   year+accession order" will turn around and bite you when you
   discover you need to split a mixed lot, or someone finds
   the backlog from 1957 tucked away at the back of the shelf.
   Pick a meaningless unique designator to link related records,
   and use secondary sort keys (accession year, accession order,
   and so on) to order the database.
 
3) The taxonomic binomen, though supposedly unique, is not always
   so.  Don't create any data structure that depends on nomenclatural
   uniqueness unless you know your group sufficiently well to
   guarantee that there will never, *ever* be a case of homonymy.
 
4) By all means keep separate data elements (e.g. generic and
   specific names) in separate fields.  The reasons for this have
   been made abundantly clear by other posters.  However....
 
5) although you can generate a proper taxonomic name with all
   subtaxa, parentheses, etc. through program logic, sometimes it's
   worth the inelegance of creating an extra field to hold the
   name you want to appear on reports:  _Xus_ (_Yus_) _albus_
   (Linnaeus, 1758 _sensu_ Eschmeyer, 1990) for instance.
   This is more important if your database is modeling a taxonomic
   hierarchy and less important for collections data.
 
6) Make some provision for "see" and "see also" records to point to
   the taxonomic synonyms your classification recognizes.  Your
   visiting investigators will appreciate it.
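To make rules 1 through 6 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The DBMS is an assumption (any relational system would do), and the table names, column names, synonym table, and sample rows are all hypothetical, chosen only for illustration. The sketch uses a meaningless surrogate key with accession year and order as sort keys. It keeps genus, subgenus, and species in separate fields, with no uniqueness constraint on the binomen. It also stores an optional hand-edited report name and records "see" pointers for synonyms. Splitting a mixed lot just adds a row, and a substring search for "prionospio" finds lots filed under either genus.

```python
import sqlite3

# Hypothetical schema illustrating rules 2-6; names are for illustration only.
SCHEMA = """
CREATE TABLE specimen (
    catalog_id INTEGER PRIMARY KEY,  -- meaningless surrogate key (rule 2)
    acc_year   INTEGER,              -- secondary sort keys, not part of the key
    acc_seq    INTEGER,
    genus      TEXT NOT NULL,        -- separate fields for each element (rule 4)
    subgenus   TEXT,
    species    TEXT NOT NULL,
    print_name TEXT                  -- optional hand-edited report name (rule 5)
    -- deliberately no UNIQUE (genus, species): homonyms happen (rule 3)
);
CREATE TABLE synonym (               -- "see" / "see also" pointers (rule 6)
    name     TEXT NOT NULL,
    see_name TEXT NOT NULL,
    kind     TEXT NOT NULL CHECK (kind IN ('see', 'see also'))
);
"""

def open_catalog():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def add_specimen(conn, acc_year, acc_seq, genus, species,
                 subgenus=None, print_name=None):
    """Insert one lot; SQLite assigns the catalog_id."""
    cur = conn.execute(
        "INSERT INTO specimen (acc_year, acc_seq, genus, subgenus, species, print_name)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (acc_year, acc_seq, genus, subgenus, species, print_name))
    return cur.lastrowid

def search_genus(conn, fragment):
    """Substring search (SQLite's LIKE ignores ASCII case by default), rule 1."""
    return conn.execute(
        "SELECT catalog_id, genus, species FROM specimen WHERE genus LIKE ?"
        " ORDER BY acc_year, acc_seq, catalog_id",
        (f"%{fragment}%",)).fetchall()

def see_references(conn, name):
    return conn.execute(
        "SELECT kind, see_name FROM synonym WHERE name = ?", (name,)).fetchall()

def display_name(conn, catalog_id):
    genus, subgenus, species, print_name = conn.execute(
        "SELECT genus, subgenus, species, print_name FROM specimen"
        " WHERE catalog_id = ?", (catalog_id,)).fetchone()
    if print_name:                   # the stored report name wins (rule 5)
        return print_name
    middle = f" ({subgenus})" if subgenus else ""
    return f"{genus}{middle} {species}"

# Illustrative data only.
conn = open_catalog()
a = add_specimen(conn, 1994, 12, "Prionospio", "steenstrupi")
b = add_specimen(conn, 1994, 13, "Paraprionospio", "pinnata")
c = add_specimen(conn, 1957, 3, "Prionospio", "cirrifera")     # backlog found later
d = add_specimen(conn, 1994, 12, "Paraprionospio", "pinnata")  # split from lot a; a keeps its number
x = add_specimen(conn, 1995, 1, "Xus", "albus", subgenus="Yus",
                 print_name="Xus (Yus) albus (Linnaeus, 1758 sensu Eschmeyer, 1990)")
conn.execute("INSERT INTO synonym VALUES "
             "('Prionospio pinnata', 'Paraprionospio pinnata', 'see')")
hits = search_genus(conn, "prionospio")  # matches both genera
```

Neither the 1957 backlog nor the split lot forces any existing catalog number to change; only the sort keys place them in order.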
 
If you really *do* want to model a taxonomic hierarchy with a
relational database, you can do so if you have a moderately flexible
DBMS and a moderately good programmer.  The only information you
need in order to generate a taxonomic hierarchy is:
 
       What is this record's parent record?
       What is the order of this record among records with the
         same parent?
       What taxonomic level is this record?
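As a rough illustration of why those three facts suffice, here is a minimal Python sketch that recovers one record's full classification by following parent pointers up to the top of the hierarchy. It is not the author's dBASE III Plus code. The Taxon class, the lineage function, and the sample records are hypothetical. The sibling order is carried along but only matters when printing a whole group.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Taxon:
    name: str
    level: str              # what taxonomic level is this record?
    parent: Optional[int]   # what is this record's parent? (None at the top)
    order: int              # order among records with the same parent

def lineage(taxa: dict[int, Taxon], key: int) -> list[tuple[str, str]]:
    """Return (level, name) pairs from the top of the hierarchy down to `key`."""
    chain = []
    seen = set()
    current: Optional[int] = key
    while current is not None:
        if current in seen:  # guard against a record being its own ancestor
            raise ValueError(f"cycle in parent pointers at record {current}")
        seen.add(current)
        taxon = taxa[current]
        chain.append((taxon.level, taxon.name))
        current = taxon.parent
    chain.reverse()
    return chain

# Illustrative records, keyed by an arbitrary unique number.
TAXA = {
    1: Taxon("Mammalia", "Class", None, 1),
    2: Taxon("Carnivora", "Order", 1, 1),
    3: Taxon("Felidae", "Family", 2, 1),
    4: Taxon("Panthera", "Genus", 3, 1),
    5: Taxon("leo", "Species", 4, 1),
}

for depth, (level, name) in enumerate(lineage(TAXA, 5)):
    print("  " * depth + f"{level}: {name}")
```

The cycle check is a design choice. A data-entry error that makes a taxon its own ancestor would otherwise send the loop around forever.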
 
A few simple algorithms then let you generate each record's taxonomy,
or a complete taxonomic report on a group.  I implemented such a
system using dBASE III Plus for the "Mammal Species of the World"
project at the NMNH.  The important part of the data structure is:
 
------------------------------------------------------------------
Fieldname   Data type   Purpose
------------------------------------------------------------------
ME          Numeric     The unique number of the taxon; assigned
                        automatically by the application.
 
MYPARENT    Numeric     The unique number of the parent of the
                        taxon; assigned automatically by the
                        application.
 
RORDER      Numeric     The relative order of this taxon within
                        its sibling group.  RORDER can be
                        directly edited by the user to set taxon
                        order.
 
NAME        Text        The name of the taxon.  The NAME of a
                        species-level taxon does not include the
                        genus.  Use the field PRINTNAME for
                        reports where the full name is required.
 
TAXON       Text        The taxonomic level (Class, Order, Family,
                        Subfamily, Genus, Subgenus, Species,
                        Subspecies, etc.)
 
PRINTNAME   Text        The name that appears on reports.  All
                        appropriate names (e.g. genus, subgenus)
                        are included.
----------------------------------------------------------------------
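A complete report on a group can be sketched as a depth-first walk that visits each taxon's children in RORDER sequence. The field names below come from the table above. The row values, the use of None for a root's MYPARENT, and the taxonomic_report function are assumptions for illustration, and the code is written in Python rather than dBASE.

```python
from collections import defaultdict

# Hypothetical rows using the field names from the table above.
ROWS = [
    {"ME": 1, "MYPARENT": None, "RORDER": 1, "NAME": "Carnivora", "TAXON": "Order",   "PRINTNAME": "Carnivora"},
    {"ME": 2, "MYPARENT": 1,    "RORDER": 2, "NAME": "Felidae",   "TAXON": "Family",  "PRINTNAME": "Felidae"},
    {"ME": 3, "MYPARENT": 1,    "RORDER": 1, "NAME": "Canidae",   "TAXON": "Family",  "PRINTNAME": "Canidae"},
    {"ME": 4, "MYPARENT": 2,    "RORDER": 1, "NAME": "Panthera",  "TAXON": "Genus",   "PRINTNAME": "Panthera"},
    {"ME": 5, "MYPARENT": 4,    "RORDER": 2, "NAME": "tigris",    "TAXON": "Species", "PRINTNAME": "Panthera tigris"},
    {"ME": 6, "MYPARENT": 4,    "RORDER": 1, "NAME": "leo",       "TAXON": "Species", "PRINTNAME": "Panthera leo"},
]

def taxonomic_report(rows, root):
    """Indented outline of the group under `root`, siblings ordered by RORDER."""
    by_id = {}
    children = defaultdict(list)
    for row in rows:
        by_id[row["ME"]] = row
        children[row["MYPARENT"]].append(row)
    for siblings in children.values():
        siblings.sort(key=lambda r: r["RORDER"])

    lines = []
    stack = [(by_id[root], 0)]  # explicit stack instead of recursion
    while stack:
        row, depth = stack.pop()
        lines.append("  " * depth + f"{row['TAXON']} {row['PRINTNAME']}")
        # Push children in reverse so the lowest RORDER is popped first.
        for child in reversed(children[row["ME"]]):
            stack.append((child, depth + 1))
    return lines

print("\n".join(taxonomic_report(ROWS, 1)))
```

Because siblings are sorted on RORDER when the report is generated, editing RORDER is enough to reorder a group. No ME or MYPARENT value has to change.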
 
If anyone would like the algorithms or source code, please contact
me directly.  I'm not claiming we had the best tool in the world
for managing a hierarchical database; the point is that a hierarchical
thesaurus *can* be managed without exotic software.
 
       +------------------------------+------------------------+
       |  Barbara Weitbrecht          |  [log in to unmask]    |
       |  National Air & Space Museum |  [log in to unmask]    |
       |  Smithsonian Institution     |  (202) 357-4162        |
       +------------------------------+------------------------+
