Biodiversity Informatics

Taxonomy

She Unnames Them
Most of them accepted namelessness with the indifference with which they had so long accepted and ignored their names. A faction of yaks protested. They said that “yak” sounded right. They discussed the matter all summer. The council of elderly females finally agreed that though the name might be useful to others it was so redundant from the yak point of view that they never spoke it themselves, and might as well dispense with it. Most of the domestic animals agreed to give their names back. The cats denied ever having had any name other than their self-given, unspoken personal names. The dogs and the verbally talented birds insisted that their names were important to them until they understood that the issue was one of individual choice. Then not one objected to parting with the generic appellations. None were left now to unname, and they seemed far closer to me than when their names had stood between us: so close that my fear of them and their fear of me became one. And the attraction that many of us felt was one with the fear. The hunter could not be told from the hunted. This was more or less the effect I had been after, and I could not now make an exception of myself. I went to Adam, and said, “You and your father gave me this. It’s been really useful, but it doesn’t exactly seem to fit.” He was not paying much attention, and only said O.K. and went on with what he was doing. I said goodbye and went out. I had only just realized how hard it would have been to explain. My words now must be as slow, new, and tentative as the steps I took going down the path away from the house, between the dark-branched, tall dancers, motionless against the winter shining.
- Ursula K. Le Guin, The New Yorker, January 13, 1985

Lecture

A crash course in nomenclature.

Codes of nomenclature

Codes of nomenclature exist for the "major" groups of organisms.

There are two other codes to be aware of. One is the BioCode, which attempts to merge the existing codes. The other is the PhyloCode, which consists of rules for naming clades. It seems to be enjoying as much success as the BioCode.

I think most would agree that if we had started nomenclature knowing what we know now about phylogeny, we wouldn't have done it the way we did. However, there is a huge legacy of existing names, and of data linked to those names, making fundamental change unlikely.

As an example of the compromise the current system requires, the Microsporidia are treated under the zoological code (see A new dawn for the naming of fungi: impacts of decisions made in Melbourne in July 2011 on the future publication and regulation of fungal names, http://dx.doi.org/10.3897/mycokeys.1.2062), despite being fungi. As fungi you might expect them to fall under the botanical code, even though fungi are actually more closely related to animals than to plants.

Names and opaque identifiers

Names serve as identifiers, and some have argued that in general identifiers should be opaque (see Universal Resource Identifiers -- Axioms of Web Architecture), that is, we shouldn't read anything into the name itself.

Taxonomic names aren't opaque, although they can mislead us (a species named minor need not actually be small). Binomials in particular are invested with meaning. If two species are in the same genus, we expect them to be more closely related to each other than to a species in another genus. If this isn't the case, there is a strong incentive to change the names to reflect the relationships of the taxa. Sometimes this is not a trivial undertaking, as shown by the case of Drosophila melanogaster.

Drosophila melanogaster

A recent example of the clash between having a name "make sense" and the stability of names is the case of Drosophila melanogaster, perhaps the best known model organism in biology. Based on a recent phylogeny, Kim van der Linde argued that because the genus Drosophila is paraphyletic (it includes a common ancestor but not all of that ancestor's descendants, some of which are placed in other genera), Drosophila melanogaster should be renamed Sophophora melanogaster (see Revising the paraphyletic genus Drosophila sensu lato for details).

Below is a tree from TreeBASE (TB2:Tr20084) showing the relationship between Drosophila species:

(Figure: tree TB2:Tr20084 from TreeBASE. Most tips are Drosophila species, including the type species D. funebris and D. melanogaster, interspersed with species from other drosophilid genera such as Zaprionus, Liodrosophila, Hirtodrosophila, Mycodrosophila, Scaptomyza, Dettopsomyia, Scaptodrosophila, Chymomyza, Rhinoleucophenga, Phortica and Leucophenga.)

Below is a summary of the major groups of Drosophila:

The fundamental problem posed by Drosophila is that the existing taxonomy doesn't fit the tree. To fix this we could:

  1. Make everything Drosophila
    (Big genus, have to rename lots of species)
  2. Restrict Drosophila to a monophyletic group that includes the type species (D. funebris)
    (means Drosophila melanogaster can’t be Drosophila melanogaster)
  3. Change the type species to be Drosophila melanogaster
    (But nomenclature is not taxonomy, and ICZN said "no", see doi:10.21805/bzn.v67i1.a14)

At the core of the argument is the notion that Linnaean names have to match phylogeny.
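Whether a genus "matches" a tree can be checked mechanically: a genus is monophyletic if the smallest clade containing all its species (their most recent common ancestor) contains nothing else. The sketch below assumes a tree stored as nested tuples and uses a small hypothetical topology, not the TreeBASE tree, to show how option 2 (moving D. melanogaster and relatives to Sophophora) restores monophyly.

```python
# Toy phylogeny as nested tuples: a string is a leaf, a tuple is an internal node.
# HYPOTHETICAL topology for illustration only; it is not TreeBASE TB2:Tr20084.

def leaves(node):
    """Return the set of leaf labels at or below a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found

def clades(node):
    """Yield the leaf set of every node in the tree, root first."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def genus_of(label):
    """Genus is the first word of a binomial label."""
    return label.split()[0]

def is_monophyletic(tree, genus):
    """True if the smallest clade containing every member of `genus`
    (i.e. their most recent common ancestor) contains nothing else."""
    members = {leaf for leaf in leaves(tree) if genus_of(leaf) == genus}
    if not members:
        raise ValueError(f"no leaves belong to {genus}")
    # Clades containing all members are nested, so the smallest one is unique.
    mrca_clade = min((c for c in clades(tree) if members <= c), key=len)
    return mrca_clade == members

toy_tree = (
    (("Drosophila funebris", "Drosophila immigrans"), "Zaprionus indianus"),
    ("Drosophila melanogaster", "Drosophila simulans"),
)
# Option 2: restrict Drosophila, moving the melanogaster group to Sophophora.
renamed = (
    (("Drosophila funebris", "Drosophila immigrans"), "Zaprionus indianus"),
    ("Sophophora melanogaster", "Sophophora simulans"),
)

print(is_monophyletic(toy_tree, "Drosophila"))  # False: Zaprionus sits inside
print(is_monophyletic(renamed, "Drosophila"))   # True
print(is_monophyletic(renamed, "Sophophora"))   # True
```

Option 1 would instead relabel Zaprionus indianus as a Drosophila, which also makes the genus monophyletic, at the cost of renaming every species in the nested genera.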

Drosophila melanogaster

How would you assess the possible disruption caused by renaming the fruit fly Drosophila melanogaster to Sophophora melanogaster?

How could you deal with Drosophila melanogaster changing its name to Sophophora melanogaster?
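One common way for a database to absorb a rename is to store a synonym mapping and expand every search through it, so that records filed under either name are returned together. A minimal sketch, assuming a hypothetical synonym table and record list (which name is treated as the key is a choice, and swapping it is a one-line change):

```python
# HYPOTHETICAL synonym table and record store, for illustration only.
SYNONYMS = {
    "Sophophora melanogaster": "Drosophila melanogaster",
}

RECORDS = [
    {"name": "Drosophila melanogaster", "source": "paper A"},
    {"name": "Sophophora melanogaster", "source": "paper B"},
    {"name": "Drosophila simulans", "source": "paper C"},
]

def accepted(name):
    """Map a name to the single key we use for the taxon."""
    return SYNONYMS.get(name, name)

def search(name, records):
    """Return records filed under `name` or under any synonym of it."""
    key = accepted(name)
    return [r for r in records if accepted(r["name"]) == key]

print([r["source"] for r in search("Sophophora melanogaster", RECORDS)])
# ['paper A', 'paper B']
```

Counting how many records sit under the old name (here, everything mapped by `accepted`) is also one rough way to gauge how disruptive a rename would be.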

Finding the original description of a name

Resolving what name to use for a taxon can require extensive bibliographic research, and in theory the entire scientific literature from Linnaeus (1753 for plants, 1758 for animals) onwards is relevant (bacteria started with a clean slate in 1980).

Furthermore, there are few constraints on where a name can be published, so names of organisms can appear in very obscure journals. One constraint is that animal names can't be published online only (we've heard of this Internet thingy but we're having no truck with it), although this is changing; plant names, by contrast, can be published online only.

In an effort to try and link animal names to their original description online I've created http://bionames.org/. Hundreds of thousands of names are linked to the original publication. If the publication is freely available it is displayed on the site.

A more recent version of this resource is Species-Cite, which tries to link taxonomic names to both publications and people.

Homonyms

If two names are the same but refer to different organisms then those names are homonyms. For example, in July 2010 Lambert et al. (The giant bite of a new raptorial sperm whale from the Miocene epoch of Peru, http://dx.doi.org/10.1038/nature09067) published a paper in Nature that described an extinct sperm whale possessing the biggest bite of any tetrapod known. They named this formidable predator Leviathan melvillei, the genus name Leviathan being derived from the Hebrew 'Livyatan', the species name honouring Herman Melville (author of Moby Dick). As appropriate as this name was, it quickly ran foul of the rules of zoological nomenclature because Leviathan had already been used 169 years earlier for an extinct mastodon (Description of Missourium, or Missouri leviathan, http://dx.doi.org/10.5962/bhl.title.35985). Although the name Leviathan Koch had lapsed into obscurity (as a synonym of Mammut Blumenbach), its existence meant the newly discovered whale had to be renamed, which it duly was, as Livyatan melvillei, a month after the original publication.

Homonyms cause obvious problems when searching for data. If the same name is used for more than one taxon, then you may get a mixture of data for two unrelated taxa. Databases such as EOL maintain lists of homonyms (see Homonyms on EOL) to keep track of these problematic names.
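The data-mixing problem is easy to reproduce: if records are keyed on the bare name, both Leviathans end up in the same bucket; adding the authority to the key keeps them apart. A minimal sketch with hypothetical records (only the names and authorities come from the text above):

```python
from collections import defaultdict

# HYPOTHETICAL records; the "note" values are placeholders.
records = [
    {"genus": "Leviathan", "authority": "Koch", "note": "Missourium fossil"},
    {"genus": "Leviathan", "authority": "Lambert et al.", "note": "sperm whale fossil"},
]

by_name = defaultdict(list)             # keyed on the bare name: homonyms collide
by_name_and_author = defaultdict(list)  # keyed on name + authority: kept apart
for r in records:
    by_name[r["genus"]].append(r["note"])
    by_name_and_author[(r["genus"], r["authority"])].append(r["note"])

print(by_name["Leviathan"])
# ['Missourium fossil', 'sperm whale fossil']
print(by_name_and_author[("Leviathan", "Lambert et al.")])
# ['sperm whale fossil']
```

Authority strings are not a perfect key (they are abbreviated and spelled inconsistently across sources), which is one reason curated homonym lists are still needed.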

Hemihomonyms

A special case of homonymy is a name used under different nomenclatural codes (a hemihomonym), e.g. Morus is both a plant genus (mulberries) and an animal genus (gannets). Many of these are included in EOL's homonym collection, but there is a database devoted just to hemihomonyms at http://herba.msu.ru/shipunov/os/homonyms/index.php.
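Given a list of names tagged with the code that governs them, duplicated names can be sorted into homonyms (repeated within one code, which that code must resolve) and hemihomonyms (repeated only across codes, which is allowed because each code operates independently). A minimal sketch over a small hypothetical name list:

```python
from collections import defaultdict

# HYPOTHETICAL name list: (name, authority, code).
names = [
    ("Morus", "L.", "botanical"),          # mulberries
    ("Morus", "Vieillot", "zoological"),   # gannets
    ("Leviathan", "Koch", "zoological"),
    ("Leviathan", "Lambert et al.", "zoological"),
    ("Homo", "L.", "zoological"),
]

def classify_duplicates(entries):
    """Label each name used more than once as 'homonym' or 'hemihomonym'."""
    groups = defaultdict(list)
    for name, authority, code in entries:
        groups[name].append((authority, code))
    result = {}
    for name, uses in groups.items():
        if len(uses) < 2:
            continue
        codes = {code for _, code in uses}
        # Fewer distinct codes than uses means some code has the name twice.
        result[name] = "homonym" if len(codes) < len(uses) else "hemihomonym"
    return result

print(classify_duplicates(names))
# {'Morus': 'hemihomonym', 'Leviathan': 'homonym'}
```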

Synonyms

Names can change over time, so that a single taxon can acquire a suite of names (synonyms). This can drive people a bit nuts.

Ryan Schenk's http://synynyms.com tool (now sadly offline) displays the frequency of usage of taxonomic names in literature scanned by the Biodiversity Heritage Library (BHL), and was inspired by Google's Ngram viewer.
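The core of a usage-over-time plot is just counting, per year, the pages that mention each name. The sketch below is not how synynyms was implemented; it assumes a hypothetical corpus of (year, text) pages and uses Pithecanthropus erectus and Homo erectus as the pair of names. Real scanned text would also need OCR-error handling and name-finding.

```python
import re
from collections import Counter

# HYPOTHETICAL scanned pages: (year, text).
pages = [
    (1950, "Pithecanthropus erectus was found in Java."),
    (1950, "Notes on Pithecanthropus erectus."),
    (2000, "Homo erectus and Pithecanthropus erectus are the same taxon."),
    (2000, "Homo erectus in Asia."),
]

def usage_by_year(pages, names):
    """Count pages per year that mention each name as a whole phrase."""
    counts = {name: Counter() for name in names}
    for year, text in pages:
        for name in names:
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                counts[name][year] += 1
    return counts

counts = usage_by_year(pages, ["Pithecanthropus erectus", "Homo erectus"])
for name, by_year in counts.items():
    print(name, sorted(by_year.items()))
# Pithecanthropus erectus [(1950, 2), (2000, 1)]
# Homo erectus [(2000, 2)]
```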

"Objective" and "subjective" synonyms

Objective synonyms occur when only the name changes; we are making no statement about the taxon itself. For example, if we move Pithecanthropus erectus to the genus Homo we get Homo erectus, which is an objective synonym of Pithecanthropus erectus.

Subjective synonyms occur when we assert, based on some data, that two taxa with different names are actually the same thing. The assertion may be based on an explicit analysis using "objective" methods, but it is considered "subjective" in the sense that it isn't simply a logical consequence of nomenclature. An example of a subjective synonym is the shrimp Rimicaris aurantiaca (described in A New Species Of Rimicaris (Crustacea: Decapoda: Bresiliidae) From The Snake Pit Hydrothermal Vent Field On The Mid-Atlantic Ridge), which subsequent genetic data showed are likely to be juveniles of an already described species, Rimicaris exoculata (see Molecular systematics of shrimp (Decapoda: Bresiliidae) from deep-sea hydrothermal vents, I: Enigmatic 'small orange' shrimp from the Mid-Atlantic Ridge are juvenile Rimicaris exoculata).
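One way to make the distinction concrete in a database: objective synonymy follows from the names sharing the same name-bearing type, so it can be computed; subjective synonymy needs an extra taxonomic judgement supplied from outside. A minimal sketch, where the type identifiers are hypothetical placeholders:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Name:
    binomial: str
    type_id: str  # HYPOTHETICAL identifier for the name-bearing type

def synonym_kind(a, b, judged_same_taxon=False):
    """Classify two different names as objective or subjective synonyms."""
    if a.binomial == b.binomial:
        raise ValueError("identical names are not synonyms")
    if a.type_id == b.type_id:
        return "objective"   # follows from nomenclature alone
    if judged_same_taxon:
        return "subjective"  # rests on a taxonomic opinion based on data
    return "not synonyms"

pe = Name("Pithecanthropus erectus", "type-A")
he = Name("Homo erectus", "type-A")
ra = Name("Rimicaris aurantiaca", "type-B")
rex = Name("Rimicaris exoculata", "type-C")

print(synonym_kind(pe, he))                           # objective
print(synonym_kind(ra, rex, judged_same_taxon=True))  # subjective
```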

Classifications

One of the challenges of dealing with biological classifications is simply navigating them. The following are examples of visualising classifications.

Map-like viewer

This is a simple viewer that uses the same interface as online maps to navigate a large tree.

Spacetree

This visualisation uses the JavaScript InfoVis Toolkit and is an updated version of a demo done in 2009.

View it at http://iphylo.org/~rpage/phyloinformatics/spacetree/

EOL on the iPad

A webapp to navigate three of the classifications provided by EOL (see EOL iPad web app using jQueryMobile).

View it at http://iphylo.org/~rpage/phyloinformatics/eoliphone/.
(this app no longer works due to changes in EOL's API and adoption of HTTPS)

Treemaps

This visualisation is an updated version of a demo done in 2008.

View it at http://iphylo.org/~rpage/phyloinformatics/treemap/

EOL Treemap is offline [2017]

EOL had a prettier treemap that was hosted at http://synthesis.eol.org/media/treemap but is now gone.

Lifemap

View it at Lifemap

Onezoom

View it at OneZoom

Other approaches

In many ways browsing classifications is similar to navigating the file system of a computer. There is a huge computer science literature on this problem, and some examples have even made it into Hollywood movies.

The treevis.net site provides an extensive gallery of tree visualisations.
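The file-system analogy can be taken literally: a classification can be stored as slash-separated paths (BioNames timeline URLs use this form, e.g. Animalia/Chordata/Vertebrata/Reptilia/Lepidosauria/Squamata/Serpentes), built into a nested structure, and browsed one level at a time like listing a directory. A minimal sketch; apart from the snake path, the paths are hypothetical:

```python
def build_tree(paths):
    """Build a nested dict from slash-separated classification paths."""
    root = {}
    for path in paths:
        node = root
        for taxon in path.strip("/").split("/"):
            node = node.setdefault(taxon, {})
    return root

def children(tree, path=""):
    """List the taxa directly below `path`, like `ls` on a directory."""
    node = tree
    for taxon in (p for p in path.split("/") if p):
        node = node[taxon]
    return sorted(node)

paths = [
    "Animalia/Chordata/Vertebrata/Reptilia/Lepidosauria/Squamata/Serpentes",
    "Animalia/Chordata/Vertebrata/Aves",
    "Animalia/Arthropoda/Insecta",
]
tree = build_tree(paths)
print(children(tree, "Animalia"))                      # ['Arthropoda', 'Chordata']
print(children(tree, "Animalia/Chordata/Vertebrata"))  # ['Aves', 'Reptilia']
```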

Browse treevis.net and pick a visualisation that you think would be useful for browsing classifications. What properties are you looking for?

Exercises

Quiz on species names Menti quiz

Changing names for species

In this exercise we use Google's Ngram Viewer to explore the changing frequency of use of a name. The Ngram viewer is a fun tool to track how language changes. Enter some taxonomic names separated by a comma, e.g.:

Patterns of naming species over time

Timeline of new names

Go to BioNames and explore the timeline of new taxa. For example, you can see the timeline for animals:

What patterns do you see? Can you explain the peaks and troughs? What do you conclude about the current rate of description of new species?

You can use the treemap to navigate through the timelines. For example, you can navigate to snakes (http://bionames.org/timeline/Animalia/Chordata/Vertebrata/Reptilia/Lepidosauria/Squamata/Serpentes):

Why is there such a peak of new snake names in 2012? (Note, you can click on the peak in the graph to see a list of papers published that year).

Evaluate large tree viewers

A major visualisation challenge is viewing large trees, whether classifications or phylogenies. In this exercise we will compare three visualisations:


Beyond names: DNA barcodes

Partly driven by new technology, and partly driven by an exasperation with the rate of taxonomic description, some researchers have developed DNA "barcodes" to help both identify existing species and discover new ones.
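At its simplest, barcode identification compares a query sequence to reference barcodes and accepts the closest match only if it is within some distance threshold; anything further away is a candidate new species. A minimal sketch using uncorrected p-distance on pre-aligned sequences. The 3% threshold is an assumption (published thresholds vary by group), and the sequences are made up, not real barcodes:

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no overlapping sites")
    return sum(x != y for x, y in compared) / len(compared)

def identify(query, reference, threshold=0.03):
    """Return (best_name, distance); best_name is None if nothing is close enough."""
    best_name, best_d = min(
        ((name, p_distance(query, seq)) for name, seq in reference.items()),
        key=lambda pair: pair[1],
    )
    return (best_name if best_d <= threshold else None), best_d

# HYPOTHETICAL 100 bp "barcodes".
reference = {
    "Species A (hypothetical)": "ACGT" * 25,
    "Species B (hypothetical)": "ACGG" * 25,
}
query = "ACGT" * 24 + "ACGA"   # one difference from species A
stranger = "TTTT" * 25         # far from everything

print(identify(query, reference))     # ('Species A (hypothetical)', 0.01)
print(identify(stranger, reference))  # (None, 0.75)
```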

Barcodes in GenBank

"Dark taxa" is a term with at least two meanings. One is taxa that are unknown, that is, they have yet to be discovered. The other is taxa that we have discovered and sequenced but haven't yet named, so they languish in sequence databases without proper scientific names (see Dark taxa: GenBank in a post-taxonomic world).
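A rough way to spot the second kind of dark taxa is to look at the organism names attached to sequences: anything that isn't a formal Latin binomial (or trinomial) is a candidate. The sketch below is a heuristic of our own, not a GenBank rule, and the BOLD-style code in the example is hypothetical:

```python
import re

# ASSUMED pattern: Genus + epithet, with an optional infraspecific epithet.
# Anything else (sp., cf., aff., voucher or BIN codes) is treated as "dark".
NAMED = re.compile(r"^[A-Z][a-z]+ [a-z][a-z-]*( [a-z][a-z-]*)?$")

def is_dark(organism_name):
    """Heuristic: True if the name is not a formal Latin binomial/trinomial."""
    return NAMED.match(organism_name.strip()) is None

for name in ["Drosophila melanogaster", "Drosophila americana texana",
             "Drosophila aff. florae", "Drosophila sp. BOLD:AAA0001"]:
    print(name, "->", "dark" if is_dark(name) else "named")
```

The pattern is deliberately simple: an informal label written without a full stop (e.g. "sp" instead of "sp.") would slip through as "named".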

Below are two DNA barcodes that are also in GenBank. Look at each barcode: does GenBank treat them the same? If not, why not?

DNA barcoding and robots

Rudolf Meier's group in Berlin (previously based in Singapore) is developing completely automated approaches to barcoding invertebrates.

Taxonomic groups based on images

Above is a striking visualisation of butterfly specimens from the Natural History Museum collections. For details on how this was constructed see Marian Kleineberg's blog post and the code on GitHub.

As you browse this, can you find any groups that might not be taxonomic?

Automatic taxonomic names

If indeed there are millions of species yet to be described then how do we name them? One suggestion is to generate new names algorithmically (a bit like What Three Words does for places). These could simply be pre-generated names that we choose from, or names that encode a measure of genetic similarity.
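A what3words-style scheme could derive a pronounceable name deterministically from data about the organism, such as its barcode sequence. The sketch below hashes the sequence and maps bytes onto a syllable list. The syllable inventory and the "-us" ending are hypothetical choices, and a real scheme would need to check each new name against existing names to avoid homonyms:

```python
import hashlib

# HYPOTHETICAL syllable inventory (16 syllables, so 16**3 = 4096 possible names
# with three syllables; a real scheme would need far more).
SYLLABLES = ["ka", "lo", "mi", "ra", "te", "nu", "vo", "si",
             "pe", "da", "zu", "ri", "bo", "ne", "ta", "gu"]

def name_from_sequence(seq, n_syllables=3):
    """Deterministically derive a pseudo-Latin epithet from a DNA sequence."""
    digest = hashlib.sha256(seq.upper().encode("ascii")).digest()
    parts = [SYLLABLES[b % len(SYLLABLES)] for b in digest[:n_syllables]]
    return "".join(parts) + "us"

print(name_from_sequence("ACGTACGTAC"))
```

Hashing deliberately throws away similarity: two nearly identical barcodes get unrelated names. Names that encode genetic similarity would instead need something like a shared prefix for sequences in the same cluster.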

The fate of taxonomy

Taxonomists are continually complaining that they are underfunded, underappreciated, and in danger of going extinct. The data don't always support this. A controversial study, "The population ecology and social behaviour of taxonomists" (https://doi.org/10.1016/j.tree.2011.07.010), argued that there are more taxonomists than there have ever been, and that they may be running out of species to describe.

Getting data on taxonomists can be challenging, but we can use Wikidata to help out. Here are some example queries:
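As one possible example (a sketch of our own, not necessarily one of the queries intended here), the Wikidata Query Service can be asked how many people with a botanist author abbreviation (property P428) were born in each decade (P569, date of birth). Using P428 as a proxy for "taxonomist" is an assumption: it mostly captures botanists and mycologists. The parsing function is tested offline; `fetch` needs network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# People with a botanist author abbreviation (P428), counted by decade of birth (P569).
QUERY = """
SELECT ?decade (COUNT(DISTINCT ?person) AS ?n) WHERE {
  ?person wdt:P428 ?abbrev ;
          wdt:P569 ?dob .
  BIND(FLOOR(YEAR(?dob) / 10) * 10 AS ?decade)
}
GROUP BY ?decade
ORDER BY ?decade
"""

def build_url(query):
    """Encode the query as a GET request asking for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})

def fetch(query):
    """Run the query against the live endpoint (network access required)."""
    req = urllib.request.Request(
        build_url(query), headers={"User-Agent": "taxonomy-course-example/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def counts_by_decade(results):
    """Turn SPARQL JSON results into {decade: count}, skipping unbound rows."""
    return {
        int(float(row["decade"]["value"])): int(row["n"]["value"])
        for row in results["results"]["bindings"]
        if "decade" in row
    }

# Usage (needs network): print(counts_by_decade(fetch(QUERY)))
```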

Questions to think about

  1. How many web sites have machine-readable data available?
  2. How important are taxonomic names to finding biological information?
  3. How can we deal with the multiple names? Do we try and reconcile across multiple sources, or pick one as definitive?
  4. What is the best way to browse a taxonomic classification? How would you decide (i.e., what criteria are relevant?)

Reading
