Biodiversity Informatics

Introduction

What is "biodiversity informatics"?

According to Wikipedia:
Biodiversity Informatics is the application of informatics techniques to biodiversity information for improved management, presentation, discovery, exploration and analysis. It typically builds on a foundation of taxonomic, biogeographic, or ecological information stored in digital form, which, with the application of modern computer techniques, can yield new ways to view and analyse existing information, as well as predictive models for information that does not yet exist (see niche modelling). Biodiversity informatics is a relatively young discipline (the term was coined in or around 1992) but has hundreds of practitioners worldwide, including the numerous individuals involved with the design and construction of taxonomic databases. The term "Biodiversity Informatics" is generally used in the broad sense to apply to computerized handling of any biodiversity information; the somewhat broader term "bioinformatics" is often used synonymously with the computerized handling of data in the specialized area of molecular biology.
For some reviews of the field, see:

What is this course about?

This course provides an opinionated survey of topics in biodiversity informatics, with an emphasis on data discovery and visualisation. Rather than focus on a particular set of questions or a particular technology (e.g., R), the course ranges widely and aims to give you a sense of the diversity of relevant data and of the varied ways to explore and analyse those data.

Lecture

Notes on lecture

Below are some notes and updates on things discussed in the lecture.

Linking

Geography

Taxonomy

Exercises

Below are some activities that we will do during this session. If this course is being taught online we will make use of Mentimeter for many of these exercises.

Introducing yourself

How do you find information on a species you are interested in?

We will create a list of sites/databases/etc. that we can use to find out information on a species (Menti). Add your results to a shared spreadsheet at http://bit.ly/2ON8ldi.

Finding out about one species

One basic challenge is to find out what we know about a species. Pick an organism and find out the following:

  1. What does it look like?
  2. Where does it live?
  3. What is its evolutionary history (e.g., what is its closest living relative)?
  4. Where was it originally described?
  5. Has it been sequenced?
  6. How big is it?
  7. What does it eat?
  8. What parasites does it have? (Or, if it is a parasite, what are its hosts?)

Possible organisms (or try your own favourites):

Add your results to a shared spreadsheet at http://bit.ly/3qyHBuB.
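
Several of these questions can also be approached programmatically. Below is a minimal sketch, assuming the public GBIF API at https://api.gbif.org/v1 (no key needed) and using the third-party requests library; it matches a name to a GBIF taxon and lists a few georeferenced occurrence records, which gives a first answer to "where does it live?". The species chosen is just an example.

```python
# Sketch: match a species name in GBIF and list a few occurrence records.
# Assumes the public GBIF API at https://api.gbif.org/v1; no API key is required.
import requests

name = "Mantidactylus liber"  # example species; substitute your own choice

# Match the name to a GBIF taxon key
match = requests.get("https://api.gbif.org/v1/species/match", params={"name": name}).json()
taxon_key = match.get("usageKey")
print(name, "->", taxon_key, match.get("scientificName"))

# Fetch a handful of georeferenced occurrence records for that taxon
occurrences = requests.get(
    "https://api.gbif.org/v1/occurrence/search",
    params={"taxonKey": taxon_key, "hasCoordinate": "true", "limit": 5},
).json()
for record in occurrences.get("results", []):
    print(record.get("country"), record.get("decimalLatitude"), record.get("decimalLongitude"))
```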

Encyclopedia of Life (EOL)

Comparative biology, crossing the digital divide, has begun a still largely unheralded revolution: the exploration and analysis of biodiversity at a vastly accelerated pace. Its momentum will return systematics from its long sojourn at the margin and back into the mainstream of science. Its principal achievement will be a single-portal electronic encyclopedia of life.
- E. O. Wilson https://doi.org/10.1016/S0169-5347(02)00040-X

Inspired by the late E. O. Wilson, the Encyclopedia of Life aims to be a "one-stop shop" for biodiversity information. Using a species that you are interested in (or one from the list above), how well do you think it meets this goal? How does it compare to simple mashups such as http://ispecies.herokuapp.com?
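
One way to gauge how much EOL aggregates for a species is to query it programmatically. The sketch below is hedged: it assumes EOL's classic JSON search endpoint (https://eol.org/api/search/1.0.json) is still available and uses the third-party requests library.

```python
# Sketch: search the Encyclopedia of Life for a species name.
# Assumes EOL's classic JSON search API (https://eol.org/api/search/1.0.json) still works.
import requests

name = "Ursus maritimus"  # substitute any species of interest

response = requests.get("https://eol.org/api/search/1.0.json", params={"q": name}).json()
for hit in response.get("results", [])[:3]:
    print(hit.get("id"), hit.get("title"), hit.get("link"))
```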

The challenge of getting data from web sites

Web sites may have lots of interesting data, but in a format intended to look nice to people. This can make it hard to extract the underlying data if you want to do something else with it.

For example, below is the Encyclopedia of Life Images group on Flickr (a photo sharing site that predates Instagram). If you wanted to make your own map, how would you get this data?

Most web pages are designed to be viewed by people, not computers, for example http://en.wikipedia.org/wiki/Mantidactylus.
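
To see why this is awkward, here is a minimal scraping sketch using the third-party requests and BeautifulSoup libraries. It assumes Wikipedia's current page layout (the article body in a div with class "mw-parser-output") and simply prints the lead paragraph:

```python
# Sketch: scrape the lead paragraph of a Wikipedia article from its HTML.
# Deliberately fragile: it depends on Wikipedia's page structure not changing.
import requests
from bs4 import BeautifulSoup

html = requests.get("http://en.wikipedia.org/wiki/Mantidactylus").text
soup = BeautifulSoup(html, "html.parser")

# Assumption: the article body sits in a div with class "mw-parser-output";
# print its first non-empty paragraph.
body = soup.find("div", class_="mw-parser-output")
for paragraph in body.find_all("p"):
    text = paragraph.get_text(strip=True)
    if text:
        print(text)
        break
```

Any change to the page layout breaks code like this, which is why the machine-readable views in the next section are so useful.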

Viewing web sites as a machine

Data type   URL
HTML        http://en.wikipedia.org/wiki/Mantidactylus
CSV         http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE+%3Chttp://dbpedia.org/resource/Mantidactylus%3E&format=text%2Fcsv
JSON        http://dbpedia.org/data/Mantidactylus.json
XML         http://dbpedia.org/data/Mantidactylus.rdf
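
The JSON view is the easiest of these to consume from code. As a sketch, assuming DBpedia's current JSON layout (resource URI → predicate URI → list of values) and using the third-party requests library, this prints the English abstract for Mantidactylus:

```python
# Sketch: read the machine-readable JSON view of the Mantidactylus page from DBpedia.
# Assumes DBpedia's JSON layout: resource URI -> predicate URI -> list of {"value", "lang", ...}.
import requests

data = requests.get("http://dbpedia.org/data/Mantidactylus.json").json()
resource = data.get("http://dbpedia.org/resource/Mantidactylus", {})

# Print the English-language abstract, if present
for entry in resource.get("http://dbpedia.org/ontology/abstract", []):
    if entry.get("lang") == "en":
        print(entry.get("value"))
```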

Using APIs to make something new

If a web site has an API then we can get the data and do something new and different with that data. For example, Flickr has an RSS feed. This provides the underlying data in a form that computers can use.
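
As a sketch of reading such a feed with only the Python standard library: the feed URL below is a placeholder (copy the real RSS link from the Flickr group page), and the code assumes an RSS 2.0 layout with item, title and link elements.

```python
# Sketch: list photo titles and links from an RSS 2.0 feed using only the standard library.
# The feed URL is a placeholder; copy the real RSS link from the Flickr group page.
import urllib.request
import xml.etree.ElementTree as ET

feed_url = "https://www.flickr.com/services/feeds/groups_pool.gne?id=GROUP_ID&format=rss2"

with urllib.request.urlopen(feed_url) as response:
    tree = ET.parse(response)

# RSS 2.0 wraps each photo in an <item> element with <title> and <link> children
for item in tree.getroot().iter("item"):
    print(item.findtext("title"), item.findtext("link"))
```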

If we can access the data then we can start to build completely new tools. For example, BioRSS takes RSS feeds from multiple sources (such as scientific journals) and creates a web site that lists the most recent species discoveries.
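
BioRSS's own implementation is not shown here, but the aggregation idea can be sketched with the third-party feedparser library; the journal feed URLs below are placeholders.

```python
# Sketch of feed aggregation: merge items from several RSS/Atom feeds, newest first.
# The feed URLs are illustrative placeholders; substitute feeds you actually follow.
import time
import feedparser

feeds = [
    "https://example.org/journal-one/rss",  # placeholder
    "https://example.org/journal-two/rss",  # placeholder
]

items = []
for url in feeds:
    for entry in feedparser.parse(url).entries:
        stamp = entry.get("published_parsed")
        items.append((time.mktime(stamp) if stamp else 0.0,
                      entry.get("title", ""),
                      entry.get("link", "")))

# Sort by publication time, newest first, and show the ten most recent items
items.sort(key=lambda item: item[0], reverse=True)
for _, title, link in items[:10]:
    print(title, link)
```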

Creating data

We can be more than passive consumers of data: we can contribute data, either via data archives (such as FigShare) or via citizen science projects such as iNaturalist.

Download the iNaturalist app (available for both iOS and Android) and explore its functionality.
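
Observations contributed this way can also be read back out. As a minimal sketch, assuming the public read-only iNaturalist API at https://api.inaturalist.org/v1 (no key required for queries) and using the third-party requests library, the following lists a few recent observations of a taxon:

```python
# Sketch: list a few recent iNaturalist observations for a taxon.
# Assumes the public read-only API at https://api.inaturalist.org/v1; no key required.
import requests

response = requests.get(
    "https://api.inaturalist.org/v1/observations",
    params={"taxon_name": "Mantidactylus", "per_page": 5, "order_by": "observed_on"},
).json()

for observation in response.get("results", []):
    print(observation.get("observed_on"),
          observation.get("place_guess"),
          observation.get("uri"))
```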

Note that community-based tools such as iNaturalist depend on the community being well-behaved; for an example of where things can get messy, see the Taxacom mailing list thread "[Taxacom] iNaturalist and the dangers of community ID sites!".

AI

Since the end of 2022 there has been a lot of interest in AI tools such as ChatGPT and how they can be used to summarise knowledge and answer questions. ChatGPT is sometimes overloaded, but we can try to use it. Some users have reported problems getting sensible answers out of it, e.g. @itstimconnors.

Questions to think about
