Biodiversity Informatics

Geography

Lecture

Maps

The availability of easy to use online maps (e.g., Google Maps), together with the relative easy of geotagging data using GPS has resulted in an explosion of mapping services. These can be used to display spatial data, or to navigate through data using geography.

Alternatives to Google

There are numerous alternatives to Google Maps and Google Earth. Perhaps the best known is OpenStreetMap, which is a free, collaboratively edited map of the world (see video below for an animation of one year's edits).

Other mapping projects include Polymaps, and spatial analysis tools such as CartoDB.

GBIF

GBIF currently has over 1.9 billion data records, of which 1.8 billion have geographic coordinates, meaning that they can be placed on a map. The data is available from https://www.gbif.org.

To get an idea of the impact of GBIF data you can browse GBIF literature resources. Look for the DOI tags for GBIF data cited by those papers.

Georeferencing errors

A side effect of aggregating and displaying huge amounts of geographic data, for example, in GBIF is the discovery of errors. Although errors can be frustrating, exposing them means they can be fixed. For example, in 2008 I discovered errors in the GBIF map for Pagellus erythrinus). The map below shows two sets of points. The points were the localities displayed by GBIF, which seemed odd given that this is a Mediterranean fish. However, all that is needed to move the fish to where it might be expected ( symbols) is to swap the latitude and longitudes.

Error, browser must support "SVG"

Another example is of a frog that ended up in the Atlantic Ocean. GBIF originally displayed the following map for the frog Boophis periegetes:

In this case, one record had latitude and longitude swapped, the other had the longitude mistakenly recorded as being from the Western Hemisphere, instead of the Eastern Hemisphere. Once these are corrected, the frog records return to Madagascar:

Error, browser must support "SVG"

Large scale cleaning

The examples above are fairly small scale, but GBIF is a huge database so cleaning it requires some impressiv eprocessing power. For details see Here be dragons - mapping occurrence data which explains how the maps below were generated:


Map 1: Verbatim (unprocessed) occurrence data coordinates for the USA


Map 2: Results of current data portal processing for occurrences in the USA


Map 3: Results of new processing workflow for occurrences in the USA

Metabarcoding and metacrap

An example of a strange distribution (Paroecanthus).

In 2020 I noticed that the distribution of Paroecanthus seemed odd. I wrote about this on the GBIF forum. There was a problem with mapping picoplankton DNA sequences to taxonomic names, resulting in some rather strange distributions.

Geophylogenies

Around 2007 several researchers started exploring displaying phylogenies on Google Earth (see Google Earth phylogenies for some links, and David Kidd's paper Geophylogenies and the Map of Life, http://dx.doi.org/10.1093/sysbio/syq043 for a review. Here we will look at some examples, and create a geophylogeny.

Drawing trees

To draw a tree on a computer screen we need to locate the nodes in the Euclidean space, in other words, each node needs to be assigned x and y coordinates. The x and y coordinates for the leaf nodes are straightforward to work out. The coordinates for an internal node is based on those of its descendants. Hence, in order to work out where to put the internal nodes we need to first visit their descendants. Walking around a tree in this order is called a postorder traversal.

Below we've put the tree on a 3 × 2 grid (note that the origin of the coordinates is the top left). We are going to evenly space the internal nodes along the x-axis, and the leaves along the -axis. All the leaves will line up at x=3, so it just remains to find the y values for the internal nodes. The y value for node 3 is the midpoint of the values for node 1 (2) and 2 (1), i.e. 1.5. Now that we have the y coordinate for node 3, we can compute the y coordinate for node 5, which is the midpoint of the values for node 3 (1.5) and 4 (0), i.e. 0.75.

Drawing trees on Google Earth

Now that we can draw trees, we need to translate the method for computing simple x-y Euclidean coorindates into latitude and longitude. For details on how to do this see, for example, Earth not flat - official. The shortest path between two points on a sphere is not a straight line but a great circle, so when we compute the coordinates of an internal node we need to find the great circle upon which the two descendant nodes fall, and then find the midpoint of the arc between the two child nodes:

Examples

Download

Avian flu virus

David Kidd has created an ArcGlobe visualisation for terrestrial mammals.

This is just one of a series of videos David has created.

Spatial browsing

An obvious use of a map is to display the spatial distribution of some data, for example the occurrences of a particular taxon. But we can also use a map to explore data. For example, if we wanted to find taxonomic papers related to Australian species, we could do a Google search that included the term "Australia". Alternatively, if we had a database of taxonomic descriptions, and these where geotagged (for example, if a paper mentions the latitude and longitude of a locality), then we could do a spatial query. For example, we could define the "bounding box" of Australia as the rectangle that encloses that continent. We could then search our database for papers with localities inside that bounding box. For an example of this see https://biostor.org/map.php.

The part of the map being displayed automatically defines a bounding box, so we could implement a simple spatial browser that shows us the taxonomic papers (or species, or whatever object is of interest) for just that region.

Timelines and maps

Below is a nice exmaple of combining a map and a timeline by @Alex_Abair (source code on Github).

Grids rather than points

For spatial analysis we often want to compare measures (e.g., diversity) in different areas. If those areas are of different sizes (e.g., countries) then comparisons become problematic. Another approach is to divide the surface of the planet into equally sized areas so each unit is comparable.

Examples of this approach include Uber H3 and Foursquare.

Load a discrete grid into Google Earth. For example, try this file: isea3h5.kml

Physical models

The Great Globe project is a 1:100,000 scale model of the earth to be constructed in the southwest United States.

Latitude and longitude

Some simple exercises to understand latitude and longitude (Mentimeter).

Drawing on maps with GeoJSON

GeoJSON is a simple format for describing geographic features, for example:

{
    "type": "Point",
    "coordinates": [
        -105.01621,
        39.57422
    ]
}

You can experiment with GeoJSON at GeoJSONLint.

Google Earth

You can download Google Earth, or run it online at https://earth.google.com/web/ (I've found that it works best in Google Chrome).

If you install it on your computer it will open KML files for you. If you are using the web version you need to add KML files by creating a new "Project" and importing a KML file from your computer.

Adding points

Adding points

We can use KML tool to make a KML file for a set of points.

Where are new species of frogs found?

The KML file frogs.kml contains the locations of recently described frog species (obtained by data mining the Amphibia Species of the World web site). Load this file into Google Earth. Where are the new frogs species being found? Are there any records you might be suspicious of?

Now that we know where the new species of frogs are being found, we can ask whether these new species in conservation areas or not (presumably being in a conservation area increases there chances of avoiding extinction). To answer this question we need to be able to plot conservation area on Google Earth.

One source of KML files for Google Earth used to be protectedplanet.net. This feature has since been removed, but you can download SHP files and convert them using a tool such as MyGeodata Converter.

As a starting point, here is the KML file for the Ranomafana National Park in Madagascar. If you download this file and open it in Google Earth you will be able to see whether any recently described frogs are in that protected area.

Comparing two sets of data about a species distribution

GBIF data comes from museum collections and observations of living organisms. Inevitably this data will be affected by sampling bias (organisms won't be found where people haven't looked) as well as digitisation bias (not every museum collection or database of observations). There may also be data that hasn't been added to GBIF. In this exercise we will explore Flickr as a source of distributional data. Flickr contains a wealth of images of organisms, many tagged by their scientific name. Many of these images are geotagged (e.g., they were taken with a camera or phone that has GPS built in), which means we can place them on a may. Given that Flickr exports KML files, it is easy to add these images to Google Earth and compare them with GBIF data.

As an example, we can get images for the spider Leucauge magnifica by searching Flickr for the tag "leucaugemagnifica". Grab the search results as a KML file (you should see a link near the bottom of the page, or local copy here) and load this in to Google Earth.

Now go to the GBIF data portal (http://www.gbif.org) and search for Leucauge magnifica (if you are impatient then click on http://www.gbif.org/species/2151499/ or grab a local copy of the KML file here).

Are the Flickr photos within the distributional range from GBIF? If not, how would you interpret this?

Make a geophylogeny

To create a geophylogeny we need a phylogeny where all the taxa have latitude and longitude data. For this exercise we will use the Banza example discussed at Google Earth phylogenies, based on data from Molecular phylogeny of Banza (Orthoptera: Tettigoniidae), the endemic katydids of the Hawaiian Archipelago (http://dx.doi.org/10.1016/j.ympev.2006.04.006).

Below is a NEXUS file (a format widely used in phylogenetics) for the Banza that has geographic location encoded in the taxon name (the part beginning lat=).

To convert this into KML go to tkml.php and paste the NEXUS file into the form. The web site will create a KML file and your browser will download this file (called tree.kml). Open it in Google Earth. It should look something like this:

Given what we know about the Hawaiian islands, how would you interpret this geophylogeny?

Evaluate online maps

The following web sites use maps to display biodiversity information.

Visit these web sites and evaluate them using the following criteria

Maps for anything

Wikidata is a structured database of human knowledge, originally designed to underpin the various Wikipedias, but which has grown in scope. It has a powerful query language that enables you to query almost anything. Here are some example queries that generate maps:

Where next?

In this session we've focussed on data that is georeferenced, that is, we know where on the Earth's surface it was located. For a lot of legacy data we will not have GPS coordinates. Tools such as BioGeomancer (2012-01021 link seems to be broken, try https://sites.google.com/site/biogeomancerworkbench/)aim to take a verbal description of where an observation was made (e.g., "Cambodia: Ratanakiri Province, Virachey National Park") and convert that into latitude and longitude.

As an experiment I've built a tool called the GBIF geocoder that takes a string such as "Cambodia: Ratanakiri Province, Virachey National Park" and converts that into a set of coordinates based on searching GBIF for records with similar localities.

Tools to create geophylogenies are in their infancy, and not widely used. It would be interesting to think about how to analyse geophylogenies, for example, can we make inferences about biogeographic processed based on how many lines cross particular geographic features? Can we compare heights of lines (a function, say, of molecular evolution) to infer whether geographic patterns are the same age?

Comments