Biodiversity Informatics

Projects

For this course the project component is deliberately not prescriptive so that you have maximum freedom. However, freedom can be a little scary, so there are some suggestions for possible topics below. Any project will need to be submitted via Moodle. If you have created something online (e.g., a web site, a spreadsheet, some scripts, etc.) then the project report itself can be an A4 page describing what you did and why. If you have not created something online (for example, you are reviewing some existing web sites or data sources, or doing a data analysis) then something closer to 3 pages (approximately 1500 words) would be more appropriate. Below are the guidelines used when marking the project.

Project that used tools to do an analysis

You could use a tool described on the course, or one you have found yourself, to analyse some data (either publicly available, or data you are working on).

Project that created a tool/website/dataset

You could create a dataset (e.g., a Google Spreadsheet, a KML file, etc.), a web site (e.g., a “shiny” app, an online map), etc.

Project that evaluates an existing datasource or website

A review or evaluation of a database or website, either from the course or one you have found.

Create something cool!

This category exists for things that may be unusual but also interesting.

Past projects

Some examples of past projects.

Title/topic | Tools
Is the ‘Blue Planet Effect’ real? Assessing the impact of the 2017 documentary on public awareness of our oceans using Google trend data | Google Trends, R
Global trends of captive cetacean births, deaths, captures, and transports: An analysis using CETA-Base data |
Animal extinctions for the past 90 years | KML
EpiMap (Time Map) Design Report | Review
Comparing the information available in large-scale citizen science data sources with Global Biodiversity Information Facility data regarding endangered invertebrate species | Wikidata, iNaturalist
Dashboard of Middle East Corona Virus Cases in Saudi Arabia 2019 | Google Data Studio
Corona virus in China | Excel, R
Review of large tree viewing apps | Review
Review of telemetry data | Movebank, R
Malaria vector distributions | GBIF, R, shiny
IUCN and GeoCAT redlist comparisons | GBIF, IUCN
Accuracy of automated identifications | iNaturalist, Google Lens, R
Bird distribution in Uganda | KML, Python, Tableau
Survey of people's ability to interpret evolutionary trees | Google Forms
Data extraction from a PDF, upload to Figshare | Tabula, Figshare, iLovePdf

For some possible project ideas please see the list below, or see Possible project ideas (these are aimed at 4th year and/or Masters students and are intended solely to get you thinking).

Ideas

The ideas below are designed to give you some idea of the sort of projects that you could do. Feel free to suggest your own topic.

Data cleaning

Getting data from a paper

One of the first steps in analysing data is to extract it from whatever format it is in. Can you find a dataset that would be useful but is not easy to access? For example, maybe there is a list of animal traits which is in a PDF and you want to convert it to a file for further analysis (maybe even make it available to others to use through, say, figshare). Find a dataset, and use any available tools (e.g., Tabula) to extract the data.

Cleaning a list of host names

GenBank has a lot of sequences with host information, but this is often recorded in an informal way. Can we clean these names and link them to external identifiers, such as EOL taxon ids? One approach would be to use Open Refine. If you are interested in this project, data will be provided.
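As a rough illustration of what "cleaning" might involve before any linking to identifiers, the sketch below groups host strings that differ only in case, punctuation, spacing, or word order. It is a simplified take on the key-collision ("fingerprint") style of clustering that Open Refine offers, not a reimplementation of it, and the host strings are invented for the example.

```python
import re
from collections import defaultdict

def fingerprint(value):
    """Build a crude matching key: lower-case, drop punctuation, sort unique words."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(values):
    """Group the original strings by their fingerprint key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)

# Invented examples of informally recorded host names.
hosts = ["Bos taurus", "bos taurus ", "Bos taurus.", "taurus, Bos", "Sus scrofa"]

for key, members in cluster(hosts).items():
    print(key, "<-", members)
```

A key like this will not help with common names ("cow") or misspellings, so it is only a first pass before reconciling against a taxonomic database.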

Data mining

Can we extract host-parasite associations from the titles of papers?

Using the wordtree example as a starting point, can we extract a list of host-parasite associations from paper titles? For example, can you write one or more regular expressions that extract the associations? You could use the regular expression tester to try out various regular expressions.
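To give a feel for the regular expression approach, here is a minimal sketch. It assumes that titles follow phrasings such as "X, a parasite of Y" or "X from Y", and that both names are written as Latin binomials. The patterns and the example titles are invented for illustration; real titles are far more varied.

```python
import re

# A Latin binomial: capitalised genus followed by a lower-case epithet.
# Caveat: capitalised English word pairs ("New species") will also match.
BINOMIAL = r"\b[A-Z][a-z]+ [a-z]{3,}\b"

# Hypothetical title patterns, tried in order.
PATTERNS = [
    # "<parasite> ... parasite of / parasitic in <host>"
    re.compile(
        rf"(?P<parasite>{BINOMIAL}).*?\bparasit\w* (?:of|in|on|from) "
        rf"(?:the )?(?P<host>{BINOMIAL})"
    ),
    # "<parasite> ... from <host>"
    re.compile(
        rf"(?P<parasite>{BINOMIAL}).*?\bfrom (?:the )?(?P<host>{BINOMIAL})"
    ),
]

def extract_association(title):
    """Return (parasite, host) for the first pattern that matches, else None."""
    for pattern in PATTERNS:
        match = pattern.search(title)
        if match:
            return match.group("parasite"), match.group("host")
    return None

# Made-up titles for illustration only.
titles = [
    "Redescription of Plasmodium relictum, a parasite of Passer domesticus",
    "Haemoproteus columbae from Columba livia in Glasgow",
    "A checklist of the birds of Scotland",
]

for title in titles:
    print(extract_association(title), "<-", title)
```

Keeping the patterns in a list makes it easy to add a new phrasing each time you find a title that the existing ones miss.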

Evaluate the performance of taxon name finding tools

There are several tools to automatically find taxonomic names in text, e.g., http://gnrd.globalnames.org and http://www.ubio.org/tools/recognize.php. Test how effective they are by applying them to examples of scientific text. You could use, for example, articles in the journal Zookeys, which has names already identified for you.
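One common way to put a number on "how effective" is to compare the names a tool returns with the names already marked up in the article, and report precision and recall. The sketch below assumes you have both lists as plain strings and that exact string matching is good enough; the names in the example are placeholders, not output from either tool.

```python
def evaluate(found, expected):
    """Return (precision, recall) for a tool's names against the marked-up names."""
    found, expected = set(found), set(expected)
    true_positives = len(found & expected)
    # Precision: what fraction of the names the tool reported are correct?
    precision = true_positives / len(found) if found else 0.0
    # Recall: what fraction of the correct names did the tool report?
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall

# Placeholder data: names marked up in an article versus names a tool reported.
expected = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla", "Pongo abelii"]
found = ["Homo sapiens", "Pan troglodytes", "Materials and"]

precision, recall = evaluate(found, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching is strict: a tool that returns "H. sapiens" for "Homo sapiens" would be counted as wrong, so you may want to decide in advance how abbreviations and authorship strings are scored.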

Visualisation

Navigating taxonomies on small screens

Find at least three smart phone apps (or web sites designed for smart phones) that enable you to navigate through a taxonomic classification (e.g., Field Guide to Victorian Fauna). Compare and contrast how easy it is to navigate using those apps.

Geophylogenies

Construct a geophylogeny for a set of species

Can you create some geophylogenies (using, say, the tools on this site, or GenGIS) for a group of taxa? Can you use these tools to test a specific hypothesis of interest? Try alternative visualisations (e.g., Google Earth versus GenGIS). What are the strengths and weaknesses of each approach?

Taxonomy

Reconciling taxonomic names

There are an increasing number of tools to clean lists of taxonomic names. Can you evaluate how useful these are? For example, create a list of taxonomic names and compare how each tool performs with those names.

Name changes and obsolete labels

How fast do taxonomic names change? What are the implications of this for people using these names? For example, for a public museum such as the Hunterian Zoology museum, how many names for the animals exhibited have changed since the labels were printed?

Linking bird names to literature

Avibase has a copy of Peters' Check-list of the Birds of the World as an Excel spreadsheet, see The Peters' Check-list of the Birds of the World Database (local copy here). Can you parse the text and locate the references online?

Mapping

Where are new species being found?

Find a list of recently described species (for example, follow a journal like Zootaxa on Twitter, or use Wikispecies), and locate as many as you can on a map. Compare this to the distribution of protected areas on http://protectedplanet.net.

Extracting geographic data from papers

A lot of geographic information is locked up in scientific papers. Can you write a regular expression to extract latitudes and longitudes from a paper? Can you add this information to Google Earth?
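As a starting point, the sketch below pulls out coordinates written in degrees, minutes, and optional seconds (e.g., 55°52'19"N, 4°17'22"W), converts them to decimal degrees, and writes a bare-bones KML document that Google Earth can open. It assumes plain ASCII ' and " marks for minutes and seconds and a latitude followed directly by a longitude; text extracted from PDFs often uses other symbols and layouts, so expect to extend the pattern. The sample sentence is invented.

```python
import re

# Latitude then longitude, each as degrees, minutes, optional seconds, hemisphere.
COORD = re.compile(
    r"""(\d{1,3})°\s*(\d{1,2})'\s*(?:(\d{1,2}(?:\.\d+)?)"\s*)?([NS])"""
    r"""[,;\s]+"""
    r"""(\d{1,3})°\s*(\d{1,2})'\s*(?:(\d{1,2}(?:\.\d+)?)"\s*)?([EW])"""
)

def to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60 + float(seconds or 0) / 3600
    return -value if hemisphere in "SW" else value

def extract_coordinates(text):
    """Return a list of (latitude, longitude) pairs found in the text."""
    points = []
    for match in COORD.finditer(text):
        groups = match.groups()
        points.append((to_decimal(*groups[0:4]), to_decimal(*groups[4:8])))
    return points

def to_kml(points):
    """Write one unnamed placemark per point; KML expects longitude,latitude."""
    placemarks = "\n".join(
        f"  <Placemark><Point><coordinates>{lon:.5f},{lat:.5f}"
        "</coordinates></Point></Placemark>"
        for lat, lon in points
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
        f"{placemarks}\n</Document>\n</kml>\n"
    )

# Invented sentence of the kind found in a methods section.
SAMPLE = """Specimens were collected at 0°30'S 36°05'E and at 55°52'19"N, 4°17'22"W."""

print(to_kml(extract_coordinates(SAMPLE)))
```

Saving the printed output to a file with a .kml extension should be enough to load the points into Google Earth.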

Conservation

How useful is GBIF for conservation?

Take a number of species on the IUCN Red List and find them in GBIF. Is there sufficient information in GBIF for you to assess the status of those species? You could investigate the use of GeoCAT using GBIF data. See also IUCN Red List assessments.

Ecological associations

Add data to Global Biotic Interactions (GloBI)

Find a data set for ecological interactions (e.g., host and parasites) and add it to Global Biotic Interactions (GloBI).

Biodiversity knowledge

What are the natural language questions people want answered about biodiversity?

Google knows the answer to "how many species are there?" (try it). What other questions could we ask? Google can help tell us; see Possible project: natural language queries, or answering "how many species are there?". Can we create a list of these questions, and can we work out how we could answer them? For example, could you answer these questions using GBIF?

Metascience

Diversity in taxonomy

Morgan Jackson wrote a blog post entitled Gender Issues in Taxonomy: more than just Latin. He proposed a challenge:

what proportion of authors in taxonomic papers are women, are they more likely to be first author, last author, or somewhere in the middle, and what proportion of taxa have been described by women?
The project would be to tackle this challenge.

The population ecology and social behaviour of taxonomists

This project would be a reanalysis and extension of Joppa et al. "The population ecology and social behaviour of taxonomists" (doi:10.1016/j.tree.2011.07.010). They analysed a subset of taxa and suggested that taxonomy is far from being in decline, but do their conclusions hold across all taxa? They provide R scripts for their analyses (but not the data), but there are other sources of data, such as BioNames.
