Projects

For this course the project component is deliberately not prescriptive so that you have maximum freedom. However, freedom can be a little scary, so there are some suggestions for possible topics below.

Use any of the tools we explored to analyse some data (which could be your own data, or some publicly available data).
Do an analysis based on some data discovered in the course. Instead of discovering a new tool you may have discovered some new source of data that you want to explore in more detail.
Evaluate an existing data source or web site (e.g., write a critical review of a database).
Do something cool! "Cool" is deliberately not defined, but past projects have included a 3D phylogeny created in Minecraft.

Any project will need to be submitted via Moodle. If you have created something online (e.g., a web site, a spreadsheet, some scripts, etc.) then the project report itself can be an A4 page describing what you did and why. If you have not created something online (for example, you are reviewing some existing web sites or data sources, or doing a data analysis) then something closer to 3 pages (approx. 1500 words) would be more appropriate. Below are the guidelines used when marking the project.

Project that used tools to do an analysis

You could use a tool described on the course, or one you found yourself, to analyse some data (either publicly available, or data you are working on).

why did you use this tool (as opposed to any alternatives?)
what problem does the tool solve? (Why does the problem matter?)
did the tool work as advertised (can you critically evaluate its strengths and weaknesses?)
presentation of results (how well did you summarise the results of the analysis?)

Project that created a tool/website/dataset

You could create a dataset (e.g., a Google Spreadsheet, a KML file, etc.), a web site (e.g., a “shiny” app, an online map), etc.

justification of problem solved (why does the problem matter?)
is something new needed (why are existing solutions inadequate?)
what did you learn in making the tool (was it easy or hard?)
how can others make use of the tool (can they use their own data, can they use it as inspiration?)

Project that evaluates an existing datasource or website

A review or evaluation of a database or website, either from the course or one you have found.

justification for choice of source or website (why does it matter?)
description of criteria for evaluation (how did you evaluate it?)
comparison with other sources (are you aware of similar resources?)
suggestions for improvements (did you offer constructive criticism?)

Create something cool!

This category exists for things that may be unusual but are also interesting.

why is the thing cool?
what problem does it solve?
is this a one-off or could people apply the same approach to other problems?
is this a gimmick or something actually useful?

Past projects

Some examples of past projects.

Title/topic	Tools
Is the ‘Blue Planet Effect’ real? Assessing the impact of the 2017 documentary on public awareness of our oceans using Google trend data	Google Trends, R
Global trends of captive cetacean births, deaths, captures, and transports: An analysis using CETA-Base data
Animal extinctions for the past 90 years	KML
EpiMap (Time Map) Design Report	Review
Comparing the information available in large-scale citizen science data sources with Global Biodiversity Information Facility data regarding endangered invertebrate species	Wikidata, iNaturalist
Dashboard of Middle East Corona Virus Cases in Saudi Arabia 2019	Google Data Studio
Corona virus in China	Excel, R
Review of large tree viewing apps	Review
Review of telemetry data	Movebank, R
Malaria vector distributions	GBIF, R, shiny
IUCN and GeoCAT redlist comparisons	GBIF, IUCN
Accuracy of automated identifications	iNaturalist, Google Lens, R
Bird distribution in Uganda	KML, Python, Tableau
Survey of people's ability to interpret evolutionary trees	Google Forms
Data extraction from a PDF, upload to Figshare	Tabula, Figshare, iLovePdf

For some possible project ideas please see the list below, or see Possible project ideas (these are aimed at 4th year and/or Masters students and so are intended solely to get you thinking).

Ideas

The ideas below are designed to give you some idea of the sort of projects that you could do. Feel free to suggest your own topic.

Data cleaning

Getting data from a paper

One of the first steps in analysing data is to extract it from whatever format it is in. Can you find a dataset that would be useful but is not easy to access? For example, maybe there is a list of animal traits in a PDF that you want to convert to a file for further analysis (and maybe even make available for others to use through, say, Figshare). Find a dataset, and use any available tools (e.g., Tabula) to extract the data.

Cleaning a list of host names

GenBank has a lot of sequences with host information, but this is often recorded in an informal way. Can we clean these names and link them to external identifiers, such as EOL taxon ids? One approach would be to use Open Refine. If you are interested in this project, data will be provided.
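
As a rough illustration of what cleaning host names can involve, the sketch below groups spelling variants under a shared key. It is loosely modelled on the "fingerprint" key-collision idea used for clustering in Open Refine, but it is not Open Refine's implementation, and the host strings are invented for illustration (the real data will look different).

```python
import string
from collections import defaultdict


def fingerprint(name):
    """Build a normalised key: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = name.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(names):
    """Group raw names that share the same fingerprint key."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return dict(groups)


# Invented examples of informally recorded host names.
hosts = ["Homo sapiens", "homo sapiens ", "Homo sapiens.", "sapiens, Homo", "Bos taurus"]
clusters = cluster(hosts)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

A key like this only catches variation in case, punctuation and word order. Misspellings and common names (e.g., "human") would need other approaches, and linking each cleaned name to an external identifier is a separate step.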

Data mining

Can we extract host-parasite associations from the titles of papers?

Using the wordtree example as a starting point, can we extract a list of host parasite associations from paper titles? For example, can you write one or more regular expressions that extract the associations? You could use the regular expression tester to try out various regular expressions.
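
To give a feel for the approach, here is a minimal sketch of one such regular expression in Python. It assumes both species are written as Latin binomials and that the parasite name is followed directly by a linking word such as "in" or "from". The linking words and the example titles are invented for illustration, and real titles will need more patterns than this.

```python
import re

# Assumption: a species is written as a Latin binomial, e.g. "Salmo salar".
BINOMIAL = r"\b[A-Z][a-z]+ [a-z]+"

# Assumption: the parasite name is followed directly by one of these linking
# words and then the host name. The list is illustrative, not exhaustive.
LINK_WORDS = r"(?:from|in|on|infecting|parasitizing)"

PATTERN = re.compile(
    rf"(?P<parasite>{BINOMIAL}) {LINK_WORDS} (?:the )?(?P<host>{BINOMIAL})"
)


def extract_association(title):
    """Return (parasite, host) if the title matches the pattern, else None."""
    match = PATTERN.search(title)
    if match is None:
        return None
    return match.group("parasite"), match.group("host")


# Invented titles, for illustration only.
titles = [
    "Prevalence of Plasmodium relictum in Passer domesticus",
    "Gyrodactylus salaris on Salmo salar from Norwegian rivers",
    "A revision of the genus Carabus",
]
for title in titles:
    print(title, "->", extract_association(title))
```

This pattern deliberately leaves out "of" as a linking word, because phrases such as "Prevalence of" would otherwise be mistaken for a species name. It also misses titles that give the host as a common name, which is exactly the kind of limitation the project could explore.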

Evaluate the performance of taxon name finding tools

There are several tools to automatically find taxonomic names in text, e.g., http://gnrd.globalnames.org and http://www.ubio.org/tools/recognize.php. Test how effective they are by applying them to examples of scientific text. You could use, for example, articles in the journal ZooKeys, which has names already identified for you.
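
One possible way to score a tool (an assumption here, not a required method) is to compare the names it finds with the names already marked up in the article, and report precision (how many of the found names are correct) and recall (how many of the marked-up names were found). The sketch below does this for two plain lists of strings, and the example names are made up for illustration.

```python
def precision_recall(found, expected):
    """Compare names a tool found against the names known to be in the text."""
    found, expected = set(found), set(expected)
    true_positives = len(found & expected)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical example: names marked up in an article versus names a tool returned.
expected = ["Homo sapiens", "Bos taurus", "Passer domesticus", "Salmo salar"]
found = ["Homo sapiens", "Bos taurus", "New York"]

precision, recall = precision_recall(found, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact string matching is strict: a tool that returns "H. sapiens" for "Homo sapiens" would be scored as wrong, so you would need to decide how to treat abbreviations and spelling variants.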

Visualisation

Navigating taxonomies on small screens

Find at least three smartphone apps (or web sites designed for smartphones) that enable you to navigate through a taxonomic classification (e.g., Field Guide to Victorian Fauna). Compare and contrast how easy it is to navigate using those apps.

Geophylogenies

Construct a geophylogeny for a set of species

Can you create some geophylogenies (using, say, the tools on this site, or GenGIS) for a group of taxa? Can you use these tools to test a specific hypothesis of interest? Try alternative visualisations (e.g., Google Earth versus GenGIS). What are the strengths and weaknesses of each approach?

Taxonomy

Reconciling taxonomic names

There are an increasing number of tools to clean lists of taxonomic names. Can you evaluate how useful these are? For example, create a list of taxonomic names and compare how each tool performs with those names.

Name changes and obsolete labels

How fast do taxonomic names change? What are the implications of this for people using these names? For example, for a public museum such as the Hunterian Zoology museum, how many names for the animals exhibited have changed since the labels were printed?

Linking bird names to literature

Avibase has a copy of Peters' Check-list of the Birds of the World as an Excel spreadsheet, see The Peters' Check-list of the Birds of the World Database (local copy here). Can you parse the text and locate the references online?

Mapping

Where are new species being found?

Find a list of recently described species (for example, follow a journal like Zootaxa on Twitter, or use Wikispecies), and locate as many as you can on a map. Compare this to the distribution of protected areas on http://protectedplanet.net.

Extracting geographic data from papers

A lot of geographic information is locked up in scientific papers. Can you write a regular expression to extract latitude and longitudes from a paper? Can you add this information to Google Earth?
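
As a starting point, the sketch below shows one way this could work in Python. It assumes coordinates are written with a degree sign and a hemisphere letter (e.g., 12°30'S or 1.5°N) and that each latitude is followed by its longitude. Papers use many other formats, so treat the pattern as a first attempt; the sample sentence is invented. Note that KML lists coordinates as longitude,latitude.

```python
import re

# Assumption: degrees, optional minutes and seconds, then a hemisphere letter.
COORD = re.compile(
    r"(?P<deg>\d{1,3}(?:\.\d+)?)°\s*"
    r"(?:(?P<min>\d{1,2}(?:\.\d+)?)['′]\s*)?"
    r"(?:(?P<sec>\d{1,2}(?:\.\d+)?)[\"″]\s*)?"
    r"(?P<hem>[NSEW])"
)


def to_decimal(match):
    """Convert one matched coordinate to signed decimal degrees."""
    value = (
        float(match.group("deg"))
        + float(match.group("min") or 0) / 60
        + float(match.group("sec") or 0) / 3600
    )
    return -value if match.group("hem") in "SW" else value


def extract_points(text):
    """Pair each latitude (N/S) with the longitude (E/W) that follows it."""
    matches = list(COORD.finditer(text))
    return [
        (to_decimal(first), to_decimal(second))
        for first, second in zip(matches, matches[1:])
        if first.group("hem") in "NS" and second.group("hem") in "EW"
    ]


def to_kml(points):
    """Write (latitude, longitude) pairs as a minimal KML document."""
    placemarks = "".join(
        f"<Placemark><Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>"
        for lat, lon in points
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        f"{placemarks}</Document></kml>"
    )


# Invented sentence, for illustration only.
text = "Specimens were collected at 12°30'S, 45°15'E and again at 1.5°N 103.75°E."
points = extract_points(text)
print(points)
print(to_kml(points))
```

Saving the output of to_kml to a file with a .kml extension should give you something you can open in Google Earth. You would probably also want to add a name to each placemark so you can tell the points apart.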

Conservation

How useful is GBIF for conservation?

Take a number of species on the IUCN Red List and find them in GBIF. Is there sufficient information in GBIF for you to assess the status of those species? You could investigate the use of GeoCAT using GBIF data. See also IUCN Red List assessments.

Ecological associations

Add data to Global Biotic Interactions (GloBI)

Find a data set for ecological interactions (e.g., host and parasites) and add it to Global Biotic Interactions (GloBI).

Biodiversity knowledge

What are the natural language questions people want answered about biodiversity?

Google knows the answer to "how many species are there?" (try it). What other questions could we ask? (Google can help tell us, see Possible project: natural language queries, or answering "how many species are there?".) Can we create a list of these questions, and can we work out how we could answer them? For example, could you answer these questions using GBIF?

Metascience

Diversity in taxonomy

Morgan Jackson wrote a blog post entitled Gender Issues in Taxonomy: more than just Latin. He proposed a challenge:

what proportion of authors in taxonomic papers are women, are they more likely to be first author, last author, or somewhere in the middle, and what proportion of taxa have been described by women?

The project would be to tackle this challenge.

The population ecology and social behaviour of taxonomists

This project would be a reanalysis and extension of Joppa et al. "The population ecology and social behaviour of taxonomists" (doi:10.1016/j.tree.2011.07.010). They analysed a subset of taxa and suggested that taxonomy is far from being in decline, but do their conclusions hold across all taxa? They provide R scripts for their analyses (but not the data), but there are other sources of data, such as BioNames.