Some examples of past projects.
Title/topic | Tools |
---|---|
Is the ‘Blue Planet Effect’ real? Assessing the impact of the 2017 documentary on public awareness of our oceans using Google Trends data | Google Trends, R |
Global trends of captive cetacean births, deaths, captures, and transports: An analysis using CETA-Base data | |
Animal extinctions for the past 90 years | KML |
EpiMap (Time Map) Design Report | Review |
Comparing the information available in large-scale citizen science data sources with Global Biodiversity Information Facility data regarding endangered invertebrate species | Wikidata, iNaturalist |
Dashboard of Middle East Corona Virus Cases in Saudi Arabia 2019 | Google Data Studio |
Corona virus in China | Excel, R |
Review of large tree viewing apps | Review |
Review of telemetry data | Movebank, R |
Malaria vector distributions | GBIF, R, shiny |
IUCN and GeoCAT redlist comparisons | GBIF, IUCN |
Accuracy of automated identifications | iNaturalist, Google Lens, R |
Bird distribution in Uganda | KML, Python, Tableau |
Survey of people's ability to interpret evolutionary trees | Google Forms |
Data extraction from a PDF, upload to Figshare | Tabula, Figshare, iLovePdf |
For some possible project ideas, please see the list below, or see Possible project ideas (these are aimed at 4th year and/or Masters students, so they are intended solely to get you thinking).
The ideas below are designed to give you some idea of the sort of projects that you could do. Feel free to suggest your own topic.
One of the first steps in analysing data is to extract it from whatever format it is in. Can you find a dataset that would be useful but is not easy to access? For example, maybe there is a list of animal traits in a PDF and you want to convert it to a file for further analysis (and maybe even make it available to others through, say, Figshare). Find a dataset, and use any available tools (e.g., Tabula) to extract the data.
GenBank has a lot of sequences with host information, but this is often recorded in an informal way. Can we clean these names and link them to external identifiers, such as EOL taxon ids? One approach would be to use OpenRefine. If you are interested in this project, data will be provided.
Using the wordtree example as a starting point, can we extract a list of host-parasite associations from paper titles? For example, can you write one or more regular expressions that extract the associations? You could use the regular expression tester to try out various regular expressions.
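As a starting point, here is a minimal Python sketch of the regular-expression approach. It assumes titles of the form "<parasite> ... from/in/on <host>" with both names written as binomials. The pattern, the function name `extract_association`, and the example title (with made-up species names) are illustrative assumptions, not taken from the wordtree example.

```python
import re

# A binomial name: capitalised genus followed by a lower-case species epithet.
BINOMIAL = r"[A-Z][a-z]+ [a-z]+"

# Hypothetical pattern: first binomial is the parasite, then a linking word,
# then the next binomial is taken to be the host.
PATTERN = re.compile(
    rf"(?P<parasite>{BINOMIAL}).*?"
    rf"\b(?:from|in|on|infecting|parasiti[sz]ing)\b.*?"
    rf"(?P<host>{BINOMIAL})"
)

def extract_association(title):
    """Return (parasite, host) if the title matches the pattern, else None."""
    match = PATTERN.search(title)
    if match is None:
        return None
    return match.group("parasite"), match.group("host")

# Made-up title for illustration only.
title = "Examplus parasiticus n. sp. (Nematoda) from Hostus exemplaris in Brazil"
print(extract_association(title))
```

A pattern this simple will produce false positives: any capitalised word followed by a lower-case word (e.g., "Two new ...") looks like a binomial. Counting those failures on real titles is part of the project.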
There are several tools to automatically find taxonomic names in text, e.g., http://gnrd.globalnames.org and http://www.ubio.org/tools/recognize.php. Test how effective they are by applying them to examples of scientific text. You could use, for example, articles in the journal ZooKeys, which has names already identified for you.
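One possible way to quantify "how effective" is to compare the names a tool finds against the names already marked up in an article, using precision and recall. The sketch below assumes you have both as plain lists of strings; the function name `precision_recall` and the placeholder names are hypothetical, and no particular tool's output format is implied.

```python
def precision_recall(found, expected):
    """Compare names found by a tool with the names known to be in the text.

    precision = fraction of found names that are correct
    recall    = fraction of expected names that were found
    """
    found, expected = set(found), set(expected)
    true_positives = len(found & expected)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall

# Placeholder names for illustration only.
tool_output = ["Aus bus", "Cus dus", "Figure one"]
marked_up = ["Aus bus", "Cus dus", "Eus fus", "Gus hus"]
precision, recall = precision_recall(tool_output, marked_up)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact string matching is the simplest choice here; you may prefer to normalise case, whitespace, or abbreviated genus names before comparing.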
Find at least three smartphone apps (or websites designed for smartphones) that enable you to navigate through a taxonomic classification (e.g., Field Guide to Victorian Fauna). Compare and contrast how easy it is to navigate using those apps.
Can you create some geophylogenies (using, say, the tools on this site, or GenGIS) for a group of taxa? Can you use these tools to test a specific hypothesis of interest? Try alternative visualisations (e.g., Google Earth versus GenGIS). What are the strengths and weaknesses of each approach?
There are an increasing number of tools to clean lists of taxonomic names. Can you evaluate how useful these are? For example, create a list of taxonomic names and compare how each tool performs with those names.
How fast do taxonomic names change? What are the implications of this for people using these names? For example, for a public museum such as the Hunterian Zoology museum, how many names for the animals exhibited have changed since the labels were printed?
Avibase has a copy of Peters' Check-list of the Birds of the World as an Excel spreadsheet; see The Peters' Check-list of the Birds of the World Database (local copy here). Can you parse the text and locate the references online?
Find a list of recently described species (for example, follow a journal like Zootaxa on Twitter, or Wikispecies) and locate as many as you can on a map. Compare this to the distribution of protected areas on http://protectedplanet.net.
A lot of geographic information is locked up in scientific papers. Can you write a regular expression to extract latitudes and longitudes from a paper? Can you add this information to Google Earth?
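A rough Python sketch of one way to approach this is shown below. It assumes coordinates are written as degrees with optional minutes and seconds followed by a hemisphere letter (e.g., 12°30'S, 45°15'E), and that each latitude is immediately followed by its longitude. Papers use many other formats, so treat the pattern and the helper names (`extract_points`, `to_kml`) as illustrative assumptions. The output is a bare-bones KML document, a format Google Earth can open.

```python
import re

COORD = re.compile(
    r"(\d{1,3}(?:\.\d+)?)\s*°\s*"            # degrees (integer or decimal)
    r"(?:(\d{1,2}(?:\.\d+)?)\s*['′]\s*)?"     # optional minutes
    r'(?:(\d{1,2}(?:\.\d+)?)\s*["″]\s*)?'     # optional seconds
    r"([NSEW])"                               # hemisphere letter
)

def to_decimal(deg, minutes, seconds, hemi):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = float(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemi in "SW" else value

def extract_points(text):
    """Pair each latitude (N/S) with the longitude (E/W) that follows it."""
    points, lat = [], None
    for deg, minutes, seconds, hemi in COORD.findall(text):
        value = to_decimal(deg, minutes, seconds, hemi)
        if hemi in "NS":
            lat = value
        elif lat is not None:
            points.append((lat, value))
            lat = None
    return points

def to_kml(points):
    """Build a minimal KML document; KML lists longitude before latitude."""
    placemarks = "".join(
        f"<Placemark><Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>"
        for lat, lon in points
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        f"{placemarks}</Document></kml>"
    )

# Made-up sentence for illustration only.
text = "Specimens were collected at 12°30'S, 45°15'E and at 10.5°N, 20.25°W."
print(extract_points(text))
print(to_kml(extract_points(text)))
```

Saving the returned string to a file with a `.kml` extension should let you load the points in Google Earth.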
Take a number of species on the IUCN Red List and find them in GBIF. Is there sufficient information in GBIF for you to assess the status of those species? You could investigate the use of GeoCAT using GBIF data. See also IUCN Red List assessments.
Find a data set for ecological interactions (e.g., host and parasites) and add it to Global Biotic Interactions (GloBI).
Google knows the answer to "how many species are there?" (try it). What other questions could we ask? (Google can help tell us; see Possible project: natural language queries, or answering "how many species are there?".) Can we create a list of these questions, and can we work out how we could answer them? For example, could you answer these questions using GBIF?
Morgan Jackson wrote a blog post entitled Gender Issues in Taxonomy: more than just Latin. He proposed a challenge:
what proportion of authors in taxonomic papers are women, are they more likely to be first author, last author, or somewhere in the middle, and what proportion of taxa have been described by women?
The project would be to tackle this challenge.
This project would be a reanalysis and extension of Joppa et al. "The population ecology and social behaviour of taxonomists" (doi:10.1016/j.tree.2011.07.010). They analysed a subset of taxa and suggested that taxonomy is far from being in decline, but do their conclusions hold across all taxa? They provide R scripts for their analyses (but not the data), but there are other sources of data, such as BioNames.