The cyber-centipede: From Linnaeus to big data
Taxonomic descriptions, introduced by Linnaeus in 1735, are designed to allow scientists to tell one species from another. A new method for describing species now goes far beyond that tradition. The approach combines several techniques, including next-generation molecular methods, barcoding, and novel computing and imaging technologies, and will test the model for big data collection, storage and management in biology. The study has just been published in the Biodiversity Data Journal.
While 13,494 new animal species were discovered by taxonomists in 2012, animal diversity on the planet continues to decline with unprecedented speed. Concerned by the rapid rates of disappearance, scientists have been pushed towards a so-called 'turbo taxonomy' approach, in which rapid species description is needed to manage conservation.
While acknowledging the necessity of fast descriptions, the authors of the new study present the other 'extreme' for taxonomic description: "a new species of the future". An international team of scientists from Bulgaria, Croatia, China, the UK, Denmark, France, Italy, Greece and Germany illustrated a holistic approach to the description of the new cave-dwelling centipede species Eupolybothrus cavernicolus, recently discovered in a remote karst region of Croatia. The project was a collaboration between GigaScience, China National GeneBank, BGI-Shenzhen and Pensoft Publishers.
Eupolybothrus cavernicolus has become the first eukaryotic species for which, in addition to the traditional morphological description, scientists have provided a transcriptomic profile, DNA barcoding data, detailed anatomical X-ray microtomography (micro-CT), and a movie of the living specimen to document important traits of its behaviour. Micro-CT scanning of a new species has, for the first time, produced a high-resolution morphological and anatomical dataset, the 'cybertype', which gives everyone virtual access to the specimen.
"Communicating the results of next generation sequencing effectively requires the next generation of data publishing," says Prof. Lyubomir Penev, Managing Director of Pensoft Publishers. "It is not sufficient just to collect 'big' data. The real challenge comes at the point when data should be managed, stored, handled, peer-reviewed, published and distributed in a way that allows for re-use in the coming big data world," concluded Prof. Penev.
"Next generation sequencing is moving beyond piecing together a species' genetic blueprint to areas such as biodiversity research, with mass collections of species in 'metabarcoding' surveys bringing genomics, monitoring of ecosystems and species-discovery closer together. This example attempts to integrate data from these different sources, and through curation in BGI and GigaScience's GigaDB database to make it interoperable and much more usable," says Dr Scott Edmunds from BGI and Executive Editor of GigaScience.