The integrative future of taxonomy
© Padial et al; licensee BioMed Central Ltd. 2010
Received: 22 December 2009
Accepted: 25 May 2010
Published: 25 May 2010
Taxonomy is the biological discipline that identifies, describes, classifies and names extant and extinct species and other taxa. Nowadays, species taxonomy is confronted with the challenge to fully incorporate new theory, methods and data from disciplines that study the origin, limits and evolution of species.
Integrative taxonomy has been proposed as a framework to bring together these conceptual and methodological developments. Here we review perspectives for an integrative taxonomy that directly bear on what species are, how they can be discovered, and how much diversity is on Earth.
We conclude that taxonomy needs to be pluralistic to improve species discovery and description, and to develop novel protocols to produce the much-needed inventory of life in a reasonable time. To cope with the large number of candidate species revealed by molecular studies of eukaryotes, we propose a classification scheme for those units that will facilitate the subsequent assembly of data sets for the formal description of new species under the Linnaean system, and will ultimately integrate the activities of taxonomists and molecular biologists.
There is little doubt that the central unit for taxonomy is the species, and that associating scientific names unequivocally to species is pivotal for a reliable reference system of biological information. Since the advent of Linnaean nomenclature in 1758, taxonomists have been describing and naming thousands of species every year--currently around 15,000-20,000 among animals alone [2, 3]--numbers that rapidly increase for many groups of organisms due to the incorporation of new tools for discovery and the exploration of poorly known areas of the planet [4–8]. Indeed, this progress is being made despite important impediments because species taxonomy is resurging as a solid scientific discipline that incorporates technological advances, such as virtual access to museum collections, high-throughput DNA sequencing, computer tomography, geographical information systems, and multiple functions of the internet. Also, taxonomic information is increasingly digitized and made available through several global initiatives, such as Species2000, The Encyclopaedia of Life (EOL), The Global Biodiversity Information Facility (GBIF), or ZooBank. The future has been envisioned to be an interactive "cybertaxonomy" with dynamic online description and publication of new species, and where updated taxonomic information would be accessible for almost everybody from everywhere [16, 17].
However, modern taxonomy still faces two major challenges. The first, a qualitative challenge, is to reach scientific consensus about the basic category around which taxonomy is built--the species--and thus improve species delimitation. The second, a quantitative challenge, is the sheer number of species on Earth that require discovery and description, estimated at no fewer than 10 million considering eukaryotes alone, of which only a small fraction of less than 2 million have so far been named. These challenges are closely tied to two deliverables that the scientific community and society expect from taxonomy. On the one hand, taxonomy is expected to provide empirical rigor to species hypotheses and stability to their names, which requires careful and often painstaking and time-consuming work on species delimitation. On the other hand, it is expected to accelerate the pace of species description, at the peril of erroneous species hypotheses and thus of unstable names.
We here review recent proposals for the development of taxonomy that have been launched to meet both challenges. We argue that recent conceptual advances will allow taxonomy to improve species delimitation through the integration of theory and methods from disciplines studying the origin and evolution of species. Also, we emphasize the importance of applying existing integrative protocols and of developing novel ones for proceeding toward a complete inventory of life on Earth in a reasonable time.
Most evolutionary biologists will now agree that species are separately evolving lineages of populations or metapopulations, with disagreements remaining only about where along the divergence continuum separate lineages should be recognized as distinct species [20, 21]. This emerging consensus might appear to be a minor advance, but it has led to a renewed discussion about species delimitation that is paramount for catalyzing the integration of new knowledge and methods of population biology, phylogenetics, and other evolutionary disciplines into taxonomy [22–32]. Taxonomists are realizing that what matters for the study of speciation matters for taxonomy as well, and that species will be better delimited if we know what caused their origin and determined their evolutionary trajectories. As illustrative examples, the discovery and description of three Californian species of trapdoor spiders required inferences about the origin, genetic structure and degree of ecological interchangeability of divergent lineages; and a recent taxonomic revision of cardinal birds involved the reconstruction of the populational, phylogenetic and biogeographic history of lineages.
The framework of integration by cumulation is based on the assumption that divergences in any of the organismal attributes that constitute taxonomic characters can provide evidence for the existence of a species. This approach defends the view that because all taxonomic characters are contingent in existence, order of appearance, and magnitude of divergence during speciation, the only way to achieve true integration is to allow any source of evidence--even a single one--to form the basis for species discovery. In this approach congruence is desired but not considered necessary. In practice, evidence from all character sets is assembled cumulatively, concordances and discordances are explained from the evolutionary perspective of the populations under study, and a decision is made based on the available information, which can lead to recognition of a species on the basis of a single set of characters if these characters are considered good indicators of lineage divergence (Figure 3b) [41, 46].
A major advantage of this approach is that it does not bind species delimitation to the identification of any particular biological property. Taxonomists can thus select and focus on the most appropriate set of taxonomic characters for each group of organisms. Indeed, this has been the traditional approach of morphological taxonomy (Figure 3b) before the massive incorporation of other characters. Also, cumulation is probably most suitable to uncover recently diverged species in adaptive radiations due to the stepwise process of speciation along ecological gradients [57, 58]. The main limitation of the cumulative approach is that the uncritical use of a single line of evidence (e.g. a single locus of mtDNA) can lead to overestimation of species numbers (Figure 1). For example, because of genetic drift, small populations isolated only for a very short period could already become reciprocally monophyletic with respect to some character and thus be diagnosable. Such situations do not represent the type of diversity of interest to most ecologists and evolutionary biologists, and the question remains whether these populations should be recognized and named as species. As an example, Meiri and Mace disagreed with the recognition of Bornean and Sumatran populations of the clouded leopard Neofelis nebulosa diardi as a full species for exactly that reason. Indeed, other cases of elevation of subspecies to species rank in several groups of organisms, but especially in birds and primates, have been criticized as an unjustified inflation of biodiversity with detrimental consequences for macroecology and conservation. However, in many situations the steep increase of species numbers reflects a genuine discovery of previously unknown evolutionary lineages and thus valuable taxonomic progress [5–7].
Further discussion about the alternatives for integration requires a closer look at what these approaches try to integrate--the characters. Taxonomic characters are organismal traits used as evidence for species discovery. Characters can (i) be classified by the level of biological organization of the attributes to which they refer: biochemical, molecular, morphological, behavioral or ecological. They can (ii) be qualitative or quantitative, to describe variation that is discrete or continuous. Characters can (iii) have fixed states (that is, for each of the compared species or populations there is a unique state for all individuals); or be polymorphic within species but with states distributed in different frequencies across species. Most important for taxonomy--and often neglected--is the classification of characters (iv) by the evolutionary processes that shaped them (sexual or natural selection, or genetic drift), and by the role they play in the speciation process (Figure 4).
Taxonomists need to resort to different taxonomic characters to conform to the biological peculiarities of particular taxa. For example, behavioral characters--especially those genetically fixed without ontogenetic learning--such as call patterns of insects, bats and frogs, are routinely used by taxonomists working on those groups, and their use has led to the discovery of many cryptic species in some groups of organisms. Also, the ecology of organisms can be an important source of evidence in some cases. For example, the degree of ecological interchangeability may be a decisive taxonomic character to distinguish between closely related species [26, 33, 65]. In bacteria, the lack of a conspicuous morphology coupled with extensive gene transfer has forced taxonomists to develop a model-based strategy that combines data on ecology and on genetic diversity to delimit species [66, 67].
However, much of the discussion around integrative taxonomy deals with the merits of morphological versus molecular characters [36–39, 41, 49]. For practical and historical reasons most species have been primarily described based on morphology, including color. As a main advantage, morphological characters often serve to allocate individuals to species immediately by visual inspection, and are applicable to living as well as preserved specimens and fossils. Disadvantages are: (i) there is always a subjective component when defining and interpreting character states, (ii) demonstrating the fixation of a state requires large sample sizes, (iii) many characters on which taxonomists heavily rely are continuous rather than discrete, e.g., reptile scale counts or mollusk shell size and shape, and (iv) they are unsuitable for some groups of organisms, either because speciation occurs without morphological change, which leads to morphologically cryptic species, or because morphological structures are labile or difficult to study, as in prokaryotes [66, 67].
Molecular characters used in taxonomy have historically been allozymic or chromosomal (number and structure of chromosomes), but today these are mostly sequences of mitochondrial (or chloroplast) DNA and, increasingly, of nuclear genes. While the analysis of allozymes has the advantage of simultaneously screening variation at several presumably unlinked nuclear loci, DNA sequences provide many more characters (nucleotide sites), can be amplified from much smaller samples, and can be obtained in unprecedented amounts through high-throughput sequencing from fresh samples, preserved historical collection material, and even from Pleistocene fossils. DNA sequences can be examined using non-tree-based methods to provide diagnostic differences among species, but most frequently they are analyzed using tree-based methods to search for monophyletic groups that could represent species. A limitation of tree-based methods is that it remains difficult to choose which among the multiple strongly supported clades detected represent species. Also, a growing body of evidence shows that discordance between species trees and gene trees is a common phenomenon caused by processes such as incomplete lineage sorting, hybridization, gene duplication, reticulated evolution, or recombination. These situations greatly complicate the resolution of taxonomic problems [72, 73]. Most promising are tree-based methods that rely on coalescent theory [54, 74], because they can identify signals of species divergence even under complex circumstances of gene tree incongruence and non-monophyly.
A recent large-scale study on insects of Madagascar shows that single-locus coalescent models perform well for both testing and discovering species from large sample sets even without prior hypotheses of population coherence, thus providing a potential empirical substitute for traditional tree-based methods for preliminary biodiversity screening and species identification (e.g., DNA barcoding approaches).
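The non-tree-based search for diagnostic differences mentioned above can be illustrated with a small sketch. It assumes a toy alignment stored as a dictionary of equal-length sequences per putative species; the function name `diagnostic_sites` and the data are hypothetical, and gaps or ambiguity codes are treated like any other character, which a real analysis would need to handle explicitly.

```python
from typing import Dict, List


def diagnostic_sites(alignment: Dict[str, List[str]], focal: str) -> Dict[int, str]:
    """Return {position: base} for sites where every focal sequence shares one
    base that is absent from all sequences of the other groups.
    Positions are 0-based; the alignment is assumed (not checked biologically)
    to be homologous column by column."""
    focal_seqs = alignment[focal]
    other_seqs = [s for name, seqs in alignment.items() if name != focal for s in seqs]
    length = len(focal_seqs[0])
    if any(len(s) != length for s in focal_seqs + other_seqs):
        raise ValueError("sequences must be aligned to equal length")

    result: Dict[int, str] = {}
    for pos in range(length):
        focal_states = {s[pos] for s in focal_seqs}
        other_states = {s[pos] for s in other_seqs}
        # Fixed within the focal group and not shared with any other group.
        if len(focal_states) == 1 and not (focal_states & other_states):
            result[pos] = next(iter(focal_states))
    return result


# Hypothetical toy data: two putative species, six aligned sites.
toy = {
    "species_A": ["ACGTAC", "ACGTAC"],
    "species_B": ["ACTTAC", "ACTTGC"],
}
print(diagnostic_sites(toy, "species_A"))
```

In this toy case only the third site separates the two groups; the fifth site is polymorphic within one group and shared with the other, so it is not diagnostic. As the text cautions, a diagnosable difference at a single locus is not by itself evidence of a separate species.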
To be usable in an integrative taxonomy, characters should be evaluated taking into account the evolutionary forces driving the speciation process (Figure 4). This new perspective may help to overcome many of the long-standing discussions about characters in taxonomy. For example, as long as characters have a genetic basis, are unlinked and are not influenced by the same selective pressures, any character should be considered as an independent, equivalent and combinable unit. In other words, the potential of a character to clarify a taxonomic problem has to be carefully evaluated in every situation, and a particular nucleotide or morphological character state might be deemed more important than all other nucleotides and morphological characters, because it might be particularly informative to understand the process of lineage splitting and divergence. For instance, in those cases where the speciation process is driven by sexual and/or by natural selection, characters known to be under the influence of any of these forces in one species might be directly indicative of lineage divergence in the whole taxonomic group to which this species belongs and, thus, be more informative than those that are known to evolve neutrally.
Such inferences need to be based on careful evaluations to avoid circularity of arguments. As a very obvious example, in a group of animals where speciation has been demonstrated to be mainly driven by sexual selection of male colouration, differences in colour will be given a higher importance as taxonomic character than in a group of subterraneous and blind species. If speciation in a clade of phytophagous insects is known to be driven mainly by the switch to new host plants, then the discovery that a new population belonging to this clade feeds on a novel host plant might be a more relevant taxonomic character than it would be in a clade of host plant generalists. And in a group of microendemic species where specialization to narrow and distinct bioclimatic envelopes has been demonstrated to be the main force leading to speciation, the specialization of a newly discovered population to a bioclimatic niche distinct from all known species in the group might be a suitable argument to advocate its species status, while in groups where most species are known to be tolerant of a variety of bioclimates such data would be less relevant.
In general, sexually selected characters might be more likely to represent species-specific differences than naturally selected characters because they contribute to the reproductive cohesion and isolation of species while, at the same time, their rapid evolution contributes to creating larger gaps between closely related species. Thus, the most important taxonomic characters would be those indicative of reproductive isolation or limited gene flow, such as crucial mutations in "speciation genes" or any trait that directly mediates a premating or postmating reproductive barrier, such as wave form and frequency differences in advertisement calls of insects or frogs, or genital structures of arthropods and squamate reptiles. For example, a single amino acid substitution can suffice to produce striking plumage differences mediating species recognition and leading to speciation in birds. Also, differences in a coding vision gene can affect female mating preferences and initiate lineage divergence in fish. Some researchers further argue that molecular markers prone to high intraspecific gene flow might be less affected by interspecific gene flow, and thus be more effective for delimiting species, because in cases of gene introgression among sister species substantial intraspecific gene flow will reduce the frequency of introgressed alleles. But, also, the first move in speciation can result from the accumulation of multiple neutral mutations in DNA sequences causing hybrid incompatibilities, which makes those mutations ideal diagnostic characters to separate species.
In short, future discussions in taxonomy should not be about morphology versus molecules but about how characters reflect lineage divergence or about the functional relevance of some characters in the speciation process. As a consequence, taxonomy will no longer be a science restricted to the description of patterns but will be tightly linked to the study of processes generating diversity.
In the practice of current (increasingly molecular) systematics, phylogeography and DNA barcoding studies of eukaryotes are revealing units that might represent potential new species at a faster pace than results can be followed up by taxonomists. This situation suggests a need for guidelines to order and classify this undescribed diversity. The bacteriological concept of candidate species has recently been explored and applied to vertebrates for such units [8, 63, 80, 81]. A further developed stepwise working protocol (Figure 2c) recognizes three subcategories of candidate species. Groups of individuals within nominal species showing large genetic distances, but without further information, are considered unconfirmed candidate species (UCS) deserving further study. When additional data indicate that these genealogical units are not differentiated at the species level, they are flagged as deep conspecific lineages (DCL). The third category, confirmed candidate species (CCS), applies to those deep genealogical lineages that can be considered good species following standards of divergence for the group under study but that have not yet been formally described and named. For example, confirmed candidate species are sister lineages in syntopy showing no evidence of interbreeding, or allopatric lineages with distinct morphological or bioacoustical character divergences.
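The three subcategories can be read as a simple decision rule. The sketch below is only an illustration under stated assumptions: the record type `LineageEvidence`, its two fields, and the function `classify_candidate` are hypothetical names, and in practice the judgment of "species-level divergence" depends on the standards of the group under study rather than on a single boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineageEvidence:
    """Hypothetical summary of what is known about a genealogical lineage."""
    large_genetic_distance: bool
    # None means that no data beyond genetic distance are available yet.
    supports_species_level_divergence: Optional[bool] = None


def classify_candidate(evidence: LineageEvidence) -> Optional[str]:
    """Return 'UCS', 'DCL' or 'CCS', or None if the lineage is not flagged."""
    if not evidence.large_genetic_distance:
        return None  # not a candidate under this simplified rule
    if evidence.supports_species_level_divergence is None:
        return "UCS"  # unconfirmed candidate species: deserves further study
    if evidence.supports_species_level_divergence:
        return "CCS"  # confirmed candidate species: awaits formal description
    return "DCL"  # deep conspecific lineage


print(classify_candidate(LineageEvidence(True)))         # UCS
print(classify_candidate(LineageEvidence(True, True)))   # CCS
print(classify_candidate(LineageEvidence(True, False)))  # DCL
```

Keeping "no additional data" distinct from "data against species status" mirrors the protocol, in which a UCS can later be resolved as either a CCS or a DCL.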
A more standardized nomenclatural system might help to communicate with precision about candidate species, inventory them and track their changing status. Murray and Schleifer proposed a formal naming system for candidate prokaryotes that consists of placing the epithet Candidatus before a preliminary species name, as for example: "Candidatus Liberibacter asiaticus". This system has since been broadly accepted and implemented in the Bacteriological Code. However, it is inapplicable to eukaryotes because the zoological and the botanical codes do not specify minimum scientific criteria for recognizing species names as valid. Thus, while "Candidatus Liberibacter asiaticus" can be kept as an informal name until the species is proved to be valid by accepted standards of the discipline and is thus formally described, any informal name given to a candidate species of animal could qualify as a valid name under the Zoological Code if the proposal is accompanied by a voucher and diagnostic differences (e.g. exclusive haplotypes).
For different groups of animals, naming schemes of candidate species have been established. As one example, catfishes of the family Loricariidae are provided with so-called L-numbers, a system of consecutive numbers introduced in 1988 by R. Stawikowski, A. Werner and U. Schliewen, in which each putative new species is referred by a unique number combination after the letter "L"--from Loricariids. These numbers are designated upon publication of photographs of unknown color variants of Loricariids in the German journal "Die Aquarien und Terrarien Zeitschrift" and are used beyond the realm of aquariologists.
A standardization of such schemes across taxonomic groups of eukaryotes would be clear progress for data retrieval systems. A naming scheme for candidate species should not be mistaken for a substitute for, or a competitor of, the established Linnaean system of nomenclature; rather, it is a means to facilitate the assembly of data sets that could eventually lead to the description of new species under the Linnaean system. To avoid conflicts with the Codes of nomenclature, we propose to designate candidate species of eukaryotes through the combination of the binomial species name of the most similar or closely related nominal species, followed (in square brackets) by the abbreviation "Ca" (for candidate) with an attached numerical code referring to the particular candidate species (more than one candidate might be recognized under a valid species), and terminating with the author name and year of publication of the article in which the lineage was first discovered. The vouchers for the candidate species could be the GenBank accession numbers of the sequences used to propose the candidate status, or any equivalent information (e.g. MorphoBank accession numbers for morphological candidate species, or a voucher specimen number from a public collection). As an example, Hirudo medicinalis [Ca3 Siddall et al. 2007] would be the exclusive name combination referring to a particular candidate species of European leech (Ca3), as defined in the corresponding reference. This system should be flexible enough to accommodate situations in which no unambiguous candidate species definition has been proposed in a study--in this case, referring to a GenBank accession number should be possible: Hirudo medicinalis [Ca3 EF405599], where the number refers to a highly divergent sequence of H. medicinalis.
When not even a tentative assignment of a candidate species to a most similar nominal species is possible, then it would be possible to assign a candidate species just to a genus or family, i.e., Hirudo sp. [Ca3 EF405599] or Hirudinidae sp. [Ca3 EF405599].
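The proposed name structure is regular enough to be built and parsed mechanically, which matters for the database uses discussed here. The following is a minimal sketch, not part of the proposal itself: the helper names `candidate_name` and `parse_candidate_name` and the regular expression are assumptions made for illustration, and the reference part (author and year, or an accession number) is treated as free text.

```python
import re
from typing import Tuple

# Pattern for "<taxon> [Ca<number> <reference>]"; an illustrative assumption.
_CANDIDATE_RE = re.compile(r"(?P<taxon>.+?) \[Ca(?P<number>\d+) (?P<reference>.+)\]")


def candidate_name(taxon: str, number: int, reference: str) -> str:
    """Build a candidate species designation such as
    'Hirudo medicinalis [Ca3 Siddall et al. 2007]'."""
    taxon, reference = taxon.strip(), reference.strip()
    if not taxon or not reference:
        raise ValueError("taxon and reference must be non-empty")
    if number < 1:
        raise ValueError("candidate number must be a positive integer")
    return f"{taxon} [Ca{number} {reference}]"


def parse_candidate_name(name: str) -> Tuple[str, int, str]:
    """Split a designation back into (taxon, candidate number, reference)."""
    match = _CANDIDATE_RE.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a candidate species designation: {name!r}")
    return match["taxon"], int(match["number"]), match["reference"]


print(candidate_name("Hirudo medicinalis", 3, "Siddall et al. 2007"))
print(candidate_name("Hirudo sp.", 3, "EF405599"))
print(parse_candidate_name("Hirudinidae sp. [Ca3 EF405599]"))
```

Because the nominal taxon comes first, such names sort next to the valid species, genus or family they are attached to in an alphabetically ordered list.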
This system maintains the traditional structure of binomial species names and helps to list together both valid and candidate species in hierarchical and alphabetically ordered databases such as GenBank, or repositories of morphological or geographical data. Adding a numerical code helps to avoid repeated proposals of the same candidates, and GenBank accession numbers provide a direct link to source data. Candidate species should create a link between the activities of molecular biologists (e.g. ongoing DNA barcoding initiatives) and taxonomists to redirect taxonomic efforts and accelerate species descriptions.
Work areas for the scientific and technical development of integrative taxonomy
Improving taxonomic work protocols
Development of pragmatic operational protocols for discovering and describing species (Figure 2d). There is an inevitable trade-off between using complex integrative approaches for delimiting species that may provide stable names, and the need to accelerate the pace of taxonomic descriptions. Indeed, of the many empirical methods available for species delimitation [22, 25], most require extensive sampling, absence of missing data, and/or complete species-level molecular phylogenetic trees. Clearly, for most areas and groups of diverse organisms of the world, data at hand will be insufficient for in-depth studies of evolutionary separation of lineages.
Refinement of probabilistic procedures to evaluate character congruence
New methods should be able to deal with the heterogeneity of the evolutionary process and with situations of character incongruence, and to include fixed character states as well as states distributed in different frequencies across populations. In this sense, population geneticists have efficient tools to estimate whether combinations of alleles occur more frequently than expected at random--a situation termed linkage disequilibrium--and this method can be applied to discover cryptic sympatric species. Also, phylogeneticists have developed approaches such as CONCATERPILLAR, which take into account different evolutionary rates of different loci and allow identification of those that should be analyzed independently or concatenated. Extending such approaches to non-molecular characters could result in more rigorous protocols of integrative taxonomy.
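To make the idea of linkage disequilibrium concrete, the sketch below computes the standard two-locus coefficient D = p(AB) - p(A)p(B) and the squared correlation r² from a list of phased two-locus haplotypes. It is a textbook calculation, not the procedure of any study cited here; the function name and the toy haplotypes are hypothetical, and phased haplotype data are assumed to be available.

```python
from typing import Iterable, Tuple


def linkage_disequilibrium(
    haplotypes: Iterable[Tuple[str, str]], allele_a: str, allele_b: str
) -> Tuple[float, float]:
    """Return (D, r_squared) for allele_a at locus 1 and allele_b at locus 2.

    D = p(AB) - p(A) * p(B); r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)).
    r^2 is reported as 0.0 when either locus is monomorphic.
    """
    haps = list(haplotypes)
    n = len(haps)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(1 for a, _ in haps if a == allele_a) / n
    p_b = sum(1 for _, b in haps if b == allele_b) / n
    p_ab = sum(1 for a, b in haps if a == allele_a and b == allele_b) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Toy sample in which alleles never recombine between two sympatric groups.
structured = [("A", "B")] * 5 + [("a", "b")] * 5
# Toy sample in which all four allele combinations are equally frequent.
mixed = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]

print(linkage_disequilibrium(structured, "A", "B"))
print(linkage_disequilibrium(mixed, "A", "B"))
```

Complete association between unlinked loci in a sample from a single locality, as in the first toy sample, is the kind of signal that could point to two cryptic sympatric species rather than one freely interbreeding population.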
Development of modular software for species delimitation, description and publishing
Besides including phylogenetic and population genetics modules, as in Mesquite, such software should include modules for statistical analysis of morphological data, should be able to extract character information from bi- and tri-dimensional imagery and from sequence data (such as pure and private diagnostic nucleotide substitutions), and should also incorporate packages for ecological and geographical modeling and mapping, as well as for bioacoustics. It could also implement a package for building standardized species descriptions that could be directly submitted for peer review to major taxonomic journals at the same time that supporting data are automatically sent to biodiversity databases (e.g. GBIF, Species2000, ZooBank, GenBank, CBOL, MorphoBank); hyperlinked species descriptions represent an advance in this sense.
Automated identification of candidate species
Development of methods for automatically identifying, naming, documenting and cataloguing candidate species through the combination of DNA barcoding and digital image processing [12, 103]. These approaches could be especially helpful for the preliminary screening of hyperdiverse groups such as small arthropods and nematodes, or for geographical areas facing imminent habitat destruction (and therefore in need of rapid inventories of species diversity and conservation priorities).
Application of genomic analyses to taxonomy (GenoTaxonomy)
Population genomics aims to identify regions of the genome that show greater differentiation than expected from the average across many loci, as a result of reduced gene flow due to reproductive isolation or local adaptation. The automatic identification of those regions, to be used as diagnostic characters, might be the key to substantially accelerating species discovery, especially if applicable through modular taxonomic software. Given the enormous expected increases in genomic data, such approaches will soon become applicable.
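As a rough illustration of "greater differentiation than expected from the average across many loci", the sketch below computes a simple heterozygosity-based FST for biallelic loci sampled in two populations and flags loci that exceed the mean by a chosen number of standard deviations. This is a deliberately crude stand-in: the estimator assumes equal population weights, the cut-off `k` and the function names are hypothetical, and published outlier methods rely on explicit null models or simulations rather than a fixed threshold.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fst_biallelic(p1: float, p2: float) -> float:
    """FST = (H_T - H_S) / H_T for one biallelic locus, where p1 and p2 are
    the frequencies of the same allele in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def outlier_loci(freqs: Sequence[Tuple[float, float]], k: float = 2.0) -> List[int]:
    """Return indices of loci whose FST exceeds mean + k * standard deviation."""
    values = [fst_biallelic(p1, p2) for p1, p2 in freqs]
    threshold = mean(values) + k * pstdev(values)
    return [i for i, v in enumerate(values) if v > threshold]


# Hypothetical data: nine undifferentiated loci and one fixed difference.
toy_freqs = [(0.5, 0.5)] * 9 + [(1.0, 0.0)]
print(outlier_loci(toy_freqs))
```

With very few loci a single strongly differentiated locus inflates the mean and standard deviation enough to hide itself, which is one reason such scans need many loci.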
Although traditional procedures will remain useful in many cases, taxonomy needs to be pluralistic and integrate new approaches for species delimitation if it is to become a modern evolutionary discipline. Thus, for example, the "Family Union"--in Joe Felsenstein's words--between the fields of population genetics and phylogeography with phylogenetics through coalescent theory, which has been considered one of the most exciting recent developments in systematics (see also http://treethinkers.blogspot.com/), should also strongly benefit taxonomy. Shadows of past conflicts between morphologists and molecular biologists should now fade, and discussions will not be simply about integrating different kinds of characters, but rather about integrating different concepts and methods of population genetic, phylogeographic, and phylogenetic analyses. There is probably no magic bullet for species discovery and delimitation, but an integrative and evolutionary framework provides taxonomists with a larger arsenal to face the realities of inventorying the actual--and woefully underestimated--biodiversity of the planet.
Allele: one of two or more alternative forms of a gene that arise by mutation and are found at the same locus in a chromosome.
Allopatry: the condition of species or populations occurring in separate, non-overlapping geographical areas.
Candidate species: a set of organisms identified as a putative new species.
Coalescent theory: retrospective model of population genetics that employs a sample of individuals from a population to trace all alleles to the most recent common ancestor.
DNA barcoding: the use of short standardized DNA sequences to identify species.
Ecological niche: environmental conditions under which a species exists.
Gene lineage: ancestor-descendant series of alleles.
Locus (plural loci): a fixed position on a chromosome that may be occupied by one or more alleles of a gene.
Monophyletic group: a group consisting of an ancestor and all its descendants.
Neutral character: observable or quantifiable organismal trait whose evolution and variation can be explained by random processes.
Non-neutral character: observable or quantifiable organismal trait whose evolution and variation can be explained by natural or sexual selection.
Parapatry: the condition of species or populations occurring in contiguous geographical areas.
Paraphyletic group: a group consisting of an ancestor and some of its descendants.
Phylogenetics: biological discipline focused on reconstruction of the evolutionary relationships among organisms.
Phylogeography: study of historical processes responsible for intraspecific patterns (or patterns among closely-related species) of geographical distribution and diversity of gene lineages.
Species hypothesis: the hypothesis that a group of populations represents a separately evolving and divergent lineage.
Species lineage: ancestor-descendant series of metapopulations.
Speciation: the array of processes that result in the origination of new species.
Subspecies: infraspecific Linnaean category sometimes used to classify allopatric or parapatric populations showing some degree of divergence--traditionally in morphological traits--not considered large enough for the species rank.
Sympatry: the condition of species occurring in the same geographical area.
Syntopy: the condition of species occurring in the same locality at the same time.
Systematics: biological discipline that studies evolutionary patterns of biological diversity, including the fields of taxonomy and phylogenetics.
Taxonomy: biological discipline that identifies, describes, classifies and names extant and extinct living beings and deals with the theory of classification.
Taxonomic character: observable or quantifiable organismal trait used to separate species.
JMP was funded by the EU Marie Curie Mobility and Training Programme (FP7, proposal 220714). AM was supported by a postdoctoral Research Fellowship from the Alexander von Humboldt Foundation. This work was partially funded by project CGL2008-04164 of the Spanish Ministry of Science and Innovation (IDlR, Principal Investigator). We are grateful to S. Castroviejo-Fisher, C. Vilà, and A. González-Voyer for constructive criticism on previous drafts of the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.