In times of climate change and massive habitat destruction, the reliable identification of species is a pivotal component of biodiversity studies and conservation planning. However, routine identification of many species can be difficult and time-consuming, often requires highly specialized knowledge, and is therefore a limiting factor in biodiversity assessments and ecological studies [1–3]. Moreover, identifying larval stages or fragments of organisms by conventional morphological methods is impossible for many taxa [4–6].
In this context, DNA sequences are a promising and effective tool for fast and accurate species identification [7–9]. Animal mitochondrial DNA (mtDNA) has several characteristics that make it attractive for molecular taxonomy, namely generally high substitution rates, almost exclusively maternal inheritance, and lack of recombination [10, 11]. Moreover, because it is uniparentally inherited and haploid, mtDNA has a four-fold smaller effective population size than nuclear DNA, which leads to faster lineage sorting. A 650 base pair fragment of the 5' end of the mitochondrial cytochrome c oxidase I (COI) gene, the so-called "barcode region", was proposed as the global standard for animals [7, 13]. This barcode approach has been applied successfully to species delimitation and identification in various vertebrate and invertebrate taxa [14–19], and subsets of the standard COI barcode have been shown to be effective for species-level identification of specimens with degraded DNA [20, 21]. Nevertheless, relying exclusively on mitochondrial gene fragments is not without risk. DNA barcoding assumes low mtDNA variation within species combined with clear genetic differentiation between species, the so-called barcoding gap. Several studies, however, found substantial overlap between intra- and interspecific genetic distances in some taxa [22, 23]. DNA barcoding can also overestimate the number of species when nuclear mitochondrial pseudogenes (numts) are co-amplified [24–27]. Introgression and/or incomplete lineage sorting can produce trans-specific mtDNA polymorphisms that distort the apparent mitochondrial variability of the organisms studied. Such events have been demonstrated in various arthropod taxa, for example insects [29–33] and spiders [34, 35]. Heteroplasmy can also confound identification, although it is rare.
Finally, maternally inherited endosymbionts such as the α-proteobacteria Wolbachia and Rickettsia may cause linkage disequilibrium with mtDNA, resulting in the homogenization of mtDNA haplotypes [38–40].
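To make the barcoding-gap criterion concrete, the following minimal sketch checks, for each species, whether its largest intraspecific distance is smaller than the smallest distance from any of its specimens to a specimen of another species. It assumes pairwise distances (e.g. K2P or p-distances) have already been computed. The function name `barcoding_gaps`, the specimen IDs and the distance values are hypothetical, and this per-species comparison is one common formulation, not necessarily the procedure used in this study.

```python
from collections import defaultdict


def barcoding_gaps(
    distances: dict[tuple[str, str], float], species: dict[str, str]
) -> dict[str, tuple[float, float, bool]]:
    """Per species: (max intraspecific distance, min distance to another
    species, True if the former is smaller, i.e. a barcoding gap exists).

    distances: pairwise distance for each unordered specimen pair (a, b).
    species:   specimen ID -> species name (e.g. from morphology).
    Hypothetical helper for illustration only.
    """
    max_intra: dict[str, float] = defaultdict(float)  # singletons stay at 0.0
    min_inter: dict[str, float] = defaultdict(lambda: float("inf"))
    for (a, b), d in distances.items():
        sp_a, sp_b = species[a], species[b]
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra[sp_a], d)
        else:
            # a between-species pair updates the nearest-neighbour distance of both
            min_inter[sp_a] = min(min_inter[sp_a], d)
            min_inter[sp_b] = min(min_inter[sp_b], d)
    return {
        sp: (max_intra[sp], min_inter[sp], max_intra[sp] < min_inter[sp])
        for sp in sorted(set(species.values()))
    }


# Hypothetical example: two species with two specimens each
species = {"A1": "sp_A", "A2": "sp_A", "B1": "sp_B", "B2": "sp_B"}
distances = {
    ("A1", "A2"): 0.004, ("B1", "B2"): 0.006,
    ("A1", "B1"): 0.081, ("A1", "B2"): 0.079,
    ("A2", "B1"): 0.083, ("A2", "B2"): 0.080,
}
result = barcoding_gaps(distances, species)
print(result)  # → {'sp_A': (0.004, 0.079, True), 'sp_B': (0.006, 0.079, True)}
```

If a single intraspecific distance exceeds the nearest interspecific one, the flag for that species turns `False`, which is exactly the overlap situation described for some taxa.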
All these problems show that standardised, complementary nuclear markers are needed before a provisional species uncovered with COI barcodes can be accepted as a species. In this context, nuclear ribosomal genes are potential supplementary markers for species identification. Although generally considered highly conserved, they are actually composed of a mixture of conserved and variable regions, organized in clusters that contain hundreds of copies per haploid genome. In metazoan taxa, these tandem rDNA units are highly uniform within a species [41–44] but differ between closely related species (e.g. [45–49]). So far, only a few studies have used nuclear rDNA sequences for DNA taxonomy: complete small ribosomal subunit (18S rDNA) sequences were used to identify invertebrate taxa [1, 5], while the variable D1-D2 or D3-D5 regions of the large ribosomal subunit (28S rDNA) were found to be suitable markers for various fungi [50, 51], arthropods [2, 52, 53], freshwater meiobenthic communities, and a broad range of metazoan taxa. The main limitation of these approaches is the length of the analysed sequences (usually more than 1000 base pairs (bp)), which prevents simple amplification from degraded DNA (e.g. from museum collection specimens) and, most importantly, efficient use in large-scale biodiversity studies. It should also be noted that ribosomal genes carry potential problems of their own, for example intragenomic variation among rRNA gene copies. To our knowledge, very few cases of intragenomic variation have been observed in Metazoa so far [57–63]. Multiple variants of the 18S rRNA gene have been found in a dinoflagellate, a platyhelminth, and the lake sturgeon Acipenser fulvescens [66, 67].
While the core elements of eukaryotic ribosomal RNA genes are considered essential for ribosome function and evolve slowly and evenly [68, 69], the so-called divergent or expansion segments vary strongly in primary sequence and length, even between closely related species, as a consequence of DNA slippage-like processes [70–73]. In most cases, expansion segments are flanked by highly conserved regions [68, 69, 74]. Although their exact functions remain elusive, various studies of eukaryotic ribosomes provide clues about the functional roles of expansion segments in rRNAs [75–77], including intersubunit bridges and scaffolds that allow proteins to bind to ribosomes. In addition, some of their structural features seem to be important for rRNA stability [75, 79, 80].
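Because expansion segments are variable regions bracketed by conserved sites, one generic way to locate them in a multiple alignment is a per-column conservation profile. The sketch below is a hypothetical illustration, not the method used in this study: it computes the Shannon entropy of each alignment column (treating gaps as an ordinary character state) and reports runs of high-entropy columns. The function names, the entropy threshold and the minimum run length are arbitrary assumptions.

```python
import math
from collections import Counter


def column_entropy(alignment: list[str]) -> list[float]:
    """Shannon entropy (bits) of each column of an alignment of equal-length
    sequences; gap characters ('-') are counted as an ordinary state."""
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        entropies.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return entropies


def variable_regions(entropies: list[float], threshold: float = 0.5,
                     min_len: int = 3) -> list[tuple[int, int]]:
    """Half-open (start, end) column ranges where at least min_len consecutive
    columns have entropy above threshold (both parameters are arbitrary)."""
    regions, start = [], None
    for i, h in enumerate(entropies + [-math.inf]):  # sentinel closes a final run
        if h > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Toy alignment: conserved flanks (ACGT ... TGCA) around a 4-column variable core
aln = ["ACGTAAAATGCA", "ACGTCCCCTGCA", "ACGTGGGGTGCA", "ACGTTTTTTGCA"]
print(variable_regions(column_entropy(aln)))  # → [(4, 8)]
```

The conserved columns on either side of the reported range are the kind of sites one would inspect when placing primers that amplify a short variable segment.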
Following these considerations, we analysed and compared the usefulness of nuclear ribosomal expansion segments and COI barcodes for the molecular identification of Central European carabid beetles. The Carabidae are among the largest and most diverse insect families, with an estimated 40,000 or more described species inhabiting all terrestrial habitat types from subarctic to wet tropical regions [81, 82]. Their diversity and wide distribution, together with their predominance in a large variety of habitats, have generated considerable interest in many aspects of their biology, including systematics, phylogeny, biogeography, ecology and evolution [83–87]. Ground beetles show different degrees of habitat selectivity, ranging from generalists to specialists, so carabid assemblages are highly valuable bioindicators for characterizing disturbance in habitats such as forests, meadows and fens. Owing to continuous and intensive study, the taxonomic classification of ground beetles in Central Europe is well established, with more than 750 species known from the region. Nevertheless, identifying many species, and especially larval stages, can be very difficult because of high morphological variability within species and the existence of sibling species.
Our study examined the effectiveness and suitability of one mitochondrial marker (COI) and three nuclear markers (the expansion segments V4 and V7 of the 18S rDNA and the D3 expansion segment of the 28S rDNA) as molecular identification tools for 75 selected Central European ground beetle species belonging to 26 genera. We compared intra- and interspecific divergences using Kimura 2-parameter (K2P) distances and uncorrected p-distances for all COI sequences, and uncorrected p-distances for all rDNA fragments, including those of many closely related species, e.g. Agonum emarginatum/viduum, Clivina collaris/fossor, and Harpalus affinis/rubripes. Furthermore, we analysed the discriminatory power of these markers within two well-known pairs of sibling species, Bembidion lampros/properans [90–94] and Pterostichus nigrita/rhaeticus [95–98].
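For reference, both distance measures can be computed from a pair of aligned sequences. The uncorrected p-distance is the proportion of differing sites; the K2P distance corrects for multiple substitutions using the proportions of transitions (P) and transversions (Q), d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The sketch below assumes that sites with gaps or ambiguity codes in either sequence are skipped (pairwise deletion). The function names are hypothetical, and this is not the software pipeline used in the study.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def count_differences(seq1: str, seq2: str) -> tuple[int, int, int]:
    """Return (compared sites, transitions, transversions) for two aligned
    sequences. Sites with a gap or ambiguity code in either sequence are
    skipped (pairwise deletion, an assumption of this sketch)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected p-distance: proportion of differing sites."""
    sites, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / sites


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance from transition (P) and transversion (Q)
    proportions: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    sites, ts, tv = count_differences(seq1, seq2)
    p, q = ts / sites, tv / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("K2P distance undefined (sequences too divergent)")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


s1, s2 = "ACGTACGTAC", "GCGTACGTAT"   # two transitions in 10 sites
print(p_distance(s1, s2))              # → 0.2
print(round(k2p_distance(s1, s2), 4))  # → 0.2554
```

The K2P value exceeds the p-distance because it adds back substitutions assumed to be hidden by multiple hits at the same site; for very similar sequences the two measures converge.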