Biological diversity often poses a major challenge for ecologists who seek to understand ecological processes or conduct biomonitoring programs. Environmental samples commonly contain a high taxonomic diversity of small-sized organisms (e.g., meiofauna in marine benthic sediments), and many specimens lack diagnostic morphological characters (e.g., larval stages in plankton tows or partially digested organisms in gut or faecal contents). This makes it difficult to identify species within a reasonable timeframe and with sufficient accuracy. DNA-based community analyses offer an alternative to traditional methods, and they have become even more promising as ultrasequencing (high-throughput sequencing) platforms supplant cloning. Taxa can be detected in bulk samples by PCR amplification followed by deep sequencing of homologous gene regions, and the resulting sequences are then compared with libraries of reference barcodes for taxonomic identification. This so-called “metabarcoding” approach has proven a powerful means of understanding the diversity and distribution of meiofauna. It has also been found effective for assessing the diversity of insects collected in traps and for characterizing the diet of predators[8–11] and herbivores[12, 13] through analysis of their faeces or gut contents. Nevertheless, metabarcoding is still a relatively new approach, and both methodological and analytical improvements are needed to further expand its range of applications[7, 14].
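The identification step described above, in which each sequence is compared with a reference barcode library, can be sketched as a best-hit search by percent identity. This is a minimal illustration, not the pipeline used in this study. It assumes that reads and references have already been trimmed and aligned to the same region, so that identity can be computed column by column. The function names, the 97% default threshold and the toy sequences are all hypothetical; real analyses typically rely on alignment tools such as BLAST.

```python
from typing import Dict, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.
    Columns where either sequence has a gap ('-') are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def assign_taxon(
    read: str, references: Dict[str, str], min_identity: float = 97.0
) -> Optional[Tuple[str, float]]:
    """Return (best-matching reference, identity), or None if no reference
    reaches `min_identity` percent (a hypothetical, illustrative cutoff)."""
    best_name, best_pid = None, -1.0
    for name, ref in references.items():
        pid = percent_identity(read, ref)
        if pid > best_pid:
            best_name, best_pid = name, pid
    if best_name is None or best_pid < min_identity:
        return None
    return best_name, best_pid


# Hypothetical 20-bp "barcodes" and read, for illustration only.
references = {
    "taxon_A": "ACGTACGTACGTACGTACGT",
    "taxon_B": "ACGTTCGTACGAACGTACCT",
}
read = "ACGTACGTACGTACGTACGA"
print(assign_taxon(read, references, min_identity=90.0))  # ('taxon_A', 95.0)
print(assign_taxon(read, references))  # None: 95% is below the 97% default
```

The threshold matters: a read that is only moderately similar to every reference is left unassigned rather than forced onto the closest barcode. This is one reason why the completeness of the reference library directly affects identification.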
The success of a metabarcoding analysis is particularly contingent upon the primer set and target locus, because they determine the efficiency and accuracy of taxon detection and identification. In general, primers should target hypervariable DNA regions (for high-resolution taxonomic discrimination) for which extensive libraries of reference sequences are available (for taxonomic identification). Primers should also preferentially target short DNA fragments (e.g., < 400 bp), both to maximize richness estimates[15, 16] and to increase the probability of recovering degraded (sheared) DNA templates, such as those from samples preserved for extended periods or from prey items in the gut and faecal contents of predators[18, 19]. The taxonomic coverage required of the primer set then depends on the question addressed. For example, when the goal is to describe the diet of specialised predators (e.g., insects consumed by bats[20, 21]) or, more generally, the diversity and composition of a specific functional group (e.g., nematodes in sediments), “group-specific” primers will be effective. Alternatively, when the goal is a comprehensive analysis of samples containing species from numerous phyla (as most environmental samples do), primers should target a locus found universally across all animals or plants.
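Why shorter amplicons favour degraded DNA can be illustrated with a simple model. If each of the (L − 1) internal bonds of a template of length L breaks independently with probability p, the expected fraction of templates that remain intact is (1 − p)^(L − 1). This fraction decays exponentially with length. The independence assumption, the function name and the break probabilities below are hypothetical choices for illustration only. Real degradation is not uniform, and the model is not derived from the data in this study.

```python
def intact_fraction(length_bp: int, break_prob: float) -> float:
    """Expected fraction of templates of `length_bp` with no strand break,
    assuming each of the (length_bp - 1) internal bonds breaks independently
    with probability `break_prob` (a simplifying, hypothetical model)."""
    if length_bp < 1:
        raise ValueError("length must be positive")
    if not 0.0 <= break_prob < 1.0:
        raise ValueError("break_prob must be in [0, 1)")
    return (1.0 - break_prob) ** (length_bp - 1)


# Compare a short and a long amplicon under two hypothetical degradation levels.
for p in (0.001, 0.005):
    short = intact_fraction(300, p)
    long_ = intact_fraction(650, p)
    print(f"p={p}: 300 bp {short:.3f}, 650 bp {long_:.3f}, ratio {short / long_:.1f}x")
```

Under this model, the advantage of a short target grows as degradation increases, which is consistent with the recommendation to prefer short fragments for gut, faecal or long-stored samples.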
Despite the inherent difficulty of designing versatile primers (also referred to as broad-range or universal primers), several sets are readily available to amplify nuclear and mitochondrial gene fragments across animals. For example, primers exist that amplify short fragments of the nuclear 18S and 28S ribosomal markers[22, 23], but these regions evolve slowly and may underestimate diversity[24–27]. Versatile primers have also recently become available for a short fragment of the mitochondrial 12S gene, a region whose high rate of molecular evolution suits species delineation and identification, but taxonomic reference databases for this marker are currently very limited. The mitochondrial cytochrome c oxidase subunit I gene (COI) has been adopted as the standard ‘taxon barcode’ for most animal groups and is by far the best represented marker in public reference libraries. As of January 2013, the Barcode of Life Database included COI sequences from >1,800,000 specimens belonging to >160,000 species collected from all phyla across all ecosystems. However, versatile primers are available only for the full 658 bp barcoding region[30, 31], and they are known to be poorly conserved across nematodes[6, 26], gastropods and echinoderms, among others. A single attempt has been made to design a versatile primer that amplifies a shorter “mini-barcode” COI region. That primer has received limited use because numerous mismatches in its priming site reduce its efficiency across a broad range of taxa.
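Primer conservation can be assessed in silico by counting mismatches between a degenerate primer and the corresponding priming site in each taxon. Mismatches near the primer's 3′ end are usually considered the most detrimental to amplification. The sketch below does this using standard IUPAC ambiguity codes. It treats inosine as pairing with any base, which is a simplification. The primer, the sites, the function names and the 5-base 3′ window are hypothetical; the primer is not any of the primers discussed here.

```python
from typing import Dict, List, Tuple

# IUPAC nucleotide codes mapped to the bases each one matches.
# Inosine ("I") is treated here as pairing with any base, a simplification.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "I": "ACGT",
}


def mismatch_positions(primer: str, site: str) -> List[int]:
    """0-based positions (from the primer's 5' end) at which `site` is not
    matched by the degenerate `primer`. Both are written 5'->3' on the same
    strand, i.e. `site` is the target sequence the primer is meant to equal."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return [
        i for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
        if s not in IUPAC[p]
    ]


def summarize(
    primer: str, sites: Dict[str, str], window_3p: int = 5
) -> Dict[str, Tuple[int, int]]:
    """For each taxon, return (total mismatches, mismatches within the last
    `window_3p` bases at the primer's 3' end)."""
    cutoff = len(primer) - window_3p
    summary = {}
    for taxon, site in sites.items():
        pos = mismatch_positions(primer, site)
        summary[taxon] = (len(pos), sum(1 for i in pos if i >= cutoff))
    return summary


# Hypothetical 11-nt degenerate primer and priming sites, for illustration only.
primer = "GGWACNGGRTG"
sites = {"taxon_1": "GGAACAGGATG", "taxon_2": "GGCACTGGCTG"}
print(summarize(primer, sites))  # {'taxon_1': (0, 0), 'taxon_2': (2, 1)}
```

Applied across a large barcode library, a tally like this shows which taxa a candidate primer is likely to miss.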
In the first part of this paper, we use an extensive library of COI barcodes provided by the Moorea BIOCODE project, an “All Taxa Biotic Inventory” (http://www.mooreabiocode.org), to locate a conserved priming site internal to the highly variable 658 bp COI region. The newly designed internal primer is combined with a modified version (“jgHCO2198”) of the classic reverse barcoding primer HCO2198 proposed by Folmer et al. (1994) to target a 313 bp COI region. We test the effectiveness of this primer set across 287 disparate taxa from 30 phyla and compare its performance against versatile primer sets commonly employed for DNA barcoding.
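One generic way to screen an alignment of barcodes for a conserved priming site is to score every column by its Shannon entropy and then slide a primer-length window along the alignment to find the window with the lowest mean entropy. The sketch below illustrates that idea. It assumes the sequences are already aligned to equal length, counts only A, C, G and T (so gaps are ignored), and arbitrarily treats all-gap columns as maximally variable. The entropy criterion, function names and toy alignment are hypothetical and are not presented as the procedure used in this study.

```python
import math
from collections import Counter
from typing import List, Tuple


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the A/C/G/T bases in one alignment column."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 2.0  # all-gap column: treated as maximally variable (arbitrary choice)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def most_conserved_window(alignment: List[str], width: int) -> Tuple[int, float]:
    """Return (0-based start, mean entropy) of the lowest-entropy window."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    if not 0 < width <= length:
        raise ValueError("width must be between 1 and the alignment length")
    entropies = [
        column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)
    ]
    best_start, best_score = 0, float("inf")
    for start in range(length - width + 1):
        score = sum(entropies[start:start + width]) / width
        if score < best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical toy alignment: columns 1-5 ("CGTTT") are identical in all taxa.
alignment = ["ACGTTTGA", "TCGTTTCG", "GCGTTTAC"]
print(most_conserved_window(alignment, width=5))  # (1, 0.0)
```

In practice, candidate windows would then be refined by introducing degenerate bases at the remaining variable positions and checked with a mismatch tally across taxa.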
In the second part of this paper, we demonstrate how the new COI primer set, coupled with an effective bioinformatics pipeline, allows high-throughput DNA-based characterization of prey diversity in the gut contents of coral reef fish species with three distinct feeding modes. Analysis of predators’ gut or faecal contents is a promising application of DNA metabarcoding. Efficient prey detection combined with high-resolution prey identification could improve our understanding of food webs, animal feeding behaviour and prey distribution[35, 36]. Until now, because of its large amplicon size, COI has often been considered unsuitable for this purpose ([8, 19, 37], reviewed in). We propose that this new primer set will be a powerful asset for understanding various ecological processes and conducting biomonitoring programs.
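The bioinformatics pipeline is not detailed in this section. A common step in metabarcoding pipelines, shown here as a hypothetical illustration, is to group reads into operational taxonomic units (OTUs) before taxonomic assignment. The sketch uses greedy, abundance-ordered centroid clustering: unique sequences are processed from most to least abundant, and each one joins the first existing centroid it matches at or above an identity threshold or otherwise founds a new OTU. It assumes reads are trimmed to equal length, with no indels. The 97% default is a widely used but arbitrary value, and the function names and reads are invented.

```python
from collections import Counter
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("reads must be trimmed to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: List[str], threshold: float = 0.97) -> Dict[str, int]:
    """Return {centroid sequence: total read count} after greedy clustering
    of unique sequences in decreasing order of abundance."""
    abundance = Counter(reads)
    otu_counts: Dict[str, int] = {}
    for seq, n in abundance.most_common():
        for centroid in otu_counts:
            if identity(seq, centroid) >= threshold:
                otu_counts[centroid] += n  # join the first matching OTU
                break
        else:
            otu_counts[seq] = n  # no match: this sequence founds a new OTU
    return otu_counts


# Hypothetical 10-bp reads: a dominant sequence, a 1-mismatch variant, and
# an unrelated sequence.
reads = ["AAAAAAAAAA"] * 5 + ["AAAAAAAAAT"] * 2 + ["CCCCCCCCCC"] * 3
print(greedy_otus(reads, threshold=0.85))  # variant merged into the dominant OTU
print(greedy_otus(reads))  # at 0.97 the 90%-identical variant stays separate
```

Processing abundant sequences first means that centroids are more likely to be genuine prey sequences than rare sequencing errors, which is the usual rationale for this ordering.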