The complete sequences and gene organisation of the mitochondrial genomes of the heterodont bivalves Acanthocardia tuberculata and Hiatella arctica – and the first record for a putative Atpase subunit 8 gene in marine bivalves

Background Mitochondrial (mt) gene arrangement is highly variable among molluscs and especially among bivalves. Of the 30 complete molluscan mt-genomes published to date, only one is of a heterodont bivalve, although this is the most diverse taxon in terms of species numbers. We determined the complete sequences of the mitochondrial genomes of Acanthocardia tuberculata and Hiatella arctica (Mollusca, Bivalvia, Heterodonta) and describe their gene contents and genome organisations to assess the variability of these features among the Bivalvia and their value for phylogenetic inference. Results The size of the mt-genome in Acanthocardia tuberculata is 16,104 base pairs (bp), and in Hiatella arctica 18,244 bp. The Acanthocardia mt-genome contains 12 of the typical protein coding genes, lacking the ATPase subunit 8 (atp8) gene, as in all published marine bivalves. In contrast, a complete atp8 gene is present in Hiatella arctica. In addition, we found a putative truncated atp8 gene when re-annotating the mt-genome of Venerupis philippinarum. Both mt-genomes reported here encode all genes on the same strand and have an additional trnM. In Acanthocardia several large non-coding regions are present. One of these contains 3.5 nearly identical copies of a 167 bp motif. In Hiatella, the 3' end of the NADH dehydrogenase subunit 6 (nad6) gene is duplicated together with the adjacent non-coding region. The gene arrangement of Hiatella is markedly different from all other known molluscan mt-genomes; that of Acanthocardia shows few identities with that of Venerupis philippinarum. Phylogenetic analyses on amino acid and nucleotide levels robustly support the Heterodonta and the sister group relationship of Acanthocardia and Venerupis. Monophyletic Bivalvia are resolved only by a Bayesian inference of the nucleotide data set. In all other analyses the two unionid species, being the only ones with genes located on both strands, do not group with the remaining bivalves.
Conclusion The two mt-genomes reported here add to and underline the high variability of gene order and the presence of duplications in bivalve and molluscan taxa. Some genomic traits, like the loss of the atp8 gene or the encoding of all genes on the same strand, are homoplastic among the Bivalvia. These characters, gene order, and the nucleotide sequence data show considerable potential for resolving phylogenetic patterns at lower taxonomic levels.


Background
Metazoan mitochondrial genomes are typically conserved in gene content and length. They are usually circular, 14 to 20 kb long, and encode 13 proteins of the respiratory chain [cytochrome c oxidase subunits I-III (cox I-cox III), apocytochrome b (cytb), ATPase subunits 6 and 8 (atp6, atp8), and NADH dehydrogenase subunits 1-6 and 4L (nad1-6, nad4L)] and 24 RNA genes of the translation system [small (S) and large (L) subunit ribosomal RNA (rrn) and 22 transfer RNAs] [1]. The high number of possible arrangements makes it very unlikely that identical gene orders arise by chance [2]. Such a complex character combined with a low frequency of gene rearrangements is highly valuable for reconstructing Palaeozoic or even pre-Cambrian phylogenetic events. Examples of this situation are Vertebrata (over 540 species sequenced) and Arthropoda (over 100 species sequenced): both show few rearrangements within the phylum [3].
In contrast, only 30 complete mitochondrial genomes of Mollusca are published: ten Gastropoda, nine Bivalvia, one Polyplacophora, two Scaphopoda and eight Cephalopoda. However, even this small taxonomic sample reveals much greater variability of gene arrangements compared to vertebrates and arthropods and notable differences in rearrangement frequencies between phyla and also within the Mollusca [3]. Whereas the order of the protein coding and the rRNA (rrn) genes in the mt genomes of the polyplacophoran Katharina tunicata, the vetigastropod Haliotis rubra and the cephalopods Octopus vulgaris and Octopus ocellatus is identical, and the gene orders of the apogastropod Ilyanassa obsoleta and the other cephalopods can be related to it, the euthyneuran gastropods, the scaphopods and the bivalves are highly rearranged.
An additional complication in the Bivalvia, termed doubly uniparental inheritance (DUI), is the existence of distinct male and female mitochondrial lineages [4][5][6][7][8][9][10]. It is not clear whether this mode of inheritance is characteristic of all bivalves, or if it contributes to the accelerated rearrangement rate in this group. There are yet more special features of molluscan mt genes. Hoffmann et al. [11] described an additional trn-Met in Mytilus edulis; Katharina tunicata has two additional tRNAs [12]. Some pulmonate gastropods have unusual tRNAs lacking the T-stem or the D-stem, similar to nematode mt tRNAs [13]. The atp6 and atp8 genes are separated in scaphopods [14,15] and most gastropods (only the prosobranch Littorina saxatilis, the vetigastropod Haliotis rubra and the apogastropod Ilyanassa obsoleta have adjacent atp6 and atp8). The published heterodont and pteriomorph bivalve sequences lack the atp8 gene altogether. This is unusual because the atp6-atp8 cluster is common to most animal mitochondrial genomes, often with overlapping reading frames [3]. It is, thus, not clear for which molluscan taxa and on which systematic levels mitochondrial gene order data and genomic characters like those mentioned above are phylogenetically informative.
The phylogenetic relationships of the major taxa of the heterodont bivalves are only partly resolved. Molecular phylogenetic analyses [16,17] agreed on the exclusion of the Hiatellidae from the Myoida, placing this taxon close to the base of the higher Heterodonta ("unnamed clade I" in [16], fig. 3.6). The latter clade also contains the Cardiidae and Veneridae. With the complete mitochondrial sequences of one species each of Hiatellidae, Cardiidae and Veneridae available for the present study, we are able to test the monophyly of the higher heterodonts and the sister group relationship of Acanthocardia and Venerupis.

Genome size, genes, base composition and codon usage
The complete mt-genome of Acanthocardia tuberculata is 16,104 base pairs (bp) long and has an overall A+T content of 59.6 %. All genes are on one strand (Fig. 1). The Acanthocardia genome features 1,751 non-coding bp. The largest non-coding region (Table 1), of 1,103 bp, is located between trn-Met and trn-His. It contains a 599 bp fragment composed of 3.5 nearly identical copies of a 167 bp motif (Fig. 2). This repeat has an A+T content of 60 %. The other 23 non-coding regions range between 1 and 128 bp.
All but one (atp8) of the 37 typical mitochondrial genes are present, with an additional copy of the trn-Met (Fig. 3). The Acanthocardia mt-genome encodes a total of 3,647 amino acids. The most frequent codon is TTT (Phe; n = 264), followed by TTA (Leu; n = 172). An A or T nucleotide is present at the third position in 2,269 codons (61.13 %). Five of the 12 protein coding genes start with ATG or ATA (Table 2), six start with the alternative start codon ATT (isoleucine). The atp6 gene starts with a GTG codon. Eight genes are terminated by TAA and four by TAG. An incomplete stop codon is inferred from the alignment of the atp6 gene. The genes for trn-Gln and trn-Arg overlap by 2 bp.
The mt-genome of Hiatella arctica is 18,204 bp in length and has an A+T content of 66.35 %. As in Acanthocardia, all genes are on the same strand (Fig. 1). The longest of the 30 non-coding regions (Table 3) has 614 bp and is located between the genes for trn-Ala and atp8. The others range between 1 and 376 bp in length. Hiatella has two copies of a 121 bp motif (Fig. 4) starting in the 3' end of the nad6 gene and extending into the non-coding region before the trn-Trp gene. The genes for trn-Leu I and trn-Leu II overlap by one nucleotide.
The Hiatella mt-genome contains all 37 mitochondrial genes including atp8 and a second copy of the trn-Met (Fig. 4). A total of 3,985 amino acids are encoded. As in Acanthocardia, the most frequent codons are TTT (Phe; n = 359) and TTA (Leu; n = 284). A or T are present in 2,873 third codon positions (72.09 %). Seven of the 13 protein coding genes start with ATA, the other six genes with ATG. The codon TAA terminates seven, and the codon TAG four protein coding genes (Table 4). Truncated stop codons (T) are inferred for the atp8 and the cox II genes.
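The codon statistics reported above (most frequent codon, share of codons ending in A or T, overall A+T content) can be reproduced from annotated coding sequences with a few lines of code. The following is a minimal sketch, not the CodonW procedure used in this study; the function names and the toy sequence are hypothetical and serve only as illustration.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in an in-frame coding sequence (DNA, 5'->3')."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def at3_fraction(usage):
    """Fraction of codons that carry A or T at the third position."""
    total = sum(usage.values())
    at3 = sum(n for codon, n in usage.items() if codon[2] in "AT")
    return at3 / total if total else 0.0


def at_content(seq):
    """Fraction of A and T nucleotides in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0


# Hypothetical toy sequence, not taken from either genome.
toy_cds = "ATGTTTTTATTTGGCTAA"
usage = codon_usage(toy_cds)
print(usage.most_common(1))          # most frequent codon and its count
print(round(at3_fraction(usage), 4))  # share of codons ending in A or T
print(round(at_content(toy_cds), 4))  # overall A+T content
```

For a whole genome, the counts of all protein coding genes would be summed before the fractions are computed. Whether stop codons are included in the total is a choice that changes the percentages slightly.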

Phylogenetic analysis of nucleotide and protein coding sequences
The concatenated amino acid alignment of 28 species (Table 5) consists of 5,004 positions of which 3,085 are parsimony-informative. The corresponding nucleotide alignment including the rrnL sequences has 16,862 positions in total, 11,854 without 3rd codon positions, of which 7,130 are parsimony-informative. The Bayesian analyses resulted in almost fully resolved trees (Fig. 5) with total marginal -lnL of 156,458.59 for the amino acid data and 209,800.11 for the nucleotide data (arithmetic means). Most branches have posterior probabilities of 1.0. The deeper nodes tend to be less supported. The parsimony analyses of both data sets yielded a single most parsimonious tree each (amino acid data: tree length 31,163, consistency index 0.5819, rescaled consistency index 0.2719; nucleotide data: tree length 49,859, consistency index 0.3440, rescaled consistency index 0.1366; trees not shown). All Bayesian and parsimony analyses recover the three heterodont species as a robust monophylum. Acanthocardia is the sister taxon of Venerupis with high to moderate support (Fig. 5). The Pteriomorpha are resolved as monophyletic from the nucleotide data in the parsimony analysis only. In most analyses the two unionid species are separated from the remaining bivalves and placed in a more basal position in the tree. Only the Bayesian tree of the nucleotide data resolves monophyletic Bivalvia, although with low support (posterior probability 0.76). This is also the only tree showing monophyletic Scaphopoda and Pulmonata. The unstable position of the vetigastropod Haliotis near the base of the molluscan clade renders the Gastropoda diphyletic in all analyses. Cephalopoda are always robustly supported, and only the parsimony analysis of the amino acid data fails to resolve Mollusca as monophyletic.
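For readers unfamiliar with the term, an alignment position is conventionally called parsimony-informative when at least two different character states each occur in at least two taxa. The sketch below counts such positions under that definition; it is an illustration only, not the PAUP* implementation, and the toy alignment and the choice to ignore gap and missing-data symbols are assumptions.

```python
from collections import Counter


def is_parsimony_informative(column, ignore="-?"):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(c.upper() for c in column)
    for symbol in ignore:  # gaps and missing data carry no state
        counts.pop(symbol, None)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(alignment):
    """Count parsimony-informative columns in a list of aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must have equal length")
    return sum(is_parsimony_informative(col) for col in zip(*alignment))


# Hypothetical four-taxon alignment, one string per taxon.
toy_alignment = [
    "ACGTA",
    "ACGTC",
    "ATGAC",
    "ATGAA",
]
print(count_informative(toy_alignment))
```

In the toy alignment, columns 1 and 3 are constant and therefore uninformative, while columns 2, 4 and 5 each split the taxa into two groups of two.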
The exclusion of highly variable alignment positions using GBLOCKS had no effect on the topologies of the trees and brought only minimal changes in branch support.
Figure: Gene order of Katharina, Acanthocardia, Hiatella and Venerupis.
Figure: Alignment of the large duplicated regions in Acanthocardia tuberculata and Hiatella arctica.
The functional and selective significance of the duplicated regions in Acanthocardia is unclear. Tandem repeats are also present in other bivalve mitochondrial genomes: Venerupis, e.g., has four tandem repeats of 203 bp between the nad2 and the nad4L genes [9]; Placopecten has seven repeats of a 79 bp motif between trn-Asn and trn-Glu and two repeats of 1,435 bp between nad6 and trn-Met [18]. More unusual is the duplication in Hiatella starting 12 bases upstream of the 3' end of the nad6 gene.
Although the copies are 79 % identical the second repeat has no open reading frame. It is likely that this non-functional copy of the coding part accumulated substitutions more rapidly due to relaxed selection and, thus, lost the reading frame. Nearly identical duplications of complete genes occur in mt-genomes of the cephalopods Watasenia and Todarodes [19,20].
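A similarity figure such as the 79 % quoted for the two Hiatella copies is usually obtained by counting matching columns in a pairwise alignment of the copies. The following sketch shows one way to do this, assuming the two copies are already aligned to equal length; counting a gap opposite a base as a mismatch is an assumption, and other conventions give slightly different values. The toy sequences are hypothetical.

```python
def percent_identity(copy_a, copy_b):
    """Percent identical columns between two pre-aligned sequences."""
    if len(copy_a) != len(copy_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(copy_a.upper(), copy_b.upper()):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both copies is skipped
        compared += 1
        if x == y:
            matches += 1  # a gap opposite a base never matches
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned repeat copies differing at two of ten positions.
print(percent_identity("ACGTACGTAC", "ACGTTCGAAC"))
```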
Acanthocardia and Hiatella mt genomes encode 23 transfer RNA genes which can be folded into a typical secondary structure. Both genomes have an additional tRNA for methionine. A second methionine tRNA is present in the bivalves Mytilus edulis [11], Mytilus galloprovincialis [7], Crassostrea virginica [21], Placopecten magellanicus [18] and Venerupis philippinarum [9]. Overlaps of tRNA genes as observed in Acanthocardia and Hiatella are a common feature in mt-genomes [1]. The atp6 gene of Acanthocardia lacks a methionine or isoleucine at the putative 5' end and a complete stop codon. The first ATN codon is 48 bp downstream of the putative point of start as inferred from the alignment of the molluscan atp6 genes. The assumed start codon is GTG as in the nad2 gene of the polyplacophoran Katharina tunicata [12]. Truncated stop codons like in the Acanthocardia atp6 and the Hiatella cox II and atp8 genes require the inference of the ends of the genes from the alignment with other species. The completion of truncated stop codons by polyadenylation after transcript processing was described by Ojala [22].
Figure 3 Cloverleaf structures of the 23 tRNA genes in the mitochondrial genome of Acanthocardia tuberculata.
Figure 4 Cloverleaf structures of the 23 tRNA genes in the mitochondrial genome of Hiatella arctica.
Hiatella arctica is the first marine bivalve reported to have a complete atp8 gene consisting of 53 amino acids. The alignment of this atp8 gene (Fig. 6) shows a methionine at the start and a truncated stop codon T. We also identified a putative atp8 gene in the mt-genome of Venerupis, between the genes rrnL and nad4 at positions 5,974 to 6,088. Although this region was annotated as part of the rrnL by the authors [9], it represents an open reading frame encoding only 37 amino acid positions. It starts with leucine instead of methionine, but ends with a complete stop codon. The more conserved 5' region of the gene resembles other molluscan atp8 genes in amino acid sequence (Fig. 6) and in the hydrophilicity profile. The positively charged 3' region of the gene, which is known to vary greatly in length and composition [23,24], is reduced to a few residues in Venerupis. This is confirmed by the alignment of the amino acid sequence corresponding to the conserved atp8 profiles in other metazoans [25]. It remains open, however, whether this gene is functional. Dreyer and Steiner [15] reported a comparably short atp8 gene for the scaphopod Siphonodentalium lobatum. Serb and Lydeard [26] discuss a non-functional version of the atp8 gene in the freshwater mussel Inversidens, and Milbury and Gaffney [21] describe a potential remnant of the atp8 gene in the eastern oyster Crassostrea virginica.
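The profile comparison mentioned above rests on a sliding-window average of per-residue hydropathy values. The sketch below uses the Kyte and Doolittle scale; it is not the BIOEDIT implementation, and the window size and the toy peptide are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of every full window along the protein."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical short peptide, window of 3 residues for readability.
profile = hydropathy_profile("MPQLSP", window=3)
print([round(value, 3) for value in profile])
```

Two profiles computed with the same window can then be compared position by position along an alignment, which is how a truncated candidate gene can be matched against the conserved 5' region of known atp8 genes.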
Many metazoan mt genomes have neighbouring atp6 and atp8 genes on the same strand. This arrangement is likely to be selected for, if the uncleaved transcripts are co-translated [25,2]. Several taxa lack this gene arrangement in the mt genome, e.g. Platyhelminthes, Nematoda, Annelida, Sipunculida, the brachiopods Laqueus [27] and Terebratalia [28], and, among Mollusca, Bivalvia and Scaphopoda. Of these genomes, Platyhelminthes, Nematoda except for Trichinella, and the pteriomorph bivalves lack atp8 altogether. The disparate distribution of this feature clearly indicates that the loss of the atp6-atp8 coupling and the loss of atp8 occurred several times independently in metazoan evolution. This is corroborated by finding truncated atp8 genes separated from the atp6 gene in the nematode Trichinella and in the scaphopod Siphonodentalium. It is possible that this situation represents an evolutionary stepping stone from the fully functional atp6-atp8 coupling, via decoupled but complete genes like in annelids and the scaphopod Graptacme, to the complete loss of atp8.
The location of all mt-genes on the same strand, as in Acanthocardia and Hiatella, is uncommon among Metazoa, but is reported for several taxa [28] including all published marine bivalves. Only in the unionid freshwater bivalves Lampsilis ornata and Inversidens japanensis are genes located on both strands. Under the Heteroconchia concept postulating a sister group relationship of Unionida and Heterodonta, the "all-on-one-strand" situation either evolved independently in Heterodonta and Pteriomorpha or was lost in the Unionida.
Comparing the gene arrangements of Acanthocardia and Venerupis, no identities are apparent if the tRNA genes are included. The tRNAs are more variable because the secondary structure allows them to translocate more frequently [12]. Even after excluding the tRNAs from the comparison, the two mt-genomes show few identical gene junctions. These are limited to the block containing the cytb-rrnL-nad4-atp6 genes in Acanthocardia, although this is interrupted by the putative atp8 gene in Venerupis. This gene order may be inherited from the common ancestor of Acanthocardia and Venerupis, with the apomor-
The mitochondrial genome sequence data confirm previous results [16,17] on the monophyly of Cardiidae and Veneridae relative to the Hiatellidae. Their common branch and the heterodont clade are robustly supported in all analyses. Similarly, the clade uniting Heterodonta and Pteriomorpha is well supported, although to the exclusion of the unionid branch. This is in accordance with the topology of Giribet and Distel [16] but contrasts with those of Waller [29] and Steiner and Hammer [30], which support the Heteroconchia clade (Unionida + Heterodonta). The Bivalvia clade is resolved by the Bayesian analysis of the nucleotide data only. This may indicate the higher potential of recovering correct topologies by this method, the superiority of nucleotide substitution models over amino acid substitution models, or a combination of these factors. Note that the Bayesian nucleotide analysis also succeeds in resolving the pteriomorph, scaphopod, and pulmonate branches.
What could cause the unexpected position of the Unionida rendering the Heteroconchia diphyletic? In both amino acid and nucleotide-based trees the unionid species have conspicuously shorter branches compared to the other bivalves. Although the present data set is not large enough for statistical assessment, such obvious differences in substitution rates may cause phylogenetic analyses to find incorrect trees, as previously documented for Bivalvia [30]. In addition to lower substitution rates, different substitution patterns in the unionids may confound phylogenetic analyses. All bivalve mt-genomes have the genes encoded on the same strand, except for those of the unionids where three to four genes are encoded on the opposite strand. Due to the asymmetric replication process the strands show different substitution skews. Hassanin et al. [31] showed that skew differences may influence phylogenetic analyses.
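Strand-specific compositional bias of the kind discussed here is often summarised by AT and GC skew values, computed as (A - T)/(A + T) and (G - C)/(G + C) for one strand. The sketch below applies these widely used formulas; the source does not state which skew measure was considered, so the choice of formula is an assumption, and the toy sequence is hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def strand_skews(seq):
    """Return (AT skew, GC skew) for the strand given as seq."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


# Hypothetical toy sequence: a gene read from the opposite strand
# shows skews of equal magnitude but opposite sign.
toy = "AAATGGGC"
print(strand_skews(toy))
print(strand_skews(reverse_complement(toy)))
```

The sign reversal illustrates why genes encoded on the opposite strand, as in the unionids, can carry a compositional signal that differs from that of the other bivalves.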
The mitochondrial gene order in the Bivalvia is too divergent and the present taxon set too small to make use of this character set for phylogenetic inference at this point. In addition, the substitution models for phylogenetic inference presently do not take strand-specific patterns into account. Similarly, gene rearrangement models are limited to one type of rearrangement only, either translocation or inversion. However, with a growing set of mt-genomes, including their nucleotide and gene sequences, we are likely to enhance our understanding of patterns and modes of nucleotide substitutions and gene rearrangements. This will help to improve phylogenetic reconstructions by refining the models for these evolutionary processes. Improved taxon sampling and refined phylogenetic inference models are likely to resolve more open questions of bivalve phylogeny and evolution than was possible with previously used markers.
cycles of 15 sec denaturation at 92°C, 35 sec annealing at 63°C and 10 min primer extension at 72°C followed by a final primer extension step at 72°C for 10 min. The products were sequenced by primer walking.

Data analysis
Protein coding genes were analysed with the Open Reading Frame Finder [32] using the invertebrate mitochondrial code. Protein and rRNA genes were identified by their similarity to published gene sequences in BLAST searches [33]. The tRNA genes are usually too poorly conserved to yield BLAST hits. Some of them were identified by the tRNAscan-SE Search Server [34] and DOGMA [35]; others could only be recognised by manually folding intergenic sequences into cloverleaf structures with anticodons. Codon usage analysis was performed with CodonW version 1.3 [36]. The whole sequence was tested for potential tandem repeats with TANDEM REPEATS FINDER version 4.0 [37]. The hydrophobicity profiles for the atp8 genes were generated using the general method of Kyte and Doolittle [38] with BIOEDIT version 7.0.5 [39].
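The invertebrate mitochondrial code (NCBI translation table 5) differs from the standard code in that ATA encodes methionine, TGA encodes tryptophan, and AGA and AGG encode serine. The sketch below translates a reading frame under this code; it is a minimal illustration, not the Open Reading Frame Finder, and the function name and toy sequence are hypothetical. Each codon is translated literally, so alternative start codons such as ATT or GTG appear as isoleucine or valine.

```python
from itertools import product

# NCBI translation table 5, codons ordered with bases T, C, A, G.
BASES = "TCAG"
TABLE_5 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
INVERTEBRATE_MT_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), TABLE_5)
}


def translate_mt(cds):
    """Translate in frame up to the first stop codon (table 5)."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = INVERTEBRATE_MT_CODE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break  # complete stop codon (TAA or TAG) reached
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy frame: ATA, TGA and AGA are read as Met, Trp and Ser,
# whereas the standard code would read them as Ile, stop and Arg.
print(translate_mt("ATATGAAGATAA"))
```

A frame that ends in a truncated stop codon (T or TA) simply runs to the end of the sequence in this sketch, which is why the gene ends in such cases have to be inferred from alignments.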

Phylogenetic analysis
Deduced amino acid sequences were aligned with CLUSTAL X 1.83 [40] at default settings followed by manual correction. The nucleotide alignment was based on the amino acid alignment. All protein coding and the rrnL gene sequences of 24 molluscs (10 bivalves, 4 cephalopods, 2 scaphopods, 7 gastropods and 1 polyplacophoran; Table 6) were concatenated in a single Nexus file. Three annelids and one brachiopod served as outgroups. Separate analyses were run with all positions and with hypervariable positions excluded with GBLOCKS 0.91 [41]. We used PAUP* 4.0b10 [42] for equally weighted parsimony analyses with the heuristic search option and 50 random addition sequences with TBR branch swapping. Bootstrap support was assessed by 10,000 (amino acid data) or 20,000 replicates (nucleotide data) with three random addition sequences each. Bayesian inference was performed with MRBAYES 3.1 [43,44] on the Schrödinger II cluster of the Univ. Vienna computing facility under the mtREV+Γ+I substitution model for the amino acid data set. The AIC criterion implemented in MODELTEST 3.06 [45] returned the GTR+Γ+I model as most appropriate for the nucleotide data set. We used separate and unlinked partitions for each gene and 2 × 4 chains of 5 × 10^5 generations, sampling every 100th tree. Burn-in estimation by lnL and convergence diagnostics were used as implemented in the software. We excluded 3rd codon positions from the nucleotide analyses to reduce phylogenetic noise due to substitution saturation. Trees were visualized with TREEVIEW 1.6.6 [46].
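Excluding third codon positions, as described above, amounts to keeping the first two nucleotides of every codon in the codon-aligned protein coding partitions while leaving the rrnL partition unchanged. The following sketch shows the operation for a single aligned coding sequence; the function name and toy sequence are hypothetical, and in practice the same exclusion can be declared as a character set in the Nexus file rather than by editing the sequences.

```python
def drop_third_positions(aligned_cds):
    """Keep codon positions 1 and 2 of a codon-aligned coding sequence."""
    if len(aligned_cds) % 3 != 0:
        raise ValueError("codon-aligned sequence length must be a multiple of 3")
    return "".join(
        aligned_cds[i:i + 2] for i in range(0, len(aligned_cds), 3)
    )


# Hypothetical codon-aligned fragment: ATG TTT GGC -> AT TT GG
print(drop_third_positions("ATGTTTGGC"))
```

Because the nucleotide alignment follows the amino acid alignment, gaps occur in multiples of three and the codon frame is preserved across all taxa, so the same column indices can be removed from every sequence.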