Full-length transcriptome sequencing and comparative transcriptomic analysis to uncover genes involved in early gametogenesis in the gonads of Amur sturgeon (Acipenser schrenckii)
Frontiers in Zoology volume 17, Article number: 11 (2020)
Sturgeons (Acipenseriformes) are polyploid chondrostean fish that constitute an important model species for studying development and evolution in vertebrates. To better understand the mechanisms of reproduction regulation in sturgeon, this study combined PacBio isoform sequencing (Iso-Seq) with Illumina short-read RNA-seq methods to discover full-length genes involved in early gametogenesis of the Amur sturgeon, Acipenser schrenckii.
A total of 50.04 G subread bases were generated from two SMRT cells, and herein 164,618 nonredundant full-length transcripts (unigenes) were produced with an average length of 2782 bp from gonad tissues (three testes and four ovaries) from seven 3-year-old A. schrenckii individuals. The number of ovary-specific expressed unigenes was greater than those of testis (19,716 vs. 3028), and completely different KEGG pathways were significantly enriched between the ovary-biased and testis-biased DEUs. Importantly, 60 early gametogenesis-related genes (involving 755 unigenes) were successfully identified, and exactly 50% (30/60) genes of those showed significantly differential expression in testes and ovaries. Among these, the Amh and Gsdf with testis-biased expression, and the Foxl2 and Cyp19a with ovary-biased expression strongly suggested the important regulatory roles in spermatogenesis and oogenesis of A. schrenckii, respectively. We also found the four novel Sox9 transcript variants, which increase the numbers of regulatory genes and imply function complexity in early gametogenesis. Finally, a total of 236,672 AS events (involving 36,522 unigenes) were detected, and 10,556 putative long noncoding RNAs (lncRNAs) and 4339 predicted transcript factors (TFs) were also respectively identified, which were all significantly associated with the early gametogenesis of A. schrenckii.
Overall, our results provide new genetic resources of full-length transcription data and information as a genomic-level reference for sturgeon. Crucially, we explored the comprehensive genetic characteristics that differ between the testes and ovaries of A. schrenckii in the early gametogenesis stage, which could provide candidate genes and theoretical basis for further the mechanisms of reproduction regulation of sturgeon.
Sturgeons (Acipenseriformes) are polyploid chondrostean fish that originated during the Devonian period and have over 200 million years of history; thus, they constitute an important model species for studying development and evolution in vertebrates [1, 2]. As the source of caviar food, the sturgeon has high economic value, which has resulted in intense fishing pressure on wild stocks, so leading them to be listed among the more endangered group of species. In the updated February 2019 press release, the International Union for Conversation of Nature and Natural Resource (IUCN) Red List identified sturgeon as one of the most endangered animal groups; some 85% of sturgeon species are on the verge of extinction (http://www.iucnredlist.org/search). Efforts have been made worldwide to develop a sturgeon aquaculture industry for artificial reproduction since the early 1960s. However, to date, many new problems have emerged that require attention. For instance, The developmental asynchronism of the double sex gametes i.e. sperm and ovum and reproduction interval of 2–7 years markedly reduce reproduction efficiency in sturgeon artificial propagation practices, causing more aquaculture cost. Therefore, studies of gametogenesis mastering the mechanism of reproduction regulation in sturgeon are greatly useful for reproduction evaluation and aquaculture management.
Gametogenesis includes oogenesis and spermatogenesis, which mainly are comprised of germ cell growth and proliferation, primary spermatocytes and primary oocytes formation, until matured gametes of double sexes production. In fish, the onset of meiosis of germ cell is one of the most important steps in gametogenesis, which involved in the sex steroid hormones stimulating and the endocrine regulation [3, 4]. The previous studies reported various abnormalities in sturgeon gametogenesis and occurs of aberrant intersex gonads in cultured sturgeons [5, 6], which significantly impact on the viability of progeny and decrease the efficiency of sturgeon stock enhancement. In this context, the first and core step in investigating the key genes involved in early gametogenesis in sturgeon is to acquire full-length nucleotide sequences information of the genes. Whole-genome sequencing and assembly combined with transcriptome data would be an efficient way to systematically characterize gene models. However, the sturgeon genome is large and includes numerous mini-chromosomes and substantial polyploidy caused by genome duplication [7, 8], which possess significant difficulties to entire-genome sequencing. Recently, transcriptome-scale sex-related gene characterization was conducted in different sturgeon species with next-generation high-throughput sequencing technologies, including Adriatic sturgeon (A. naccarii) , Chinese sturgeon (A. sinensis) , Amur sturgeon (A. schrenckii) [11, 12] and Russian sturgeon (A. gueldenstaedtii) . However, the genes acquired by only assemble procedure used in above studies signify incomplete sequences, which extremely restricts the yield of full-length genes. The third-generation long-read sequencing platform can overcome this difficulty.
In comparison with short-read sequencing, the methodological advantages of PacBio Isoform sequencing (Iso-Seq) include better completeness in sequencing both the 5′ and 3′ ends of the full-length cDNA molecules and greater accuracy in producing isoform-level transcripts. Recently, PacBio Iso-Seq technology has been successfully used in multiple species, such as Populus , Dolly Varden char , human cell lines and tissues , rabbits  and primates . The transcriptomic data produced by PacBio Iso-Seq provide innovative research materials. For example, deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon has opened the door for gaining a deep understanding of genome evolution and human diseases , and the full-length transcripts from the Iso-Seq platform have provided new insight into the extreme metabolism of the ruby-throated hummingbird . Meanwhile, the reconstruction and annotation of full-length transcripts also plays a critical role in gene discovery, particularly for species with no reference genome, such as the transcript variants involved in the innate immune system in Litopenaeus vannamei , gene families with two and more isoforms in Misgurnus anguillicaudatus  and transcript diversity in bioactive compound biosynthesis of Astragalus membranaceus . Full-length transcriptome analyses may also drive new innovative progress for understanding the mechanisms of reproduction regulation in sturgeon.
In this study, we adopted joint PacBio Iso-Seq and short-read RNA-seq to generate a high-confidence full-length transcripts dataset of the gonads of 3-year-old A. schrenckii individuals and further used them to obtain comparative transcriptomic analysis for quantification investigation between testes and ovaries. Subsequently, a functional annotation of these full-length transcripts was systematically conducted with well-curated databases. Alternative splicing (AS) events, long noncoding RNAs (LncRNAs) and transcription factors (TFs) were detected. Most importantly, searching for genes involved in early gametogenesis and association analyses of related factors and early gametogenesis were performed. Herein, we not only provide a valuable resource i.e. a comprehensive full-length transcript set for the genomic-level reference of sturgeon but also systematically characterize early gametogenesis related genes of A. schrenckii to further investigate the functions of molecules during gametogenesis in sturgeon.
Materials and methods
Samples collection and histological analysis
A total of ten 3-year-old healthy Amur sturgeon (five males and five females) were sampled from the Engineering and Technology Center of Sturgeon Breeding and Cultivation of the Chinese Academy of Fishery Sciences (Beijing, China) in this study. Before sampling, the experimental individuals were anesthetized with eugenol in water for 1–3 min according to the AVMA guidelines for use (2013 version). The gonads i.e. ovaries and testes were separately collected and treated. One part of each gonad tissues was preserved in Bouin’s fixation for histological procedures, and another part was immediately immersed into liquid nitrogen until total RNA extraction. The histological analysis of each gonad tissue was performed using hematoxylin and eosin (HE) staining.
RNA extraction and quality evaluation
Total RNA was extracted from each gonad tissue (testis or ovary) using RNAiso reagent (Takara, Tokyo, Japan). To prevent genomic DNA contamination, RNA sample was treated to digest DNA using RNase-free DNase I during extraction of total RNA. RNA purity and concentration were checked using a Nanodrop 2000 spectrophotometer (Thermo Scientific). RNA integrity was assessed using an Agilent RNA 6000 Nano reagents part I Kit in an Agilent 2100 Bioanalyzer System (Agilent Technologies, CA, USA). The RNA quality criteria for the RNA samples were RIN ≥ 7.0 (RNA Integrity Number), and 1.8 < OD260/280 < 2.2. The qualified RNAs were used for further PacBio and Illumina library construction, respectively. All the sequencing operations were conducted at Biomarker Technologies CO., LTD (Beijing, China). Considering the relatively large size of the sturgeon genome and the available A. schrenckii individuals (a presumed octaploid species), the standard of sequencing quantity (clean data) was set as follows: Pacbio Iso-Seq of the library was at least 40 G and Illumina short-read RNA-seq was greater than 20 G for each sample.
PacBio library construction and sequencing
To construct the library for PacBio sequencing, the qualified RNA from seven tissues, including the three testes and four ovaries, were mixed in equal amounts. The mixed RNA sample was reverse-transcribed for mRNA using the SMARTer™ PCR cDNA Synthesis Kit. PCR amplification was performed using the KAPA HiFi HotStart PCR Kit. Then, the PCR product for the SMRTbell library was constructed using the SMRTbell template pre kit. The concentration of the SMRTbell library was measured using a Qubit 3.0 fluorometer with a Qubit™ 1X dsDNA HS Assay kit (Invitrogen, Carlsbad, USA). The quantified criteria of library quality were concentration > 10 ng/μl with dispersive but continuous distribution in the range of 1–10kbp. A total of 2.5 ng of the library was sequenced for each SMRT cell using the binding kit 2.1 from the PacBio Sequel platform, producing 20 h of movies. The sample information was first registered as BioProject with accession number PRJNA532819 and BioSample with accession number SAMN11415730. Subsequently, the subread sequence generated by the PacBio Iso-Seq platform was deposited into the NCBI Sequence Read Archive (SRA) with accession number SRR7453063.
Error correction of PacBio Iso-Seq reads
According to PacBio’s protocol, the raw polymerase reads were first processed using SMRTlink 5.0 software. Briefly, after removing the SMRTbell™ adapter and the low-quality data, postfilter polymerase reads were obtained. The circular consensus sequence (CCS) was generated from the subreads BAM files, also known as the reads of insert (ROI). All the ROIs whose number of full passes were > 1 were further classified into full-length (FL) and non full-length (nFL) transcript sequences based on whether the 5′ primer, 3′ primer and poly A tail could be simultaneously observed. We employed a three-step strategy for error correction to improve the accuracy of the full-length transcripts produced by the PacBio Iso-Seq platform. First, the circle sequencing with > 1 pass provided an opportunity for ROI self-correction. Second, full-length, nonchimeric (FLNC) reads were subjected to nonredundant and clustering treatments by the ICE Quiver algorithm and to Arrow polishing with the nFL sequence, producing high-quality and polished full-length consensus sequences. Finally, these polished consensus sequences were further subjected to correction and redundancy removal with Illumina short reads using the Proovread tool and the CD-HIT program with a –c 0.99 parameter cutoff , respectively. The above three corrections resulted in nonredundant, nonchimeric, full-length unigenes (isoform level) with high accuracy for subsequent analyses.
Illumina library construction and sequencing
The Illumina library was prepared using the NEBNext, Ultra™ RNA Library Prep Kit (E7530 L) for Illumina (NEB, USA). Briefly, polyadenylated RNA was isolated and randomly separated into fragments. First-strand cDNA was synthesized using random hexamer primers, followed by second-strand synthesis. The purification of the double-stranded cDNA was performed using VAHTSTM mRNA capture beads. The purified and repaired double-stranded cDNA fragments were selected by size in the range of 250 bp ~ 350 bp. The concentration and quality control of the Illumina library were measured using a Qubit 3.0 fluorometer with an ExKubit dsDNA HS Assay kit (Invitrogen, Carlsbad, USA) and a Qsep 400 fragment analyzer, respectively. The quantified criteria of the library quality were a concentration > 1 ng/μl in a range of 380 bp ~ 480 bp fragments. The Illumina libraries were finally sequenced on the Illumina HiSeq platform.
Raw short reads in FASTQ format were first processed through in-house Perl scripts; clean short reads were obtained by removing reads containing adapter and ploy-N, and with low quality from the raw short reads. Simultaneously, the Q30, GC content and sequence duplication level of the clean data were calculated. The clean short reads were then mapped to the PacBio reference sequence using Tophat2 tools. Only reads with perfect matches or only one mismatch were further analyzed.
Functional annotation of unigenes
For comprehensive functional annotation, the unigenes were searched against the following seven databases using BLAST software (version 2.2.26) : NR (NCBI nonredundant protein sequences) , COG (Cluster of Orthologous Groups of proteins) , Pfam (Protein family) , Swiss-Prot (A manually annotated and reviewed protein sequence database) , KEGG (Kyoto Encyclopedia of Genes and Genomes) , GO (Gene Ontology)  and eggNOG (Cluster of Orthologous Groups of proteins) . The Diamond BLASTX methods  with an E-value < 1 × 10− 5 were analyzed in NR, COG, Swiss-Prot, Pfam and KEGG annotations.
Quantification of unigene expression levels and differential expression analysis
Using the full-length transcripts yielded from the SMRT Iso-Seq analysis as reference sequences, the unigene expression levels between the ovaries and testes of A. schrenckii were further analyzed based on the short-reads datasets generated by the Illumina sequencing platform. The extracts of three testes and four ovaries from seven A. schrenckii individuals were separately used as examples of the analysis.
Quantifications of the unigene expression values from the Illumina reads of each sample were determined with RSEM using the default parameters . Briefly, the clean data from the Illumina sequencing were mapped back onto the reference sequences, and the readcount values of the unigenes for each sample were obtained. To eliminate the effects of the sequencing depth and transcript length, all the readcounts were transformed into FPKM values (expected number of fragments per kilobase of transcript sequence per million base pairs sequenced). The 0.1 threshold of FPKM value is regarded as the expression criterion of the unigene in tested tissue (FPKM > 0.1) according to the method .
To detect the differentially expressed genes (DEGs), all the readcounts of each sample were first normalized into a standardized readcount using the edgeR package. Then, the differential expression analysis of testes and ovaries was performed using the EBSeq R package mode based on the negative binomial distribution from the biological replicates. The resulting false discovery rate (FDR) values were adjusted using the posterior probability of being DE (PPDE) approach. Herein, the |log2(Fold Change)| > 1 and FDR < 0.05 were used as the threshold for determining DEGs.
Alternative splicing prediction and analysis
Based on the BLAST method , all the unigenes were used for pairwise alignment. Finally, BLAST alignments that met all three of the following criteria were considered products of candidate AS events . Briefly, 1) the length of two unigenes was both greater than 1000 bp, and there were two high-scoring segment pairs (HSPs) in the alignment. 2) The AS gap between two aligned unigenes was greater than 100 bp and at least 100 bp from the 3′ end and 5′ end. 3) A 5 bp overlap could be tolerated.
LncRNA prediction and analysis
Unigenes with a length of over 200 nt and having more than two exons were selected as lncRNA candidates. Then, four computational approaches were employed to further screen the protein-coding unigenes from the noncoding unigenes: 1) we performed the coding-noncoding-Index (CNCI) with default settings  to assess the coding potential; 2) we used the coding potential calculator (CPC) to search for the unigenes in the NCBI eukaryotic protein database with a score < 0 setting ; 3) we translated each transcript in all three possible frames and used the Pfam scan utility with default parameters to identify the occurrence of any of the known protein family domains documented in the Pfam database ; 4) we used the coding potential assessment tool (CPAT) to assess the putative protein-coding unigenes by calculating the Fickett and Hexamer scores based on the logistic regression model . As a result, all the isoform transcripts with coding potential were filtered, and the intersecting unigenes without coding potential formed our candidate set of lncRNAs.
Transcription factor prediction and further analysis
Transcription factor (TF)-related unigene sequences were predicted using the BLAST method with the AnimalTFDB database . The HMG domain of SOX was further verified using the SMART mode (http://smart.emblheidelbergde/index2.cgi).
According to the lengths of the four nonredundant full-length Sox9 genes, we renamed them as Asc_Sox9–1, Asc_Sox9–2, Asc_Sox9–3 and Asc_Sox9–4, respectively. A neighbor-joining phylogenetic tree was reconstructed based on amino acid sequences using the MEGA 7.0 package with the following parameter settings: p-distance model and partial deletion treatment. Meanwhile, Sox2 from the zebrafish Danio rerio (accession number: BAE48583.1) was chosen as the out-group protein sequence.
Histology characteristics from the gonads of 3-year-old A. schrenckii
The gonads of 3-year-old A. schrenckii were visible at the early gametogenesis stage. In 3-year-old female individuals, the histological section of ovary showed deep, branching ovarian lamellae structure and was mainly composed of primary growth oocytes of different diameter sizes ranging from 100 to 500 μm. Some oocytes at the perinucleolar stage were clearly observed with a high number of small nucleoli along the nuclear perimeter. In 3-year-old male individuals, section of the entire germ region of testis showed smooth surface of the lateral side. By histological observation, it was found that the testis tissue displayed alveolate seminiferous lobules organization structure. The anastomosing tubules were separated from each other by thin layers of compact connective tissues and filled with spermatogonia enveloped by their own Sertoli cells. Meanwhile, we also found that there were some degenerating oocytes probably undergoing autophagy and some spermatogonia with pyknotic nuclei were probably apoptotic. Histomorphological characteristics of the ovary and testis from 3-year-old A. schrenckii individuals are shown in Fig. 1.
Full-length transcripts from the gonads of A. schrenckii
The full-length transcriptome of A. schrenckii was generated using the PacBio Sequel platform on the pooled RNA from seven tissues, including three testes and four ovaries. The resulting total of 50.04 G subread bases was generated by two SMRT cells from the PacBio library; therefore, the 1,260,958 reads of insert were produced with a mean read quality of 0.95 and mean passes of 14 circles (Supplementary Table 1). By applying the standard Iso-Seq classification and clustering protocol, all the ROIs were further classified into 358,153 nFL sequences and 860,617 FLNC reads with a mean length of 2548 bp. Based on the ICE Quiver and Arrow polishing algorithms, we produced 461,596 polished full-length consensus transcripts with a mean length of 2782 bp, including 335,067 high-quality (HQ) and 125,969 low-quality (LQ) sequences. After correction using short reads produced by Illumina short-read RNA-seq and subsequently removing redundancies using the CD-Hit program, the consensus transcripts were finally clustered into a total of 164,618 unigenes for subsequent analysis. We found that 91.09% of the unigenes main length distribution ranging from 1 to 6 kbp (Fig. 2 and Supplementary Table 2). The Iso-Seq statistics from the gonads of A. schrenckii by the PacBio Sequel platform are listed in Table 1.
Meanwhile, the clean short reads were respectively generated from seven gonads of A. schrenckii (including three testes and four ovaries) using Illumina sequencing platform. The quality evaluation indexes of the clean short reads are summarized in Supplementary Table 3. In sum, the high-quality clean short reads from each sampled Illumina library was obtained with at least 6.96 × 107 read numbers and a Q30 of over 90%.
Efficient gene annotation of full-length unigenes from A. schrenckii gonads
To obtain a comprehensive functional annotation from the full-length gonad transcriptome of A. schrenckii, all the 164,618 nonredundant unigenes were assigned to align with seven different databases, including NR, KOG, Pfam, SwissProt, KEGG, GO, and eggNOG. A total of 93.55% of the unigenes (154,006 of 164,618) were successfully annotated with significant hits (E-value <1E− 5) from these well-curated databases. The statistics of the full-length unigene annotations are listed in Table 2. The remaining unannotated unigenes (10,612 unigenes) might represent novel A. schrenckii species-specific genes.
Among the annotated 60 classified GO terms, cellular process was identified as the most common annotation in the biological process; metabolic process and biological regulation were the next most abundant GO terms. In the molecular function and cellular component categories, binding and cell part annotations were identified as the most abundant terms, respectively. Two putative early gametogenesis related GO terms, including reproductive process (involving 1188 unigenes) and reproduction (involving 1134 unigenes), were successfully annotated. The GO classifications of the full-length unigenes from the gonads of A. schrenckii are shown in Supplementary Figure 1 and Supplementary Table 4.
In the KEGG classification, a total of 295 pathways annotated from 87,086 nonredundant unigenes were extracted from the gonad transcriptome of A. schrenckii (Supplementary Table 5). The results showed that the protein processing endoplasmic reticulum (2329 unigenes), RNA transport (2312 unigenes) and cell cycle (2263 unigenes) were the top three pathways with the most abundant unigenes. Notably, we paid attention to 14 KEGG pathways, which may be closely associated with early gametogenesis of sturgeon. Among these, MAPK signaling pathway (1834 unigenes), oocyte meiosis (1731 unigenes) and progesterone-mediated oocyte maturation (1398 unigenes) were the top three pathways with the most abundant unigenes distribution.
Search for genes involved in early gametogenesis of A. schrenckii
A total of 60 genes (involving 755 unigenes), reported as active in the gonad development of sturgeon [10, 42,43,44], were detected to be present in the full-length gonad transcriptome of 3-year-old A. schrenckii. Therefore, the 60 early gametogenesis related genes and their NR annotations of full-length unigenes from the Iso-Seq of A. scipenserkii gonads are listed in Table 3 and Supplementary Table 6, respectively. Mainly, two major sex determination related genes in fish, Dmrt1and Gsdf, were also present in the full-length gonad transcriptome. Nine spermatogenesis-related genes were significantly matched with the full-length gonad transcriptome, including AR, Vasa, ER, Rspo1, Igf I, Dkk1, Hsd11b2, Cyp11b and ATRX. Meanwhile, four oogenesis-related genes were also predicted from the full-length gonad transcriptome, including Foxl2, Cyp19a, VR and Ozf. Four hormone receptor genes, including AR, ER, Gnrhr and Fshr. Two members of transforming growth factor (TGF)-β superfamily, Amh and Gsdf. Nine genes belonging to the sox subfamily, Sox3, Sox4, Sox5, Sox6, Sox7, Sox8, Sox9, Sox10 and Sox18, and five genes belonging to the double sex and mab-s (DM) domain, Dmrt1, Dmrt2, DmrtA1, DmrtA2 and DmrtB1, were found. In addition, Nine Sap family genes and four Hsp family genes were also concerned. We also found that Oct4 has the most abundant transcript diversity, with 100 unigenes predicted with ORFs among 104 unigenes, followed by Hsp70 (59/64) and Cyp19a (49/50).
Differential expression genes in early gametogenesis in the gonad transcriptome of A. schrenckii
Using nonredundant full-length transcripts as genome sequence references and combining the clean short reads datasets from the Illumina sequencing platform, the expression values (FPKM) of all 164,618 unigenes in three testes and four ovaries of seven A. schrenckii individuals were separately obtained. Therefore, the expression characteristics of all the full-length unigenes between the testes and ovaries of A. schrenckii were classified into the following three categories. 1) 19,481 unigenes were not expressed in either ovaries or testes (FPKM = 0 or FPKM ≤0.1); 2) 19,716 unigenes (13.6%) were ovary-specific expression patterns, including 14,284 unigenes in 0.1 < FPKM< 2, 4242 unigenes in 2 < FPKM< 10 and 1190 unigenes with FPKM> 10; 3) In contrast, only 3028 unigenes (2.1%) were exclusively transcribed in testis tissues, including 2751 unigenes in 0.1 < FPKM< 2, 229 unigenes in 2 < FPKM< 10 and 48 unigenes with FPKM> 10. Here, the distribution of testis-specific and ovary-specific unigenes are shown in a Venn diagram (Fig. 3a).
DEseq software was used for the analysis of differentially expressed unigenes (DEUs) in the testes and ovaries. Among 24,101 DEUs with a |log2(Fold Change)| > 1 and FDR < 0.05, 18,863 unigenes were upregulated in the ovaries, while 5238 unigenes were upregulated in the testes (Supplementary Figure 2). In further analysis 30 genes of the 60 early gametogenesis-related genes were screened to have significant expression between the testes and the ovaries (Table 4). In total, twelve genes (Foxl2, Cyp19a, OCT4, Sox3, Sox7, Bmp15, Dkk1, Gsf1, Hsp, Hsp70, Hsp90 and Sap24) were shown to be upregulated in the ovaries, while eighteen genes (DmrtB1, Amh, Sox4, Sox5, Sox8, Sox9, Vasa, Rspo1, ERβ, Gsdf, Hsd11b2, Fshr, ATRX, Ozf6, Ozf7, Sap2, Sap5 and Sap6) exhibited significant higher expression in the testes. Among the highly expressed genes in the testes, Amh is a only tissue-specific gene, i.e. only expressed in testes.
To uncover the biological function of DEUs, we separately performed KEGG pathway enrichment analysis for the ovary-biased and testis-biased DEUs. As shown in Fig. 3b and c, we found that completely different KEGG pathways were enriched between the ovary-biased and testis-biased DEUs. A total of 17 and 11 terms significantly enriched from the ovary-biased and testis-biased DEUs were discovered, respectively (corrected P < 0.05). Among the ovary-biased DEUs, most of the unigenes were involved in the three top KEGG pathways, including the cell cycle, oocyte meiosis and phagosome. However, most of the unigenes of the testis-biased DEUs were related to cell adhesion molecules, neuroactive ligand-receptor interactions, calcium signaling pathways, etc. We further analyzed the distribution of DEUs in early gametogenesis-related GO terms and KEGG pathways. We found that oocyte meiosis is the most significant pathway with 277 DEUs of 1731 unigenes, followed by the progesterone-mediated oocyte maturation pathway with 201 DEUs (Table 5).
Association analysis of alternative splicing (AS) and early gametogenesis from the gonad full-length transcriptomes of A. schrenckii
Because sturgeon species still had no reference genome, we detected AS transcript isoforms of the full-length gonad transcriptome from A. schrenckii by referring to the pipeline of Amborella trichopoda without a reference genome . As a result, a total of 236,672 AS events (involving 36,522 unigenes) were detected in the gonad transcriptome of A. schrenckii. Among these, we found that 16,909 unigenes (46.29%) had only one isoform; however, it is interesting to note that 5314 unigenes (14.55%) were predicted to have more than 10 isoforms (Fig. 4a).
Importantly, we found that sixteen early gametogenesis-related genes were predicted to be involved in AS events (Supplementary Table 7). We selected two genes as examples, including Vasa (unigene ID: F01_cb6729_c68/f1p2/2928) predicted with four alternative isoforms and Fem1 (unigene ID: F01_cb8161_c15/f1p2/1763) predicted with five alternative isoforms. In Fig. 4b, the cluster heatmap (Log2(FPKM+ 1) values) indicates the expression patterns of different alternative isoforms in the testes and ovaries of A. schrenckii. Vasa, as a exclusively expressed marker gene in germ-line, showed the predominant expression transcript with higher expression level in testes than that in ovary among 4 isoforms. Isoform 1 of Vasa had an obviously high expression in ovaries, and the other isoforms showed very low expression levels in both testes and ovaries. Meanwhile, we also found that expression levels of Fem and its five isoforms might seem like a paradoxical pattern. For example, isoform 2 and isoform 3 were the two main expression transcripts and had relatively high expression levels in both the ovaries and the testes, but isoform 4 and isoform 5 showed the opposite pattern between the ovaries and the testes. Therefore, these alternative isoforms suggest an important role in the regulation of gene expression through compensation or neutralization effects.
Subsequently, we investigated the distribution of AS events in the early gametogenesis-related GO terms and KEGG signaling pathways. The results indicated that the AS event is a pervasive feature involved in the early gametogenesis processes of sturgeon. The column plot showed that oocyte meiosis, progesterone-mediated oocyte maturation and the MAPK signaling pathway are the three most abundant AS events unigenes, while steroid biosynthesis is the least abundant event, with only eight unigenes (Fig. 4c).
Association analysis of long noncoding RNAs (LncRNAs) and early gametogenesis from full-length gonad transcriptomes of A. schrenckii
A total of 10,566 unigenes were identified as putative LncRNAs from the full-length gonad transcriptome of A. schrenckii (Fig. 5a). Further analysis indicated that 2388 of the detected lncRNAs were not expressed in either the testes or ovaries (FPKM = 0 or FPKM≤0.1). Meanwhile, 524 LncRNAs present exclusively in ovary-specific expression and 551 LncRNAs present exclusively in testis-specific expression were also detected. Overall, 961 putative LncRNAs were differentially expressed between the ovaries and testes of A. schrenckii, including 235 ovary-biased LncRNAs and 726 testis-biased LncRNAs (Supplementary Figure 3).
Association analysis of transcript factors (TFs) and early gametogenesis from full-length gonad transcriptomes of A. schrenckii
A total of 4339 nonredundant TF-related unigene sequences were matched against the AnimalTFDB database using BLAST, corresponding to 53 TF-related Pfam family domains. The top 20 abundant terms are listed in Fig. 5b. The sixty unigenes were found to have two or more different TF-related domains. For example, the results indicated that F01_cb10567_c40/f1p0/2683 was predicted to have a Homebox and miscellaneous domain, which suggests that further identification needs to be performed.
Many SRY-related HMG (high-mobility group) box (Sox) transcription factors play an important role in male gonadal differentiation, spermatogenesis and gonadal function maintenance in vertebrate species. Because the SOX family often shares the conserved HMG box domain, members of the Sox family were herein identified from the full-length gonad transcriptome of A. schrenckii. After further validation using SMART protein motif analysis, 82 nonredundant unigenes were found to have both HMG domains and SOX NR annotations with significant matches. We also found that the sequence variations of Sox increased the numbers of unigenes, and the 82 nonredundant unigenes belonged to seven members of the SOX family, including Sox3 (52 unigenes), Sox4 (4 unigenes), Sox5 (5 unigenes), Sox7 (10 unigenes), Sox8 (6 unigenes), Sox9 (4 unigenes) and Sox10 (1 unigene) (Fig. 5c and Supplementary Table 8).
Further analysis was performed using Sox9 as an example. The four nonredundant full-length unigenes were renamed Asc_Sox9–1-4 according to the length of PacBio Iso-Seq. The characteristics of the only completed sturgeon Sox9 protein from A. sinensis in the NCBI database (accession number: AHZ62758.1) and the Asc_Sox9–1-4 sequence are listed in detail in Table 6. The results showed two main differences: 1) compared to the length of Asi_Sox9 (2145 bp), those of the four Asc_Sox9–1-4 were longer and varied from 2873 bp to 3491 bp. 2) The lengths of the amino acids in Asc_Sox9–1-4 also varied, from 429 aa to 488 aa. The information for the Asc_Sox9–1-4 sequences is listed in Supplementary File 1. Due to the significant differences in amino acid lengths, HMG domain position, UTR length and expression abundance, the four Asc_Sox9 genes suggested putatively novel transcripts in A. schrenckii. Figure 6 shows a schematic diagram of the gene structure and expression abundance (FPKM levels) among the Asc_Sox9–1-4 genes.
The phylogenetic tree was constructed from the alignment of the amino acid sequences in the four Asc_Sox9–1-4 genes with those from forty-six other animal species from five classes, including Mammalia, Aves, Amphibia, Reptilias and Osteichthyes (Supplementary Table 9). From Fig. 7, we found that all four Asc_Sox9–1-4 genes and those of other sturgeon species were perfectly clustered into one group. Obviously, Asc_Sox9–1 gene was closer to that of A. baerii; however, it was genetically distant from Asc_Sox9–3, Asc_Sox9–2 and Asc_Sox9–4. As expected, compared with the other four classes, the sturgeon Sox9 genes had the closest evolutionary relationship with other teleost fish species among all the five classes. We also found differences between the Sox9 gene tree and the species tree, which suggests that Sox9 is under evolutionary selection pressure. Meanwhile, Sox9 genes from five different classes were clustered together into five groups, which indicates that their functions may be relatively conserved in vertebrates.
Female sturgeons are more valuable than males due to the valuable caviar produced by their ovaries; consequently identifying sturgeon sex as early as possible and mastering the regulation technology of female gonad development could reduce production costs for enterprises. However, we still have poor knowledge for the regulatory mechanisms of reproductive processes, for example sex determination and differentiation, gonad development and gametogenesis. Amur sturgeon (A. schrenckii) is a critically endangered fish in the Acipenseridae family distributed in the Amur river in China and Russia . Artificial reproduction of A. schrenckii began in the 1930s in China, and it has been the most popular sturgeon aquaculture species since the early 1990s. Currently, A. schrenckii has been one of the dominant caviar productions of farmed sturgeons and the popular crossing parents for sturgeon aquaculture in China [46, 47]. Here, the study of searching for genes involved in early gametogenesis and regulation mechanisms of reproduction control of A. schrenckii can provide practice application and theory basis. In the present study, we first produced high confidence and full-length transcriptome data from two independent types of gonad tissues (three testes and four ovaries), and maximized transcript diversity using the PacBio Sequel sequencing approach. As expected, a large amount of transcriptome data was generated, including 164,618 unigenes with a mean length of 2782 bp, compared with previous transcriptome reports yielding only N50 values with fewer than 1300 bp in A. naccarii , A. sinensis , and A. schrenckii . Here, a total of 154,006 (93.55%) out of 164,618 full-length unigenes were successfully annotated as known homologous genes using seven well-curated databases. Due to the absence of reference genomic information, the remaining 10,612 full-length unigenes may suggest putative novel genes in sturgeon.
The testes tissue of 3-year-old A. schrenckii individuals was active in the mitotic proliferation stage of spermatogonia. In teleost fish, R-spondin 1 (Rspo1) expression was upregulated just before meiotic initiation in both the ovary and testis during the early developmental stages and the deficiency of Rspo1 was intriguingly found to cause a delay in spermatogenesis in XY fish . In human, two estrogen receptor β (ERβ) wild-type transcript variants suggested specific functions in spermatogenesis due to their expression mainly located in somatic cells and primary spermatocytes . The 11-beta-hydroxysteroid dehydrogenase (Hsd11b2) was expressed higher in male than female of 2-year-old A. ruthenus . ATRX protein was present in adult human and rat testis and was expressed in spermatogonia, early meiotic spermatocytes and somatic cells . Many members of spermatogenesis associated protein family genes (Sap) are identified in testes and play important roles in the spermatogenesis process in vertebrates including fish, for example, Sap2 [52, 53], Sap4 , Sap6 , Sap22 , and so on. In our study, above mentioned genes were also identified to be sex-significant expression genes in testes, which suggest they may paly conserved role in the early spermatogenesis of sturgeon.
Morphological observations indicated that ovary tissue was active in the stage of ovarian follicles development, when it filled with growing oocytes with different diameter sizes and abundant nucleoli in 3-year-old A. schrenckii. The forkhead transcription factor (Foxl2) plays an essential role in early ovarian development, subsequent maintenance of female trait and reproduction function. The knockout of Foxl2 was reported to give rise to complete female to male sex reversal in mammals and teleost fish [57, 58]. In fish, Foxl2 may play an important role in ovarian differentiation by maintaining cyp19a (cytochrome P450 1A) expression and antagonizing the expression of Dmrt1(double sex and mab-3 related transcription factor 1) . In the present study, foxl2 and cyp19a were sexual dimorphic expression pattern and significantly higher expressed in the ovaries compared to the testes. Considering their significant role in the ovary, it would be necessary to explore the function of foxl2 and cyp19a further in the A. schrenckii. As the markers of undifferentiated spermatogonia, the expression patterns of Oct4 and Sox3 were found to have differentially expressed in the ovaries than that in the testes of the A. schrenckii transcriptome. The similar patterns were also reported in the previous Chinese sturgeon , which maybe signify a novel function in sturgeon.
At the molecular level, functional assignment indicated that the abundant transcript numbers and DEGs were mainly distributed in directly ovary-related KEGG pathways, including oocyte meiosis, progesterone-mediated oocyte maturation, oxytocin signaling pathway, estrogen signaling pathway, prolactin signaling pathway and ovarian steroidogenesis. Above results may suggest that the gonads of 3-year-old female Amur sturgeon are still in the core development stage for launching reproduction-related function activity. In mammals, the TGF-β signaling pathway has been shown to play important roles in the development of both ovarian and testicular functions [60, 61]. Moreover, the identification of Amh and Gsdf genes indicated that the TGF-β signaling pathway played a critical role in gonadal differentiation of teleost fish [62, 63]. In the present study, Amh and Gsdf, the two TGF-β subfamily numbers, were more significantly expressed in the testis compared to the ovary; in particular, the Amh gene had a testis-specific expression pattern. Therefore, whether the TGF-β signaling pathway is involved in sturgeon gonadal differentiation and how it works are worth further investigation.
In mammalian Mus musculus, the transcription factor Dmrt1 is sufficient to determine male fate, subsequent testicular development and both intrinsic and extrinsic control of spermatogenesis [64, 65]. In zebrafish, Dmrt1 functions in male sex determination and testis development . In this study, Dmrt1 expression levels were not found to be significantly different between the testes and ovaries of A. schrenckii. A similar result was also observed in Chinese sturgeon  and the starlet A. ruthenus . Coincidentally, Dmrt1 was reported that it does not participate in the initial steps of gonad differentiation in Siberian sturgeon  or Russian sturgeon . Recent study indicated that Dmrt1 proteins were abundantly expressed in spermatocytes and spermatids, but not in type A spermatogonia of the Japanese eel . Therefore, Dmrt1 expression also may be the stage-specific manner due to the pre-meiotic testes of 3-years-old A.schrenckii. However, another DM domain gene, DmrtB1, was found to be differentially expressed in the testes and ovaries of Amur sturgeon. DmrtB1 has been reported to play a pivotal role in coordinating the transition between mitosis and meiosis in murine germ cells and has a relevant role in the entry of spermatogonia into meiosis in humans . Thus, it would be worth investigating the special role that DmrtB1 plays during the spermatogenesis of Amur sturgeon.
Although sturgeons have Müllerian ducts, which different from the teleost fish, anti-müllerian hormone (Amh) is reported to be conserved in a wide range of fish species and has a possible regulatory interaction with sex steroids and gonadotropic hormones in gonad development in fishes . For example, the expression of Amh started in the gonads before sex differentiation, and its levels surged in the differentiated testes in Danio rerio . An Amh testis-specific expression pattern was also detected in this study, strengthening the specific expression pattern in differentiated testes of sturgeon such as A. baeri  and A. ruthenus . In teleost fish, a high diversity of sex candidate genes has been reported, for example Gsdf in Oryzias luzonensis  and Amhy (a duplicated copy of Amh) in the Patagonian pejerrey (Odontesthes hatcheri) . Recent studies have identified major candidate genes for sex determination and differentiation based on conserved molecular mechanisms in many developmental events for vertebrate taxa, but the results vary among different species of sturgeon.
For the early gametogenesis related genes, we compared our results with published studies of other sturgeons. We found two distinct characteristics among all the studies. First, two hormone genes, Gnrh and Fsh, were reported to be transcribed in the gonads of 3-year-old A. sinensis . However, no similar hormone genes could be found in the present study, the transcriptome data from 6 m A. naccarii , or even in five-year-old A. gueldenstaedtii ; we only found their corresponding receptors. Fish gonadal development is regulated by the hypothalamo-pituitary-gonad (HPG) axis. In axis, gonadotropin-releasing hormone (GnRH) expression is generally restricted to hypothalamus and promotes the synthesis and release of FSH and LH. FSH and LH are expressed in the pituitary by gonadotropic ells and activate their receptors and stimulate synthesis of the various sex steroid hormones in the gonads to regulate gametogenesis. That may be the main cause of no their transcripts in gonadal tissues. Second, Sox family numbers were found widely, which suggests a universal regulatory role in sturgeon gonadal differentiation. In this study, we provided an additional 31 full-length cDNA sequences grouped into five novel Sox family numbers for sturgeon species, including Sox5 (7 unigenes), Sox7 (15 unigenes), Sox8 (7 unigenes), Sox10 (1 unigene) and Sox18 (1 unigene). Therefore, a total of 14 Sox family numbers have been identified thus far from transcriptomes of sturgeon gonads.
The transcription factor Sox9, which is both necessary and sufficient for male fate and hence the maintenance of testis function in mammals , was found to have significantly higher expression in the testes than that in the ovaries of 3-year-old Amur sturgeon. The Sox9 gene expression characteristics in the other sturgeon species can be summarized as follows. In Russian sturgeon, Sox9 was expressed at significantly higher levels in the gonads of 2-year-old and 5-year-old males . A similar report was made for 4-year-old stellate sturgeon (A. stellatus) individuals  and Siberian sturgeon during the early sex differentiation period . These previous studies may suggest that Sox9 may have a core role in the early spermatogenesis of Amur sturgeon; however, additional studies i.e. function confirmation are necessary to verify the role of Sox9. In our study, we also found four novel Sox9 transcript variants with significantly different UTR region sequence and length. To our knowledge, the UTR region can play regulatory roles in mRNA transcription and translation. For example, the eukaryotic 5′ UTR is critical for ribosome recruitment to the mRNA and to start codon choice and that it plays a major role in the control of translation efficiency and shaping the cellular proteome . The sequence and length of 3’UTR region impact on mRNA expression by association with microRNA (miRNAs). MiRNAs are endogenous, small non-coding RNAs (19–25 nucleotides in length) that functions as post-transcriptional repressors of gene expression by binding to complementary sites of 3’UTR of target mRNA. MiRNAs participate in pervasive biological processes, including growth, development, tumorigenesis . Recent advances have been reported gonadal miRNAs directly control the differential expression of many sex-related genes to be involved in gametogenesis ; for example, miR-124 was involved in regulating the fate of developing ovarian cells by preventing the expression of Sox9 in mice . An other study also reported that miRNAs can control mRNA fate by compartmentalization based on the 3’UTR length in male germ cells . The present study is the first to report the four novel Sox9 transcript variants in the gonads of A. schrenckii. Excitedly, the identification of gonadal miRNAs and their differential expression in testis and ovary of A. schrenckii were also reported . Therefore, investigation combined the early gametogenesis related full-length genes with gonadal miRNAs may help us to further understand the roles of reproduction regulation mechanism in sturgeon.
Isoform identification from the PacBio Iso-Seq strategy is one of is extremely important advantages. When using short read RNA-Seq strategies, extensive alternative prediction is impractical and high variability of isoform expression quantification is impossible in sturgeon without a true genome reference. However, the PacBio Iso-Seq conveniently provides the ability to finding greater numbers of AS events in genes of many species, even those of reference-free species [23, 36]. From the full-length gonad transcriptome data, we detected a huge number (236,672, involving 36,522 nonredundant full-length unigenes) of AS events in A. schrenckii. We also revealed that these AS events are universal, involving early gametogenesis related genes and widely existing in early gametogenesis related GO terms and KEGG pathways, which most likely corresponds to the complexity of the early gametogenesis function. For example, Vasa was proven to be important role in animal reproduction. The absence of functional Vasa leads to sterility, but the penetrance of this phenotype vary by sex. In Drosophila, Vasa deficiency results in female-specific sterility . In mice and zebrafish, Vasa mutants have led to male-specific sterility [81, 82]. However, the regulation role of Vasa in gametogenesis is still unclear and previous reports mainly rely on expression studies. Vasa expression is restricted to only in both the ovary and testis with sexually dimorphic expression pattern. During gametogenesis, the Vasa protein was dynamically expressed in differentiating germ cells at different stages in adult gonads of the anadromous fish . In A. dabrynaus, Vasa mRNA and protein displayed mitotic and meiotic expression in females, and mainly showed mitotic expression in males . In our study, Vasa expression in both ovary and testis with sexually dimorphic pattern was consolidated in A. schrenckii, which maybe suggest the potential role in gametogenesis of sturgeon. Meanwhile, Vasa had relatively higher expression levels in testis than that in ovary, which is contrary with the ovary-biased pattern in Japanses flounder . Importantly, we also revealed the four isoform variants of Vasa, which may imply having different biology functions during sturgeon early gametogenesis. Recent evidences indicate that different splice variants of Vasa exist in many species, including bovine  and Chinese mitten crab . The qPCR further analysis indicated that the splicing variants of Vasa had different relative proportions during bull spermatogenesis. As 3-year-old A. schrenckill individuals were in the mitotic proliferation stage of spermatogonia, the testis-biased expression of Vasa suggests its roles in spermatogenesis, and further studies are required to investigate.
In summary, we combined PacBio Iso-Seq with Illumina short-read sequencing methods to conduct a comprehensive transcriptome analysis of the gonads of the Amur sturgeon, A. schrenckii. This approach enabled the generation of full-length transcripts as well as related analysis, that is, efficient gene annotation, alternative splicing, long noncoding RNAs and transcript factors. More importantly, isoform variants and expression profiles survey of the early gametogenesis related molecules of A. schrenckii contributed to a comprehensive insight into the early gametogenesis process and the reproduction regulation mechanisms of Amur sturgeon. Therefore, our study provides a valuable resource—a comprehensive full-length transcript set for genomic-level reference—which is both interesting and worthy of further in-depth studies in sturgeon.
The present study provides the new genetic resources of full-length gonad transcriptome data and comparative transcriptome information of the gonads of the Amur sturgeon (A. schrenckii), an importantly economic aquaculture sturgeon species. A total of 164,618 high-quality nonredundant full-length transcripts (unigenes) generated from 50.04 G subread bases were herein produced with an average length of 2782 bp, which represents a significant advance in sturgeon genetics. The study discovered the number of 60 full-length genes identified to be related to early gametogenesis, further out of the 30 genes showed differential expression in the testes and ovaries suggesting significant function in the early gametogenesis of sturgeon. Interestingly, Amh with testis-specific expression and Gsdf with significantly higher expression in testes than ovaries (fold change > 200) belong to two key numbers of TGF-β superfamily, which may play core regulatory roles in the spermatogenesis of A. schrenckii; while the foxl2 combining with Cyp19a imply significant regulatory role in the oogenesis of A. schrenckii. Meanwhile, the four Vasa isoforms and four novel Sox9 transcript variants also hint function complexity of early gametogenesis of A. schrenckii. Finally, a total of 236,672 AS events, the 10,556 putative lncRNAs and the 4339 predicted TFs were identified to be involved in biological process of early gametogenesis of A. schrenckii. In total, our results provide first full-length gonad transcription data and information as a genomic-level reference for sturgeon. These importantly provide candidate genes and theoretical basis for further exploration of reproduction regulation of sturgeon.
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its additional files.
PacBio isoform sequencing
Differential expression unigenes
Long noncoding RNAs
- A. schrenckii :
Full-length, nonchimeric reads
NCBI nonredundant protein sequences
Cluster of Orthologous Groups of proteins
A manually annotated and reviewed protein sequence database
Kyoto Encyclopedia of Genes and Genomes
Cluster of Orthologous Groups of proteins
Expected number of fragments per kilobase of transcript sequence per million base pairs sequenced
Differentially expressed genes
False discovery rate
Coding potential calculator
Coding potential assessment tool
Saito T, Psenicka M, Goto R, Adachi S, Inoue K, Arai K, Yamaha E. The origin and migration of primordial germ cells in sturgeons. PLoS One. 2014;9(2):e86861.
Bemis WE, Findeis EK, Grande L. An overview of Acipenseriformes. In: Sturgeon biodiversity and conservation. Berlin: Springer; 1997. p. 25–71.
Nagahama Y. Endocrine regulation of gametogenesis in fish. Int J Dev Biol. 1994;38(2):217–29.
Senthilkumaran B. Pesticide- and sex steroid analogue-induced endocrine disruption differentially targets hypothalamo-hypophyseal-gonadal system during gametogenesis in teleosts. Gen Comp Endocrinol. 2015;219:136–42.
Ruban GI, Akimova NV, Goriounova VB, Mikodina EV, Nikolskaya MP, Shagayeva VG, Shatunovsky MI, Sokolova SA. Abnormalities in sturgeon gametogenesis and postembryonal ontogeny. J Appl Ichthyol. 2006;22:213–20.
Rzepkowska M, Ostaszewska T, Gibala M, Roszko ML. Intersex gonad differentiation in cultured Russian (Acipenser gueldenstaedtii) and Siberian (Acipenser baerii) sturgeon. Biol Reprod. 2014;90(2):31,1–10.
Ludwig A, Belfiore NM, Pitra C, Svirsky V, Jenneckens I. Genome duplication events and functional reduction of ploidy levels in sturgeon (Acipenser, Huso and Scaphirhynchus). Genetics. 2001;158:1203–15.
Fontana F, Congiu L, Mudrak VA, Quattro JM, Smith TI, Ware K, Doroshov SI. Evidence of hexaploid karyotype in shortnose sturgeon. Genome. 2008;51(2):113–9.
Vidotto M, Grapputo A, Boscari E, Barbisan F, Coppe A, Grandi G, Kumar A, Congiu L. Transcriptome sequencing and de novo annotation of the critically endangered Adriatic sturgeon. BMC Genomics. 2013;14(1):407.
Yue H, Li C, Du H, Zhang S, Wei Q. Sequencing and De novo assembly of the gonadal Transcriptome of the endangered Chinese sturgeon (Acipenser sinensis). PLoS One. 2015;10(6):e0127332.
Zhang XJ, Jiang HY, Li LM, Yuan LH, Chen JP. Transcriptome analysis and de novo annotation of the critically endangered Amur sturgeon (Acipenser schrenckii). Genet Mol Res. 2016;15(2):gmr-15027999.
Zhang X, Shi J, Sun Y, Zhu Y, Zhang Z. Transcriptome analysis provides insights into differentially expressed genes and long noncoding RNAs involved in sex-related differences in Amur sturgeon (Acipenser schrenckii). Mol Reprod Dev. 2019;86(2):132–44.
Chen Y, Xia Y, Shao C, Han L, Chen X, Yu M, Sha Z. Discovery and identification of candidate sex-related genes based on transcriptome sequencing of Russian sturgeon (Acipenser gueldenstaedtii) gonads. Physiol Genomics. 2016;48(7):464–76.
Chao Q, Gao ZF, Zhang D, Zhao BG, Dong FQ, Fu CX. The developmental dynamics of the Populus stem transcriptome. Plant Biotechnol J. 2019;17(1):206–19.
Meng F, Zhang Y, Zhou J, Li M, Shi G, Wang R. Do the toll-like receptors and complement systems play equally important roles in freshwater adapted Dolly Varden char (Salvelinus malma)? Fish Shellfish Immunol. 2019;86:581–98.
Au KF, Sebastiano V, Afshar PT, Durruthy JD, Lee L, Williams BA, van Bakel H, Schadt EE, Reijo-Pera RA, Underwood JG, Wong WH. Characterization of the human ESC transcriptome by hybrid sequencing. Proc Natl Acad Sci U S A. 2013;110(50):E4821–30.
Chen SY, Deng F, Jia X, Li C, Lai SJ. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing. Sci Rep. 2017;7(1):7648.
Zhang SJ, Wang C, Yan S, Fu A, Luan X, Li Y, Sunny Shen Q, Zhong X, Chen JY, Wang X, Chin-Ming Tan B, He A, Li CY. Isoform evolution in Primates through independent combination of alternative RNA processing events. Mol Biol Evol. 2017;34(10):2453–68.
Sahlin K, Tomaszkiewicz M, Makova KD, Medvedev P. Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon. Nat Commun. 2018;9(1):4601.
Workman RE, Myrka AM, Wong GW, Tseng E, Welch KC Jr, Timp W. Single-molecule, full-length transcript sequencing provides insight into the extreme metabolism of the ruby-throated hummingbird Archilochus colubris. GigaScience. 2018;7:1–12.
Zhang X, Li G, Jiang H, Li L, Ma J, Li H, Chen J. Full-length transcriptome analysis of Litopenaeus vannamei reveals transcript variants involved in the innate immune system. Fish Shellfish Immunol. 2019;87:346–59.
Yi S, Zhou X, Li J, Zhang M, Luo S. Full-length transcriptome of Misgurnus anguillicaudatus provides insights into evolution of genus Misgurnus. Sci Rep. 2018;8(1):11699.
Li J, Harata-Lee Y, Denton MD. Long read reference genome-free reconstruction of a full-length transcriptome from Astragalus membranaceus reveals transcript variants involved in bioactive compound biosynthesis. Cell Discov. 2017;3:17031.
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
Deng Y, Jianqi LI, Songfeng WU, Zhu Y, Chen Y, Fuchu HE. Integrated nr database in protein annotation system and its localization. Comput Eng. 2006;32(5):71–2.
Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28(1):33–6.
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL, Tate J, Punta M. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–30.
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O’Donovan C, Redaschi N, Yeh LS. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2004;32:D115–9.
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004;32:D277–80.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kaasaskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.
Koonin EV, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Rogozin IB, Smirnov S, Sorokin AV, Sverdlov AV, Vasudevan S, Wolof YI, Yin JJ, Natale DA. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 2004;5(2):R7.
Hong CP, Lee SJ, Park JY, Plaha P, Park YS, Lee YK, Choi JE, Kim KY, Lee JH, Lee J, Jin H, Choi SR, Lim YP. Construction of a BAC library of Korean ginseng and initial analysis of BAC-end sequences. Mol Gen Genomics. 2004;271(6):709–16.
Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
Li B, Qing T, Zhu J, Wen Z, Yu Y, Fukumura R, Zheng Y, Gondo Y, Shi L. A comprehensive mouse Transcriptomic BodyMap across 17 tissues by RNA-seq. Sci Rep. 2017;7(1):4200.
Liu X, Mei W, Soltis PS, Soltis DE, Barbazuk WB. Detecting alternatively spliced transcript isoforms from single-molecule long-read sequences without a reference genome. Mol Ecol Resour. 2017;17(6):1243–56.
Sun L, Luo H, Bu D, Zhao G, Yu K, Zhang C, Liu Y, Chen R, Zhao Y. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 2013;41(17):e166.
Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007;35:W345–9.
Finn RD, Coggill P, Eberhardt RY, Eddy SR. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44(D1):D279–85.
Wang L, Park HJ, Dasari S, Wang S, Kocher JP, Li W. CPAT: coding-potential assessment tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013;41(6):e74.
Zhang HM, Liu T, Liu CJ, Song S, Zhang X, Liu W, Jia H, Xu Y, Guo AY. AnimalTFDB 2.0: a resource for expression, prediction and functional study of animal transcription factors. Nucleic Acids Res. 2015;43:D76.
Jin SB, Zhang Y, Dong XL, Xi QK, Song D, Fu HT, Sun DJ. Comparative transcriptome analysis of testes and ovaries for the discovery of novel genes from Amur sturgeon (Acipenser schrenckii). Genet Mol Res. 2015;14(4):18913–27.
Wang W, Zhu H, Dong Y, Dong T, Tian Z, Hu H. Identification and dimorphic expression of sex-related genes during gonadal differentiation in sterlet Acipenser ruthenus, a primitive fish species. Aquaculture. 2019;500:178–87.
Fajkowska M, Głowacka DK, Adamek-Urbańska D, Ostaszewska T, Krajnik KA, Rzepkowska M. Sex-related gene expression profiles in various tissues of juvenile Russian sturgeon (Acipenser gueldenstaedtii). Aquaculture. 2019;500:532–9.
Li D, Liu Z, Xie C. Effect of stocking density on growth and serum concentrations of thyroid hormones and cortisol in Amur sturgeon, Acipenser schrenckii. Fish Physiol Biochem. 2012;38(2):511–20.
Zhang X, Ma J, Li L, Jiang H, Chen J. Effects of different Cytoprotectants combination on sperm survival, fertility and embryo development in Amur sturgeon (Acipenser schrenckii). Anim Vet Sci. 2018;6(4):51.
Wei QW, Zou Y, Li P, Li L. Sturgeon aquaculture in China: progress, strategies and prospects assessed on the basis of nation-wide surveys (2007-2009). J Appl Ichthyol. 2011;27(2):162–8.
Wu L, Yang P, Luo F, Wang D, Zhou L. R-spondin1 signaling pathway is required for both the ovarian and testicular development in a teleosts, Nile tilapia (Oreochromis niloticus). Gen Comp Endocrinol. 2016;230-231:177–85.
Aschim EL, Saether T, Wiger R, Grotmol T, Haugen TB. Differential distribution of splice variants of estrogen receptor β in human testicular cells suggests specific functions in spermatogenesis. J Steroid Biochem Mol Biol. 2004;92(1–2):97–106.
Wang W, Zhu H, Dong Y, Tian Z, Dong T, Hu H, Niu C. Dimorphic expression of sex-related genes in different gonadal development stages of sterlet, Acipenser ruthenus, a primitive fish species. Fish Physiol Biochem. 2017;43(6):1557–69.
Tang P, Argentaro A, Pask AJ, O'Donnell L, Marshall-Graves J, Familari M, Harley VR. Localization of the chromatin remodelling protein, ATRX in the adult testis. J Reprod Dev. 2011;57:317–21.
Agarwal D, Gireesh-Babu P, Pavan-Kumar A, Koringa P, G. Joshi C, Gora A, Ahmad Bhat I, Chaudhari A. Molecular characterization and expression profiling of 17-beta-hydroxysteroid dehydrogenase 2 and spermatogenesis associated protein 2 genes in endangered catfish, Clarias magur (Hamilton, 1822). Anim Biotechnol. 2018;1–14. https://doi.org/10.1080/10495398.2018.1545663.
Zhao J, Zhao J, Xu G, Wang Z, Gao J, Cui S, Liu J. Deletion of Spata2 by CRISPR/Cas9n causes increased inhibin alpha expression and attenuated fertility in male mice. Biol Reprod. 2017;97:497–513.
Jiang J, Zhang N, Shiba H, Li L, Wang Z. Spermatogenesis associated 4 promotes Sertoli cell proliferation modulated negatively by regulatory factor X1. PLoS One. 2013;8(10):e75933.
Oh C, Aho H, Shamsadin R, Nayernia K, Muller C, Sancken U, Szpirer C, Engel W, Adham IM. Characterization, expression pattern and chromosomal localization of the spermatogenesis associated 6 gene (Spata6). Mol Hum Reprod. 2003;9(6):321–30.
Hays E, Majchrzak N, Daniel V, Ferguson Z, Brown S, Hathorne K, La Salle S. Spermatogenesis associated 22 is required for DNA repair and synapsis of homologous chromosomes in mouse germ cells. Andrology. 2017;5(2):299–312.
Uhlenhaut NH, Jakob S, Anlag K, Eisenberger T, Sekido R, Kress J, Treier AC, Klugmann C, Klasen C, Holter NI, Riethmacher D, Schutz G, Cooney AJ, Lovell-Badge R, Treier M. Somatic sex reprogramming of adult ovaries to testes by FOXL2 ablation. Cell. 2009;139(6):1130–42.
Fan Z, Zou Y, Liang D, Tan X, Jiao S, Wu Z, Li J, Zhang P, You F. Roles of forkhead box protein L2 (foxl2) during gonad differentiation and maintenance in a fish, the olive flounder (Paralichthys olivaceus). Reprod Fertil Dev. 2019;31(11):1742–52.
Zhang X, Li M, Ma H, Liu X, Shi H, Li M, Wang D. Mutation of foxl2 or cyp19a1a results in female to male sex reversal in XX Nile Tilapia. Endocrinology. 2017;158(8):2634–47.
Drummond AE. TGFβ signalling in the development of ovarian function. Cell Tissue Res. 2005;322(1):107–15.
Spiller C, Burnet G, Bowles J. Regulation of fetal male germ cell development by members of the TGFβ superfamily. Stem Cell Res. 2017;24:174–80.
Hattori RS, Murai Y, Oura M, Masuda S, Majhi SK, Sakamoto T, Fernandino JI, Somoza GM, Yokota M, Strussmann CA. A Y-linked anti-Müllerian hormone duplication takes over a critical role in sex determination. Proc Natl Acad Sci U S A. 2012;109(8):2955–9.
Myosho T, Otake H, Masuyama H, Matsuda M, Kuroki Y, Fujiyama A, Naruse K, Hamaguchi S, Sakaizumi M. Tracing the emergence of a novel sex-determining gene in medaka, Oryzias luzonensis. Genetics. 2012;191(1):163–70.
Matson CK, Murphy MW, Sarver AL, Griswold MD, Bardwell VJ, Zarkower D. DMRT1 prevents female reprogramming in the postnatal mammalian testis. Nature. 2011;476(7358):101–4.
Zhang T, Oatley J, Bardwell VJ, Zarkower D. DMRT1 is required for mouse spermatogonial stem cell maintenance and replenishment. PLoS Genet. 2016;12(9):e1006293.
Webster KA, Schach U, Ordaz A, Steinfeld JS, Draper BW, Siegfried KR. Dmrt1 is necessary for male sexual development in zebrafish. Dev Biol. 2017;422(1):33–46.
Vizziano-Cantonnet D, Di Landro S, Lasalle A, Martinez A, Mazzoni TS, Quagio-Grassiotto I. Identification of the molecular sex-differentiation period in the Siberian sturgeon. Mol Reprod Dev. 2016;83(1):19–36.
Jeng SR, Wu GC, Yueh WS, Kuo SF, Dufour S, Chang CF. Dmrt1 (doublesex and mab-3-related transcription factor 1) expression during gonadal development and spermatogenesis in the Japanese eel. Gen Comp Endocrinol. 2019;279:154–63.
Hilbold E, Bergmann M, Fietz D, Kliesch S, Weidner W, Langeheine M, Rode K, Brehm R. Immunolocalization of DMRTB1 in human testis with normal and impaired spermatogenesis. Andrology. 2019;7(4):428–40.
Pfennig F, Standke A, Gutzeit HO. The role of Amh signaling in teleost fish-multiple functions not restricted to the gonads. Gen Comp Endocrinol. 2015;223:87–107.
Chen W, Liu L, Ge W. Expression analysis of growth differentiation factor 9 (Gdf9/gdf9), anti-müllerian hormone (Amh/amh) and aromatase (Cyp19a1a/cyp19a1a) during gonadal differentiation of the zebrafish, Danio rerio. Biol Reprod. 2017;96(2):401–13.
Croft B, Ohnesorg T, Hewitt J, Bowles J. Human sex reversal is caused by duplication or deletion of core enhancers upstream of SOX9. Nat Commun. 2018;9(1):5319.
Burcea A, Popa GO, Florescu Gune IE, Maereanu M, Dudu A, Georgescu SE. Expression characterization of six genes possibly involved in gonad development for stellate sturgeon individuals (Acipenser stellatus, Pallas 1771). Int J Genomics. 2018;2018:7835637.
Hinnebusch AG, Ivanov IP, Sonenberg N. Translational control by 5′-untranslated regions of eukaryotic mRNAs. Science. 2016;352(6292):1413–6.
Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136(2):215–33.
Ryazansky SS, Mikhaleva EA, Olenkina OV. Essential functions of microRNAs in animal reproductive organs. Mol Biol. 2014;48(3):319–31.
Real FM, Sekido R, Lupianez DG, Lovell-Badge R, Jimenez R, Burgos M. A microRNA (mmu-miR-124) prevents Sox9 expression in developing mouse ovarian cells. Biol Reprod. 2013;89(4):78.
Zhang Y, Tang C, Yu T, Zhang R, Zheng H, Yan W. MicroRNAs control mRNA fate by compartmentalization based on 3′ UTR length in male germ cells. Genome Biol. 2017;18(1):105.
Zhang X, Li L, Jiang H, Ma JE, Li J, Chen J. Identification and differential expression of microRNAs in testis and ovary of Amur sturgeon (Acipenser schrenckii). Gene. 2018;658:36–46.
Raz E. The function and regulation of vasa-like genes in germ-cell development. Genome Biol. 2000;1(3):1017.
Tanaka SS, Toyooka Y, Akasu R, Katoh-Fukui Y, Nakahara Y, Suzuki R, Yokoyama M, Noce T. The mouse homolog of Drosophila vasa is required for the development of male germ cells. Genes Dev. 2000;14(7):841–53.
Hartung O, Forbes MM, Marlow FL. Zebrafish vasa is required for germ-cell differentiation and maintenance. Mol Reprod Dev. 2014;81(10):946–61.
Wu XL, Tang ZK, Li W, Chu ZJ, Hong XY, Zhu XP, Xu HY. Identifying the germ cells during embryogenesis and gametogenesis by germ-line gene vasa in an anadromous fish, American shad Alosa sapidissima. J Fish Biol. 2018;92(5):1422–34.
Ye H, Yue HM, Yang XG, Li CJ, Wei QW. Identification and sexually dimorphic expression of vasa isoforms in Dabry's sturgeon (Acipenser dabryanus), and functional analysis of vasa 3′-untranslated region. Cell Tissue Res. 2016;366(1):203–18.
Wu X, Wang Z, Jiang J, Gao J, Wang J, Zhou X, Zhang Q. Cloning, expression promoter analysis of vasa gene in Japanese flounder (Paralichthys olivaceus). Comp Biochem Physiol B Biochem Mol Biol. 2014;167:41–50.
Luo H, Zhou Y, Li Y, Li Q. Splice variants and promoter methylation status of the bovine vasa homology (Bvh) gene may be involved in bull spermatogenesis. BMC Genet. 2013;14:58.
Yang GC, Wang RR, Liu ZQ, Ma KY, Feng JB, Qiu GF. Alternative splice variants and differential relative abundance patterns of vasa mRNAs during gonadal development in the Chinese mitten crab Eriocheir sinensis. Anim Reprod Sci. 2019;208:106131.
This work was supported by the Funds of the National Natural Science Fund of China , the Natural Science Fund of Guangdong Province [2018A030310488], and the GDAS’ Project of Science and Technology Development [2020GDASYL-20200104026, 2019GDASYL-0104017 and 2018GDASCX-0107].
Ethics approval and consent to participate
All the experimental animal procedures followed the principles of the Guide for Care and Use of Laboratory Animals and were approved by the Animal Experimental Ethical Committee of Guangdong Institute of Applied Biological Resources.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The GO classification of the unigenes in A. schrenckii.
Volcano plot showing all the diferentially expressed unigenes (DEUs) in the gonad full-length transcriptome of A. schrenckii. The 18,863 DEUs occur in ovary-biased patterns and the 5238 DEUs occur in testis-biased patterns.
Volcano plot showing the 961 putative LncRNAs differentially expressed between the ovaries and testes of A. schrenckii, including 235 ovary-biased LncRNAs and 726 testis-biased LncRNAs.
Summary of the ROIs from two SMRT cells in the gonads of A.schrenckii.
The length distribution of full-length unigenes in the gonads of A. schrenckii acquired using the PacBio Sequel platform.
Statistics of the clean short reads respectively from the testes and ovaries of seven A. schrenckii individuals using Illumina short-read RNA sequencing.
Detailed GO annotation classification information for the unigenes of A. schrenckii.
Detailed KEGG pathway annotation classification information for the unigenes of A. schrenckii.
Information concerning the full-length unigenes annotated into early gametogenesis related genes using the NR database.
The alternative splicing (AS) events of nonredundant early gametogenesis related unigenes in the full-length gonad transcriptome of A.schrenckii.
Sox family members identified by TFDB, NR and SMART association analyses.
Species and accession numbers of Sox9 proteins used in the phylogenetic analysis.
The cDNA sequences of full-length transcripts from 4 A. schrenckii Sox9s (Asc_Sox9–1-4) are underlined, including the 5′-untranslated region (UTR), the 3′-UTR containing a poly (A) tail and the open reading frame underlined.
About this article
Cite this article
Zhang, X., Zhou, J., Li, L. et al. Full-length transcriptome sequencing and comparative transcriptomic analysis to uncover genes involved in early gametogenesis in the gonads of Amur sturgeon (Acipenser schrenckii). Front Zool 17, 11 (2020). https://doi.org/10.1186/s12983-020-00355-z