Study species
Quasipaa spinosa is a large-sized anuran. Adult males are characterized by keratinized spines on their chests that appear during breeding seasons [22]. The distribution of Q. spinosa covers 12 provincial-level regions across central, southern and southwestern China (including Yunnan, Guizhou, Guangxi, Guangdong, Hong Kong, Fujian, Jiangxi, Hunan, Hubei, Anhui, Jiangsu and Zhejiang) and northern Vietnam [20, 22]. It inhabits rocky streams in evergreen forests and open countrysides/fields on hills and mountains. Its elevation range is 200–1500 m a.s.l. [20, 22, 23]. This species is considered an important food and medicinal resource according to traditional views [22]. It has been collected for consumption throughout its range for many decades, a practice that persists. To make matters worse, habitats of Q. spinosa have been degraded and/or destroyed by agricultural pollution and the construction of dams for hydropower projects [20, 22, 24]. This species is currently classified as “vulnerable (A2abc)” on the IUCN Red List [20]. Despite the initiation of captive breeding programmes in the 1980s, low fertilization and hatching rates, diseases, and high overwinter mortality mean that the cultivation of Q. spinosa strongly depends on the supply of tadpoles or adults from wild populations [24]. Thus, other conservation measures to arrest the population decline of this species are urgently required.
Data collection and processing
The study area constituted central and southern China and adjacent countries in Southeast Asia, including Vietnam, Laos, Thailand, Cambodia, the Philippines, and Myanmar (10–35°N, 100–125°E; Fig. 1). We collected a total of 1714 occurrence records of Q. spinosa from field expeditions conducted by Chengdu Institute of Biology (CIB) personnel (2010–2015), supplemented with location data from the literature [24,25,26], the Global Biodiversity Information Facility (GBIF; http://www.gbif.org/), and georeferenced specimen records in the Herpetological Museum of the Chengdu Institute of Biology (CIB). To avoid georeferencing errors and over-fitting in the following modelling, we checked all location data in ArcGIS 9.2 (ESRI, Redland, USA) and removed duplicate occurrences at a spatial resolution of 1 × 1-km so that each grid cell had only a single record (in each grid cell, we randomly selected one occurrence and deleted all the other occurrences) [27]. We obtained a total of 136 occurrences composing the final dataset (Fig. 1).
Based on the habitats of Q. spinosa [20, 22] and our field experiences, we considered five types (i.e., climate, habitat, biogeography, topography, and human impact) of environmental variables, including 22 parameters (Table S1 in Additional File 1 [28,29,30,31,32,33];) that may impact the habitat use and distribution of Q. spinosa in our modelling processes. For several variables of each type, we can use dimension-reduction techniques (e.g., correlation analysis, clustering algorithms) by pruning the variables with possible redundancy [7, 34]. This procedure was to minimize the risk of overfitting species-environment relationships. Specifically, we retained those variables that gave a higher value in the jackknife analysis [35], which consequently yielded the 14 climatic variables and eight other types of variables (Table S1 in Additional File 1). All variables were transformed into 1 × 1-km equal-area grid rasters in ArcGIS 9.2 and projected onto the UTM WGS 1984 projection.
To capture the uncertainties in future climate projections [9] and represent different temperature sensitivities and greenhouse gas emissions, we considered six global climate models (GCMs: NCC-NorEsm1-M, MRI-CGCM3, BCC-CSM1.1, CSIRO_MK_3_6_0, MIROC_ESM_CHEM, and IPSL_CM5A_LR) for four representative greenhouse gas concentration pathways (RCPs: RCP2.6, RCP4.5, RCP6.0, and RCP8.5) from the IPCC 5th Assessment report [1]. We obtained the 14 climatic variables for 2050 and 2070 from the WorldClim 1.4 dataset [36], that is, we collected 24 projections (6 GCMs × 4 RCPs) for each variable at each time slice to illustrate the temperature and precipitation conditions at that time. For the past climate, variable layers were obtained for the time frames of the Mid-Holocene (MidH, ~ 6 kyr BP) [36], the Last Glacial Maximum (LGM, ~ 22 kyr BP) [37], and the LIG [38]. Three GCMs (CCSM4, Mirco-ESM, and MPI-ESM-P) were used for both the MidH and LGM, but only one GCM (CCSM) was available for the LIG.
For each climate change scenario, climatic variables were used to make projections into the past and the future. The non-climatic variables (i.e., AI, aridity index; PET, annual potential evapo-transpiration; Landcover, land cover type; NPP, net primary productivity; Biome, terrestrial ecoregion; HFP, human footprint index) were assumed to be constant over time, as past and future estimates of these factors were, unfortunately, not available. In addition, these variables are likely to be determined not only by climatic drivers but also a wide range of socioeconomic drivers, and any simple estimates or extrapolations could be misleading [cf. 39]. Thus, their constancy is an assumption that presents a conservative prognosis and limits additional uncertainties [cf. 39]. To determine the influence of the non-climatic data, we predicted and analysed the environmental suitability for Q. spinosa by comparing two competing models [39,40,41]: the full model (using all 22 variables) and a climate-only model (using only the 14 bioclimatic variables).
Ensemble distribution modelling and climate change impacts
We predicted the potential distributions of Q. spinosa under current climatic conditions using a maximum entropy approach in MaxEnt v3.3.3 k [42, 43]. MaxEnt has been shown to have good performance and consistently outperforms many other methods, especially with small sample sizes, noisy input data, and correlated parameters [43, 44]. As an inappropriate model complexity (e.g., inappropriate combination of feature classes, level of regularization, or variable selection) or data organization (e.g., occurrence and background localities partition) can typically reduce the accuracy of inferred habitat quality, we selected model settings and conducted model choices using the ENMeval 0.1.0 package in R 4.0.3 software [45,46,47,48]. Following the method outlined in Muscarella et al. [47], five feature class values (FC: linear, quadratic, product, threshold, hinge), eight regularization multiplier values (RM: 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4), five locality partition methods (jackknife, randomkfold, block, checkerboard1, checkerboard2) were calculated; the mean values of the area under the receiver operating characteristics curve using the testing data across all data bins (mean AUCtest) and the mean values of the difference between the training and testing AUCs across all data bins (mean AUCdiff) were computed as the evaluation metrics for the goodness-of-fit and the degree of overfitting of the model. For model selection, we used the Akaike Information Criteria corrected for a small sample size (AICc) to compare different variable combinations [46, 49]. The existence of sampling bias in background points could decrease modelling effectiveness; thus, we used a kernel density estimator (KDE) surface to create 10,000 random background points in the Software for Automated Habitat Modelling (SAHM) [50]. As the models incorporating all environmental factors (i.e., 22 variables for the full model, 14 variables for the climate-only model) and with the conditions FC = linear, RM = 1, and the randomkfold (k = 10) partition method showed the best performances (see Results), 80% of the localities were randomly selected to generate a training set, and the remaining 20% of the localities were withheld to use as an evaluation set. Other model settings were adopted, including the maximum number of iterations (500) and the convergence threshold (10− 5) [43]. We selected the logistic output format with suitability values ranging from 0 to 1 and conducted jackknife procedures to evaluate the relative contribution of each variable [43].
We then projected the contemporary species-environment relationships into different time slices (the past: LIG, LGM and MidH; the future: 2050 and 2070). For each time slice and each environmental variable group, we ran ten cross-validation replications of the model and weighted them by their AUCs to obtain an ensemble distribution prediction [51, 52]. Next, we averaged the outputs for each past (3GCMs each for LGM and MidH) and future time slice (6 GCMs × 4 RCPs) to achieve a suitability map for each timeframe.
To convert continuous outputs into presence-absence maps, we extracted the suitability values of the occurrences of Q. spinosa from the raw modelling results and calculated the mean values as the thresholds (full model: 0.61, climate-only model: 0.58) [53]. That is, areas with predicted suitability values above and below the threshold were considered “present” (suitable) and “absent” (unsuitable), respectively. Considering the home ranges, movements and dispersal abilities of anurans [54], we excluded habitat patches < 10 km2 in size from the subsequent analyses. To quantify the impacts of climate change, we first calculated the total area of suitable habitats, the number of suitable habitat patches, and the area of the largest and smallest habitat patches for each time slice. Then, the latitudes, longitudes, and elevations of the “present” grid cells were extracted, and one-way ANOVAs were conducted to detect their variations among time slices. In addition, to show the effectiveness of the existing protected area network on protecting Q. spinosa habitats, we overlaid the distributions of protected areas ([55]; only national and international protected areas were included as they have relatively reliable management efforts [56, 57]) and suitable habitat patches at present and in the future (for the current time, 2050, and 2070; for the full model and the climate-only model). We calculated the percentages of the area and the percentages of the number of patches of suitable habitats covered by the protected areas for each time slice.
Landscape connectivity assessment
Based on the response curves, which show the relationships among environmental variable gradients and the predicted suitability of species, we ranked each variable into rasters ranging from 1 to 10, with higher values indicating higher habitat quality (Table S2 in Additional File 1). Low-quality grid cells have high movement costs for organisms, whereas good habitats are associated with lower resistance to movement [14, 58]. We then reversed the integer categories of each variable for habitat quality to reclassify the movement cost score of Q. spinosa by grid cell (Table S2 in Additional File 1). To create a movement cost surface for each time period, we weighted the reclassified rasters of environmental variables with training gains with only the variables obtained from the jackknife procedure (Table S2 in Additional File 1), and then averaged the movement cost surfaces from all the GCMs and RCPs (3 GCMs for LGM and MidH, 6 GCMs × 4 RCPs for 2050 and 2070).
Landscape connectivity was assessed using a pairwise cost distance method, the Cost Distance Tool in ArcGIS 9.2 [59]. This method evaluates the best potential dispersal linkage by summing the movement cost of each grid cell between two core habitat patches. The best potential dispersal linkage is the one that has the least resistance (the least cost patch-to-patch path, LCP). Based on the movement cost surfaces and suitable habitats, we created pairwise LCPs among habitat patches for each time slice with the Corridor Design Tool [59]. We accounted for the potential limits to dispersal distances by using four LCP length caps: 20 km, 50 km, 100 km, and no limit. To investigate the dynamics of landscape connectivity as consequences of climate change over time, we compared the numbers, mean lengths, and mean movement costs of the LCPs among the time slices.