Effects of landscapes and range expansion on population structure and local adaptation.

Understanding the origin and distribution of genetic diversity across landscapes is critical for predicting the future of organisms in changing climates. This study investigated how adaptive and demographic forces have shaped diversity and population structure in Pinus densata, a keystone species on Qinghai-Tibetan Plateau (QTP). We examined the distribution of genomic diversity across the range of P. densata using exome capture sequencing. We applied spatially explicit tests to dissect the impacts of allele surfing, geographic isolation and environmental gradients on population differentiation and forecasted how this genetic legacy may limit the persistence of P. densata in future climates. We found that allele surfing from range expansion could explain the distribution of 39% of the ~48,000 genotyped SNPs. Uncorrected, these allele frequency clines severely confounded inferences of selection. After controlling for demographic processes, isolation-by-environment explained 9.2-19.5% of the genetic structure, with ~4.0% of loci being affected by selection. Allele surfing and genotype-environment associations resulted in genomic mismatch under projected climate scenarios. We illustrate that significant local adaptation, when coupled with reduced diversity due to demographic history, constrains potential evolutionary response to climate change. The strong signal of genomic vulnerability in P. densata may be representative for other QTP endemics.


Introduction
The ability of a species to sustain environmental change is primarily determined by its genetic reservoir, which is shaped over the course of history through demography and selection. Dissecting the effects of demography, geography and selection on population diversity helps us to understand how genetic variation is distributed across a landscape, as well as the evolutionary potential of species under climate change (Sork et al., 1999;Lee & Mitchell-Olds, 2011;Manel & Holderegger, 2013;Orsini et al., 2013).
Sources of genetic differentiation can be broadly classified into adaptive and dispersal-demographic factors. Among the later, isolation by distance (IBD; Wright, 1943), a neutral process where gene flow is increasingly limited between more distant or isolated populations, is a well-studied source of clinal variation. More recently, the importance of density-dependent effects during range expansion in producing strong clines or even discrete genetic sectors has been illustrated by microbial experiments and simulation studies (Excoffier & Ray, 2008;Excoffier et al., 2009;Waters et al., 2013;Peischl et al., 2016). Repeated founder effects combined with density-dependent competitive exclusion can allow alleles to 'surf' to very high or low frequencies, which may leave a molecular signature similar to selection (Edmonds et al., 2004;Klopfstein et al., 2006;Excoffier & Ray, 2008). Clines produced by allele surfing can overlap with those produced by IBD, but surfing can also result in strong differentiation between geographically proximate populations. Until now, the impact of range expansion on allele frequency clines (AFCs) has been obtained mostly from theoretical simulations (Klopfstein et al., 2006;Lotterhos & Whitlock, 2015;Hoban et al., 2016), with few empirical studies in natural populations, especially in plants (but see Gonz alez-Mart ınez et al., 2017;Ruiz Daniels et al., 2018). Detecting adaptive sources of genetic differentiation is often confounded by dispersal-demographic factors including IBD and allele surfing, but natural populations are widely expected to experience isolation by environment (IBE; Orsini et al., 2013;Wang & Bradburd, 2014), in which gene flow among populations inhabiting different ecological habitats is limited by selection (Nosil et al., 2009;Feder et al., 2012a). Although methods exist to control for the ubiquitous autocorrelation of IBE and IBD, distinguishing selection from dispersal-demographic effects during range expansion remains a challenge.
Patterns of IBE can be used to identify relationships between genomic and environmental variation, which can be projected onto future climate models to estimate the vulnerability of extant populations to extinction (Manel & Holderegger, 2013;Fitzpatrick & Keller, 2015;Bay et al., 2018). Understanding the genomic mismatch between modern and future environments is necessary for assessing the ability of populations to persist. Populations with high genomic mismatch are likely to suffer population decline if de novo mutations and migration cannot compensate for the required diversity Ruegg et al., 2018). Moreover, the potential for populations to adapt interacts with demographic history, and range expansions may constrain viability by reducing both the pool of genetic diversity and the effectiveness of selection to purge deleterious alleles.
Pinus densata forms extensive forests on the southeastern Qinghai-Tibetan Plateau (QTP) at elevations ranging from 2700 to 4200 m above sea level . Previous genetic analyses suggest that P. densata originated from hybridization between Pinus tabuliformis and Pinus yunnanensis in the late Miocene (Wang & Szmidt, 1994;Wang et al., 2011;Gao et al., 2012). An ancient hybrid zone has been identified in the northeastern edge of the current distribution of P. densata, from where the hybrid lineage successfully colonized new habitats by stepwise, westward migration Gao et al., 2012). This demographic history could produce significant AFCs along the expansion route as a result of genetic surfing. Reciprocal transplant experiments revealed pronounced differences in survival among P. densata populations, suggesting extensive local adaptation (Zhao et al., 2014). These lines of evidence suggest that demographic events and local adaptation have played important roles in the evolution of P. densata. However, until now it has been difficult to fully address the relative contribution of these forces because of the lack of genomic resources and methods to adequately distinguish between AFCs generated by dispersal-demographic effects and IBE.
In this study, we used spatially explicit tests to identify the signature of allele surfing, IBD and IBE and forecasted how this genetic legacy might constrain the persistence and evolution of P. densata in future climates. We hypothesize that: the stepwise expansion history of P. densata generated substantial AFCs along the east-west colonization axis; adaptation to heterogeneous plateau habitats marked the genome with the signature of IBE; and strong AFCs coupled with decreased diversity result in high genomic mismatch to future climates. To test these hypotheses, we sampled 23 populations across the range of P. densata and examined the allele frequency distributions over 40 000 exome regions. We estimated the magnitude of AFCs as a result of range expansion, investigated gene-environment associations and partitioned the contribution of environment and geography to among-population differentiation, with and without clinal loci. Using current genotype-environment association as a baseline, we projected the risk of population decline under future climate change scenarios. Our study provides a concrete dissection of complex evolutionary forces that shape population structure in a keystone alpine species and highlights the vulnerability of endemics to climate change.

Sampling and exome capture sequencing
We sampled 23 populations across the distribution of P. densata (Fig. 1a). The name, location and sample size of each population are listed in Table 1. Because of its hybrid history, we included two and four representative populations of P. tabuliformis and P. yunnanensis, respectively, to better polarize the genetic components in P. densata. Populations from the eastern margin of P. densata (group E populations in this study) have a mix of mitochondrial DNA (mtDNA) haplotypes found in the two parental species, reflecting heterogeneous maternal lineages, but predominately P. tabuliformis chloroplast DNA (cpDNA) haplotypes as a result of pollen-mediated introgression (cpDNA is paternally inherited in pines; Wang et al., 2011). The frequency of haplotypes unique to P. densata for both organelles increases westward, which supports the eastern margin as the ancient hybrid zone where P. densata originated Gao et al., 2012). Despite their organelle haplotype overlap, P. densata populations in the east are distinct from the two parental species in cone and seed morphometric traits (Mao et al., 2008).
Needles or cones were collected from four to 12 randomly selected trees in each stand. Genomic DNA was extracted from needles or seedlings using a Plant Genomic DNA kit (Tiangen, Beijing, China). Forty thousand exome probes (each 120 nt) were designed from Pinus taeda UniGenes (Neves et al., 2013). The majority were aligned to c. 29 000 genes, while 9800 probes were aligned to intergenic regions. Library preparation, probe hybridization, and sequencing were conducted by RAPiD Genomics (Gainesville, FL, USA; Neves, et al., 2013). In total, 208 trees were genotyped.

Reduced reference genome preparation, mapping and variant calling
Genomes in Pinus are > 20 Gbp and computationally challenging for population genomic analyses. We thus prepared a reduced reference genome from the P. taeda v.1.01 assembly (Neale et al., 2014) following the approach of Yeaman et al. (2016). Briefly, any scaffolds to which the capture probes aligned were retained as the reduced reference genome. Exome sequence reads for 58 individuals (two per population) were first aligned to the whole genome using the Burrows-Wheeler Aligner mem (BWA-MEM) algorithm with default parameters (Li, 2013). Variants were called with the SAMTOOLS and BCFTOOLS pipeline using default parameters (Li, 2011). Scaffolds that had at least one single nucleotide polymorphism (SNP) in ≥ 50% of the individuals were also included in the reduced reference. In total, 55 300 scaffolds were included in the reduced reference genome.
Several filtering steps were performed to minimize SNP calling errors: SNPs within 5 bp from any indel were removed; GATK hard filters were set to QD < 2.0, FS > 60.0, SOR > 4.0, MQ < 40.0, MQRankSum < À12.5 and ReadPosRankSum < À8.0; genotypes with genotype quality (GQ) < 20 or read depth (DP) < 5 were masked; and SNPs that met any of the following criteria were removed: missing rate > 20%, minor allele frequency (MAF) < 5%, heterozygosity > 70% or allele number > 2. The remaining SNPs were used for population genomic analyses, except for site frequency spectrum (SFS) and nucleotide diversity-based analyses for which no MAF filtering was applied.

Population structure and diversity
Genomic structure was assessed using FASTSTRUCTURE (Raj et al., 2014), and model complexity (i.e. the number of K genetic groups required to explain structure in the dataset) was examined

Research
New Phytologist for K = 1-16. The most likely K was selected using the 'chooseK.py' script. Geographic maps of ancestry coefficients were drawn in R following Caye et al. (2016). Population structure was also evaluated using principal component analysis (PCA) implemented in EIGENSOFT v.6.1.4 (Price et al., 2006). Genetic diversity in each population was estimated as average heterozygosity per locus (Het) and pairwise nucleotide diversity at all sites (p) and at 0-fold (p 0 ) and four-fold (p 4 ) degenerate coding sites using VCFTOOLS (https://vcftools.github.io/index.html; Danecek et al., 2011). The Het was calculated using MAF-filtered informative SNPs, while p was calculated using both informative and invariant SNPs. The zygotic linkage disequilibrium (LD; squared correlation coefficient, r 2 ) values for all SNP pairs within each scaffold were calculated using VCFTOOLS (https://vcftools.github. io/index.html) and plotted against the physical distances between SNP pairs in each scaffold. Population differentiation (F ST ) was calculated using VCFTOOLS.

Estimating the fitness effects of amino acid-changing mutations
To understand the strength of purifying selection operating in P. densata populations, we estimated the distribution of fitness effects (DFE) of nonsynonymous mutations by using the maximum-likelihood procedure implemented in DFE-a (Keightley & Eyre-Walker, 2007;Eyre-Walker & Keightley, 2009). This method assumes that the fitness effects of new mutations at neutral sites are zero and deleterious at selected sites. We generated the folded SFS in each population for a class of putatively neutral reference sites (four-fold degenerate sites) and a class of selected Table 1 Geographic locations, sample size (N), average heterozygosity per locus (Het), mean nucleotide diversity across all loci (p), 0-fold degenerate (p 0 ) and four-fold degenerate loci (p 4 ) of the 29 populations of Pinus densata, Pinus tabuliformis, and Pinus yunnanensis included in this study. sites (0-fold degenerate sites). We modeled the effects of recent demographic change on neutral SFS by assuming one step population size change and inferred the fitness of new deleterious mutations at the selected sites from a gamma distribution while simultaneously fitting the estimated parameters for the demographic model. The strength of purifying selection is defined as the product of the effective population size N e and the selection coefficient s (ÀN e s). We performed 999 bootstrap resampling of SNPs in each site class to generate the 95% confidence intervals of ÀN e s.

Detection of AFCs
To detect allele frequency changes along an east to west axis across the P. densata range, we used a sliding-window approach following Currat & Excoffier (2005) and Pereira et al. (2018). Briefly, the distribution range was divided longitudinally into 1.5°(c. 150 km) nonoverlapping windows. The mean frequency of each SNP in every window was calculated. We then used linear regression to test the strength of the relationship between allele frequency changes from east to west vs geographic distance between windows. To evaluate whether the observed clines were a result of chance, we used a permutation test to derive the null distribution of cline slopes by performing linear regression on each allele frequency vs 10 000 randomized population coordinates. The observed slope of each allele was then compared with the null distribution to test the probability that the observed cline was significant at the a = 0.05 level. This procedure was performed in R for all SNPs. Alleles were considered AFC alleles if the regression was significant and the slope was significantly different from the null distribution. Allele surfing, IBD and IBE can all result in AFCs, and we further dissect these forces in the outlier detection section.

Environmental data
Fourteen environmental variables previously identified as most relevant to the niche divergence between P. densata and its parental species  and six global UV radiation values were extracted for each of the sampling locations (Supporting Information Table S1). After evaluating the Spearman pairwise correlations among these variables, 11 variables with correlation coefficients (q) ≤ |0.75| across the range of P. densata were retained: annual mean air temperature (bio1), isothermality (bio3), air temperature seasonality (bio4), annual precipitation (bio12), precipitation of the driest month (bio14), precipitation seasonality (bio15), ground-frost frequency (FRS), growing degree days (GDD), soil organic carbon (SC), wet-day frequency (WET, i.e. days having ≥ 0.1 mm of precipitation) and annual mean UV-B (UVB1). Additionally, we also included altitude as we expect it may be important to an alpine species. To evaluate the divergence in environmental conditions across the species' range, we extracted 115 occurrence sites and their corresponding environmental variables from . All environmental layers were converted to the same resolution at a grid cell size of 30 arc-seconds (c. 1 km 2 ).

Detection of outlier loci
We employed two conceptually different methods, PCADAPT (Luu et al., 2017) and BAYENV2.0 (G€ unther & Coop, 2013), to detect genome-wide signatures of local adaptation because they show promise for accurately identifying outliers under a variety of complex demographic scenarios (Lotterhos & Whitlock, 2014Luu et al., 2017). PCADAPT identifies population structure using PCA and calculates the correlation between genetic variants and significant PCs. The Mahalanobis distance is computed for each SNP and the scores that do not follow the distribution of the bulk of distance points are considered outliers (Luu et al., 2017). Cattell's graphical rule was used to choose the number of principal components, and outliers were selected at a false discovery rate (FDR) of 0.05 using the R package QVALUE (Storey et al., 2015). BAYENV2.0 uses a set of neutral loci to estimate the empirical pattern of covariance in allele frequencies between populations. This estimate is then employed as a null model to test the correlation between individual SNPs and environmental variables (G€ unther & Coop, 2013). We generated a set of putatively neutral and independent SNPs by performing LD pruning on the four-fold degenerate sites using --indep-pairwise (50 5 0.05) in PLINK 1.9 (https://www.cog-genomics.org/plink/1.9/). This procedure calculates pairwise LD (r 2 ) within overlapping window of 50 SNPs with a step size of 5 and removes one of a pair of SNPs if r 2 > 0.05. The remaining four-fold SNPs (1906) were then used to estimate the covariance matrix with 100 000 iterations. Covariance matrices were compared after three independent runs with different seed numbers to ensure convergence. Correlations with the 11 environmental variables described earlier and altitude were assessed by averaging five independent runs, each with 100 000 Markov chain Monte Carlo iterations. SNPs were considered outliers if they were in the top 1% of the Bayes factor (BF) values, with BF > 10 and in the top 10% of the absolute values of Spearman's q.

Inferring IBD and IBE
We conducted redundancy analyses (RDAs) to estimate the relative importance of environmental and geographic distance to population genomic differentiation. RDA involves multiple linear regressions and PCA to assess the relative effect of matrices of independent variables on a matrix of dependent variables. We included Hellinger-transformed MAF for each population created in the R package VEGAN (Oksanen et al., 2016) as the dependent matrix and two independent matrices: the scaled environmental variables (11 environmental variables and altitude, representing IBE) and the distance-based Moran's eigenvector map (6 dbMEMs, representing IBD). Using all SNPs, we performed forward selection with an a value of 0.05 on the geographic and environmental variables separately to avoid overfitting, following the recommendation of Borcard et al. (2011). This resulted in the retention of four dbMEMs (dbMEM1, dbMEM2, dbMEM3 and dbMEM4) and four environmental variables (UVB1, WET, bio1 and bio14) for the following analyses.

New Phytologist
We performed a series of full and partial RDA model tests to differentiate the independent effects of environment and geography by reciprocally constraining one of the two factors. The significance of these models was determined by the 'anova.cca' function of VEGAN, based on a permutation test of 9999 iterations. RDA was performed for different SNP sets ( Table 2).

Prediction of genomic mismatch
To assess genotype-environment correlations and predict the genomic mismatch to future conditions, we performed gradient forest (GF) analysis using the R package GRADIENTFOREST (Ellis et al., 2012). GF models apply a machine-learning algorithm to identify genotype-environment relationships at sampled locations and project the correlations onto unsampled geographic regions or times (Fitzpatrick & Keller, 2015). SNPs polymorphic in fewer than five populations were removed from this analysis to ensure robust regression. The 11 environmental variables characterizing each sample location were included in the GF models to predict the genomic composition of each grid point across the range of P. densata. Each GF model was tested using 500 regression trees per SNP while keeping all the other parameters at default values. The resulting multidimensional genomic patterns were summarized using PCA with the first three PCs assigned to red, green and blue, respectively (Fitzpatrick & Keller, 2015). Similar colors in the space represent similar genetic composition. This allowed us to visualize the differences in allele frequencies (referred to as 'genomic turnover') along environmental gradients across space. To validate that our model explained more variation than expected by chance, we performed 10 GF models with randomized environmental variables following Ruegg et al. (2018) and compared the number of SNPs with positive r 2 and the mean r 2 across these SNPs between models.
To identify the spatial regions where genotype-environment relationships are most likely to be disrupted by climate change, we evaluated the mismatch between current and predicted genomic compositions under future environmental projections using contemporary associations as baseline (Fitzpatrick & Keller, 2015). We evaluated the predicted genomic composition in by 2070 under two representative concentration pathway (RCP) scenarios, RCP 2.6 and RCP 8.5, representing low and high greenhouse gas emission trajectories, respectively. Five of the 11 environmental layers (WET, UVB1, FRS, GDD and SC) were not available for future projection. We left them unchanged and used the projections for the other six variables (bio1, bio3, bio4, bio12, bio14 and bio15) to predict future genomic compositions, assuming a generation time of 50 yr for pine. We calculated the Euclidean distances between the current and future genomic compositions to represent the scale of genetic change needed to match environmental change, with higher values indicating greater vulnerability of the population (Fitzpatrick & Keller, 2015). We visualized the genomic mismatch in geographic space to illustrate the distribution of vulnerable populations.

Data availability
All sequencing data are archived in the NCBI SRA database under BioProject accession number PRJNA492187.

Genomic diversity and population structure
Exome capture sequencing on the 208 trees generated 1.25 billion paired-end reads with an average of 6.00 million reads per sample (Table S2). Alignment of the sequence reads to the reduced reference genome (55 300 scaffolds) yielded an average of 15.10 Mbp of genomic sequence covered by at least five reads per individual (Table S2). Exome probe capture rates ranged from 77.11% to 89.24% (Table S2). We retained 48 443 highquality SNPs after stringent quality control, of which 53.49% Table 2 Redundancy analyses (RDAs) that partition sources of genetic differentiation among populations in Pinus densata into geography, environment and their combined effects.  were annotated to 2621 'high confidence' and 3755 'low confidence' genes. LD among SNPs decayed by half, from r 2 = 0.46 to 0.23 within 2 kb (Fig. S1). This pattern is similar to that reported for other pine species (Brown et al., 2004;Neale & Savolainen, 2004;Eckert et al., 2009). Using all 48 443 SNPs, FASTSTRUCTURE determined K = 5 to be an optimal number of clusters to explain the genetic structure in the 208 individuals. Of the five clusters, one was unique to P. yunnanensis, and the other four clusters split P. densata into distinct geographic zones: east (E), central (C), southwest (SW) and west (W) (Fig. 1a, b; Table 1). The eastern range of P. densata was predominated by the P. tabuliformis ancestry. These populations, however, have a mtDNA composition and morphometric traits that distinguish them from P. tabuliformis. The nuclear DNA pattern was concordant with the chloroplast DNA (paternally inherited in pines) result reported earlier , thus reflecting pollen-mediated introgression in this region. Increasing K to higher values resulted in no further population substructuring (data not shown). PCA yielded a similar grouping as FASTSTRUCTURE and the first three eigenvalues significantly (Tracy-Widom test, P < 0.001) explained 28.6% of the total genetic variance (19.9%, 5.2% and 3.5% for PC1, PC2 and PC3, respectively; Fig. 1c).
Excluding reference populations of P. tabuliformis and P. yunnanensis yielded 47 612 SNPs in the 23 populations of P. densata (180 trees). Genome-wide heterozygosity for each population ranged from 0.193 to 0.297 (Table 1) and showed a strong negative correlation with distance to the easternmost population, supporting this region as the ancient hybrid zone (r = À0.85, P << 0.001; Fig. 2a). Nucleotide diversity at all sites (p) and 0-fold (p 0 ) and four-fold (p 4 ) degenerate sites showed a similar pattern across this east-west axis ( Fig. 2b; Table 1). The variation in p 0 and p 4 followed the same trend, but the values of p 4 decreased much faster than p 0 (Fig. 2b), resulting in an increasing p 0 /p 4 ratio from east to west (r = 0.90, P << 0.001; Table 1; Fig. 2c).

Distribution of fitness effects of nonsynonymous mutations
We quantified the fitness effects of nonsynonymous mutations in P. densata using whole-exome sequences. The inferred DFE for all groups indicated that 29.7-41.0% of new mutations would be strongly deleterious (ÀN e s > 100; Fig. 2d), 34.1-40.1% weakly deleterious (ÀN e s < 1), and 18.9-36.5% moderately deleterious (1 < ÀN e s < 100). The DFE was significantly different among groups, with the SW and W groups having a larger proportion of highly deleterious mutations while the E and C groups had more moderately deleterious sites (Fig. 2d).

Detecting AFCs and putative adaptive loci
Allele surfing during range expansion can generate clinal variation that resembles IBE. To separate the confounding effect of clines caused by dispersal-demographic effects from local adaptation, we performed a spatial sliding-window analysis along the eastwest expansion axis to identify alleles showing significant clinal variation. The spatial sliding-window analysis identified 38.9% (18 539) of the 47 612 SNPs showing significant longitudinal clines that cannot be explained by chance. The decline of Het at clinal SNPs from east to west was sharp (r = À0.89, P << 0.001; Fig. 2a). Mean population differentiation (F ST ) at these clinal SNPs was 0.288, which was much higher than the F ST of 0.191 over all 47 612 SNPs (Fig. 1d).
Using all 47 612 input SNPs in P. densata, PCADAPT identified 3653 outliers at a FDR of 0.05. BAYENV identified 2704 SNPs with significant correlations with one or more environmental variables. Among these outliers, 55.7% of PCADAPT and 31.4% of BAYENV showed significant AFCs. With the aim of minimizing false positives as a result of allele surfing, all AFC SNPs were first removed from the outliers, leaving 1620 and 1855 outliers for PCADAPT and BAYENV, respectively. This rigorous filtering conversely elevates the risk of false negatives. To help mitigate this effect, we considered outliers that are common to PCADAPT and BAYENV yet that demonstrate AFCs (170 SNPs) to more likely reflect IBE than demographic-dispersal effects and included them back into both the PCADAPT and BAYENV outlier bins (Fig. S2). This resulted in a set of 1790 and 2025 SNPs for PCADAPT and BAYENV, respectively ( Table 2).
The average genetic differentiation (F ST ) among the P. densata groups at the 1790 PCADAPT outliers was 0.411, while the value for the 2025 BAYENV outliers was 0.215. A closer examination of the 2025 BAYENV outliers showed that 1154 were significantly associated with temperature, 778 with water availability, 132 with soil organic carbon (SC), 317 with UVB and 294 with altitude (Table S3). Annotation of the 2025 outliers identified 703 candidate genes, of which 372 can be functionally classified into gene ontology (GO) terms. Macromolecule transport and metabolic processes were significantly enriched among these terms (FDR < 0.05; Table S4).

Quantifying IBD and IBE
We evaluated the effect of IBD and IBE in shaping genomic variation among populations using redundancy analysis (RDA). All the RDA models performed using the retained environmental (four) and geographic (four) variables after forward selection were highly significant (P ≤ 0.001). We found that the exclusive contribution of IBD while controlling for environment (partial model F~geog. | env. in Table 2) explained 9.8-18.6% of the variation in allele frequencies among populations in different SNP sets. When controlling for IBD, the exclusive contribution of IBE (partial model F~env. | geog.; Table 2) explained 6.6% of the variation in the full set of SNPs and 4.8% for the AFC SNPs. Removing AFC SNPs from the PCADAP and BAYENV outliers enlarged the exclusive contribution of IBE to 9.5% and 19.5% for PCADAPT and BAYENV outliers from 8.0% and 16.2%, respectively. Including the 170 AFC SNPs identified as outliers by both PCADAP and BAYENV did not alter the contribution of IBE in the PCADAP outliers (1790 SNPs Table 2), of which a large proportion was s a result of their joint effect ('Total confounded' in Table 2). This effect was most pronounced at the AFC SNPs.
Spatial distribution of gene-environmental associations and future genomic mismatch Spatial mapping of genotype-environment relationships using GF models revealed genomic turnover between the east, central-southwest and west regions (Fig. 3a, b). The western range of the P. densata distribution showed a distinct class of genotype-environment associations, consistent with the more extreme conditions of the region (Fig. 3c). Wet-day frequency (WET), annual mean UV-B (UVB1), annual mean air temperature (bio1) and air temperature seasonality (bio4) were most strongly correlated with the observed genomic variation (Fig. S3). PCA on these variables showed clear niche divergence across the distribution range of P. densata (Fig. 3c). Permutation tests showed that the GF models explained

Geographical distance to group E (km) Average heterozygosity per locus AFC SNPs
All SNPs r = -0.89, P = 1.9e−08 r = -0.85, P = 2.9e−07 Fig. 2 Plot of nucleotide diversity vs geographic distance from the suggested ancient hybrid zone for average heterozygosity per locus (a), mean pairwise distance across all sites (p), 0-fold degenerate sites (p 0 ) and four-fold degenerate sites (p 4 ) (b), and the ratio of mean p 0 to mean p 4 (p 0 /p 4 ) (c); the Pearson correlation coefficient and the corresponding significance are shown. (d) Estimates of purifying selection at 0-fold degenerate sites in four Pinus densata (Pd) groups. Error bars represent 95% bootstrap confidence intervals. Different letters (a, b, c and d) above each bar indicate significant differences (a = 0.05) among groups based on Kruskal-Wallis multiple-range tests.
We simulated the genomic change needed to track predicted climate change by the year 2070 under RCP 2.6 and RCP 8.5. To compare the two scenarios, we calculated the proportion of the distribution range having a genomic mismatch > 50% of the maximum detected value (0.048 in this study; Fig. 4). Based on all SNPs, 16.5% and 17.2% of the distribution space were recognized as above this threshold of vulnerability under RCP 2.6 and RCP 8.5, respectively. The corresponding values for the 2025 BAYENV outliers were 8.2% and 38.5%, illustrating that adaptive variation is more susceptible to strong climate change. GF modelling predicted the western portion of the distribution was most vulnerable under both climate scenarios, indicating where climate-induced selective pressure will be the highest (Fig. 4). Note that we kept five of the 11 environmental variables static because of the lack of future prediction data, which may have resulted in an underestimation of genomic mismatch in P. densata.

Discussion
As the dominant forest-forming species in the southeastern QTP, the resilience of P. densata underlies regional ecosystem structure and function. While the colonization of large and variable areas suggests successful adaptation to high plateau environments Zhao et al., 2014), the spatial autocorrelation of environmental gradients with expansion routes makes identifying genotype-environment relationships challenging. Here, we controlled explicitly for genetic differentiation generated by founder effects, geographic isolation and local adaptation to predict population viability under ongoing climate change.

Range expansion produces AFCs and structures genetic diversity
Previous studies established that P. densata colonized the plateau by stepwise westward migration from the ancient hybrid zone located in the eastern margin of its current distribution Gao et al., 2012). In this study, we detected a continuous loss of heterozygosity and nucleotide diversity in exome sequences across the distribution from east to west, reinforcing the previous inferences of serial bottlenecks along the expansion axis. Wind-pollinated conifers are generally characterized by low population differentiation at nuclear markers, even across large geographical distances (Alberto et al., 2013). The more widely distributed and montane parental species P. tabuliformis and P. yunnanesis accord well with this expectation, with estimates of F ST ranging from 0.023 to 0.086 (Ma et al., 2006;Wang et al., 2011;Gao et al., 2012;Xia et al., 2018). By contrast, P. densata possesses substantial spatial genetic structure, similar to other plants inhabiting the QTP (Geng et al., 2009;Qiu et al., 2011;Wen et al., 2014). The four genetic groups identified within P. densata correspond to four distinct geographic regions and the genome-wide F ST among them was 0.191, presumably a result of dispersal limitations across the dramatic and complex topography of the QTP.
Range expansion can lead to a loss of heterozygosity, allele clines, and genetic differentiation along the axis of a range expansion (Currat & Excoffier, 2005;Klopfstein et al., 2006;Excoffier & Ray, 2008;Excoffier et al., 2009). However, empirical support remains limited owing to the difficulty in detecting range expansion and AFCs in natural systems. We applied a spatially explicit method (Currat & Excoffier, 2005;Pereira et al., 2018) to Grouping of the sites into E, C, SW and W follows the map in Fig. 1(a).

Research
New Phytologist identify AFCs. We found as many as 38.9% of the 47 612 SNPs showed clinal variation that could be the result of allele surfing and/or clinal adaptation. Consistent with theoretical expectations, the decline of heterozygosity from east to west at AFC SNPs was sharp, and population differentiation at these SNPs was 51.0% higher than the genome average. This suggests that during range expansion in P. densata, genetic drift from serial founder events drove a considerable number of alleles to very high or low frequencies and consecutively reduced heterozygosity, with the western populations harboring the lowest diversity.
Redundancy analyses determined IBD can explain 13.2% of the variation at all SNPs (Table 2) and 9.8% at the AFC SNPs. Because the range expansion in P. densata occurred east-to-west, we expect the AFC SNPs to overlap with signal of IBD, although allele surfing can also produce clines that show no relationship to geographic distances under some circumstances. Our analyses illustrate that while P. densata was historically able to exploit an unoccupied niche on the high plateau, colonization has left a strong genetic legacy of reduced diversity and population connectivity that may leave it vulnerable to future change.

Local adaptation or allele surfing?
Simulations show that dispersal-demographic effects during range expansion produce strong AFCs and even discrete sectors of genetic homogeneity. These processes can promote genetic differentiation along the axis of a range expansion, which can mimic the signature of local adaptation (Currat & Excoffier, 2005;Klopfstein et al., 2006;Excoffier & Ray, 2008;Excoffier et al., 2009). Even though the methods we used for outlier detection (PACDAPT and BAYENV) correct for neutral population structure (G€ unther & Coop, 2013;Luu et al., 2017), detection of selection in structured populations can be difficult (Wright & Gaut, 2005;Lotterhos & Whitlock, 2014. We found that as many as 55.7% of outlier loci may be the result of allele surfing, although the two detection methods used here differed substantially in their overlap with AFCs (Fig. S2). Notably, outliers identified with respect to population structure using PCADAPT were more enriched for AFCs than those identified based on allele-environment correlations in BAYENV (55.7% vs 31.4%), which lends support to the ability of our methodology to remove outliers generated by drift.
Simultaneously minimizing false positives and negatives is virtually impossible in population genomic studies in natural systems. Conservatively, we can regard outliers overlapping with AFCs as false positives, and thus their association with environments as resulting from dispersal-demographic effects rather than selective pressures. However, among the 11 studied environmental variables and altitude, five were correlated significantly with longitude (Table S3). Additionally, simulations indicate that beneficial alleles on the expansion front have a higher probability to establish and spread (Hallatschek & Nelson, 2010;Peischl et al., 2015Peischl et al., , 2016. Discarding all outliers that show AFCs will remove true adaptive alleles mapped to any of these variables and will thus underestimate local adaptation. To achieve a pragmatic trade-off between power and error rates, we created an additional  outlier set that included those identified by both PCADAPT and BAYENV but show AFC. Common outliers detected by different methods yield conservative error rates (de Villemereuil et al., 2014), and their clinal behavior should be more likely as a result of clinal selection. Overall, 3.8-4.3% of the sampled loci can be inferred as being affected by selection because they were either nonAFC outlier loci identified by one of the methods or because they were AFC loci detected by both methods as outliers.
Although false positives and negatives almost certainly exist in these outlier designations, our analyses provide a rare empirical evaluation of the magnitude of allele surfing during range expansion as well as its bearing on inferences of adaptive evolution. Further dissection using an RDA strategy suggested that a significant portion (9.2-19.5%) of the variation in the outliers can be explained exclusively by IBE. Reduced gene flow across the genome as a result of environmental conditions provides a convincing line of evidence for local adaptation (Nosil et al., 2009;Orsini et al., 2013). Although outlier SNPs comprise a small proportion of the genome, selection affecting multiple loci can cause genome-wide reductions in gene flow, even for loci unlinked to those directly under selection, resulting in a signal of IBE across the whole genome (Table 2). Furthermore, spatially divergent selection against nonadapted migrants can allow mutations to establish even if their effect size is relatively small (Nosil et al., 2005;Feder et al., 2012a,b;Wang & Bradburd, 2014). The combined evidence of strong selection against immigrants in reciprocal transplant trials (Zhao et al., 2014), the detection of genomewide differentiation attributable exclusively to environmental variables, and the identification of robust outlier associated with water availability, temperature, and UVB radiation (Table S3) suggest the strong signature of IBE in P. densata is the result of local adaptation to distinct habitats.

Efficacy of purifying selection and genomic vulnerability to changing climate
Organisms cope with changing environments through migration to track ecological niches spatially and/or through adaptation to new conditions in situ (Aitken et al., 2008). High amounts of standing genetic variation facilitate faster adaptive evolution than waiting for appropriate mutations to arise, especially because most de novo mutations with fitness consequences are expected to be mildly deleterious. In P. densata, genetic diversity clearly declined along the east-to-west expansion axis, and this decline was coupled with an increasing p 0 /p 4 ratio. The p 0 /p 4 ratio is generally interpreted as a measure of the efficacy of purifying selection, with higher values indicating lower selection efficacy (Chen et al., 2017). Selection operates more effectively in populations with large effective sizes (N e ). The observed increasing trend of p 0 /p 4 from east to west was thus probably a consequence of decreasing N e as a result of consecutive bottlenecks (Gao et al., 2012).
We examined the DFE of nonsynonymous mutations, which is a more direct reflection of the efficacy of selection because the method corrects for demography and population structure. We found purifying selection acting less effectively in the western than in the eastern and central populations, with the western margin showing a higher proportion of weakly and strongly deleterious mutations. Accumulation of highly deleterious mutations increases genetic load, which in turn leads to elevated risk of population decline via mutational meltdown (Lynch et al., 1995). An alternative view is that populations that occupy different environments may experience diverse selective constraints, resulting in different shapes of DFE (Tellier et al., 2011). The difference in the strength of purifying selection found in this study could thus be partially explained by the distinct environments occupied by the P. densata groups (Fig. 3c).
Using GF analyses, we found a clear signal of genomic mismatch, especially for adaptive variation, across the distribution range of P. densata. This mismatch is particularly high in the western populations where differentiation is the highest and genetic variation the lowest. This mismatch estimation is probably conservative as we assumed five environmental layers will be the same in the future as they are now, and some true outliers under selection but were confounded by AFC may have been removed. A strong genomic mismatch indicates that populations may be unable to persist in situ because contemporary genotypes are not sufficiently associated with projected climatic variables. This problem is exacerbated for species with long generation times, such as conifers. Responses to change in P. densata are expected to be further hindered by the strong geographical isolation among regions, which limits the ability of populations to track spatially changing niches through migration. Given the reduction in genetic diversity found in the western populations of P. densata, the low mutation rate in conifers (De La Torre et al., 2017) and the low probability of de novo mutations proving beneficial, persistence in situ under future climates seems unlikely.

Conclusions
Our study shows how range expansion across complex landscapes promotes not only allele surfing but also strong spatial structure through repeated founder effects, geographical isolation and local adaptation. While the deep valleys and high mountain ridges of the QTP have helped to create a global biodiversity hotspot (Zheng, 1996;Qiu et al., 2011;Wen et al., 2014;Xing & Ree, 2017), these same features can constrain adaptive responses to climate change. This should be especially prounced in organisms with limited dispersal, by either selection or physicial barriers to migration. Other species on the QTP show strong genetic structure consistent with limited effective migration (Qiu et al., 2011;Wen et al., 2014), which suggests that the strong signal of genomic mismatch in P. densata may be representative for other plateau endemics. As we accumulate further examples, it will become possible to gain a more general understanding of how demography and landscape factors constrain or promote adaptation to novel and changing environments.
Centre North (HPC2N) and the Uppsala Multidisciplinary Centre for Advanced Computational Science (UPPMAX). This study was supported by grants from the National Natural Science Foundation of China (NSFC 31800550, 31670664) and the Swedish Research Council (VR). The authors declare there are no conflicts of interest.
Author contributions X-RW, J-FM and WZ planned and designed the research. J-FM, WZ and Y-QS conducted the fieldwork. WZ and JP prepared DNA samples. WZ analyzed the data. WZ and X-RW wrote the manuscript draft. WZ, X-RW, ARS and MLA revised the manuscript.

Supporting Information
Additional Supporting Information may be found online in the Supporting Information section at the end of the article.

Fig. S4
Comparison of the number of SNPs with positive R 2 with environments, and the mean R 2 across these SNPs between 10 randomized GF models and our final selected GF models.

Table S1
Environmental parameters used in this study.   Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
New Phytologist is an electronic (online-only) journal owned by the New Phytologist Trust, a not-for-profit organization dedicated to the promotion of plant science, facilitating projects from symposia to free access for our Tansley reviews and Tansley insights.
Regular papers, Letters, Research reviews, Rapid reports and both Modelling/Theory and Methods papers are encouraged. We are committed to rapid processing, from online submission through to publication 'as ready' via Early View -our average time to decision is <26 days. There are no page or colour charges and a PDF version will be provided for each article.
The journal is available online at Wiley Online Library. Visit www.newphytologist.com to search the articles and register for table of contents email alerts.
If you have any questions, do get in touch with Central Office (np-centraloffice@lancaster.ac.uk) or, if it is more convenient, our USA Office (np-usaoffice@lancaster.ac.uk) For submission instructions, subscription and all the latest information visit www.newphytologist.com Ó 2020 The Authors New Phytologist Ó 2020 New Phytologist Trust New Phytologist (2020) 228: 330-343 www.newphytologist.com