Volume 242, Issue 2 p. 774-785
Full paper
Open Access

C4 photosynthesis provided an immediate demographic advantage to populations of the grass Alloteropsis semialata

Graciela Sotelo (Corresponding Author)

Universidade de Vigo, Departamento de Ecoloxía e Bioloxía Animal, 36310 Vigo, Spain

Author for correspondence: Graciela Sotelo

Sara Gamboa

Universidade de Vigo, Departamento de Ecoloxía e Bioloxía Animal, 36310 Vigo, Spain

Universidad Complutense de Madrid, 28040 Madrid, Spain

Luke T. Dunning

Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK

Pascal-Antoine Christin

Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, S10 2TN Sheffield, UK

Sara Varela

Universidade de Vigo, Departamento de Ecoloxía e Bioloxía Animal, 36310 Vigo, Spain

First published: 22 February 2024

Summary

  • C4 photosynthesis is a key innovation in land plant evolution, but its immediate effects on population demography are unclear. We explore the early impact of the C4 trait on the trajectories of C4 and non-C4 populations of the grass Alloteropsis semialata.
  • We combine niche models projected into paleoclimate layers for the last 5 million years with demographic models based on genomic data.
  • The initial split between C4 and non-C4 populations was followed by a larger expansion of the ancestral C4 population, and further diversification led to the unparalleled expansion of descendant C4 populations. Overall, C4 populations spread over three continents and achieved the highest population growth, in agreement with a broader climatic niche that provided a large potential range over time. The C4 populations that remained in the region of origin, however, experienced lower population growth, consistent instead with local geographic constraints. Moreover, the subsequent transfer of some C4-related characters to non-C4 counterparts might have facilitated the recent expansion of non-C4 populations in the region of origin.
  • Altogether, our findings support the hypothesis that C4 photosynthesis provided an immediate demographic advantage to A. semialata populations, but its effect might be masked by geographic contingencies.

Introduction

Understanding how life copes with recurrent environmental shifts and how certain traits can confer key advantages for population success and/or spread into new areas are primary questions in evolutionary biology. Over the long term, climatic changes have been linked to macroevolutionary patterns, including taxonomic and biogeographic turnovers (e.g. Hewitt, 2004; Svenning et al., 2015). At finer temporal and spatial scales, variation in temperature and precipitation regimes can alter the demographic dynamics of local populations and modify the adaptive value of specific traits (e.g. Parmesan & Yohe, 2003; Urban et al., 2016). Nonetheless, assessing the immediate impact of the emergence of a key trait on the fate of populations remains challenging (e.g. Miller et al., 2023), for two reasons. First, major key traits evolved deep in the past, making it difficult to disentangle the effect of the emergence of the trait from that of further modifications accumulated through time. Second, fundamental traits rarely remain polymorphic at the intraspecific level, hindering the application of population approaches to capture the trait transition.

A remarkable exception is the evolution of C4 photosynthesis in the grass Alloteropsis semialata. C4 photosynthesis is a change in primary metabolism that represents a fundamental innovation not only in the evolution of land plants but of life on Earth (Sage, 2004), and A. semialata is the only known species where the trait is not fixed, encompassing C3, C4, and intermediate (type II C3–C4 intermediate sensu Edwards & Ku, 1987; hereafter C3 + C4 sensu Dunning et al., 2017) populations (Pereira et al., 2023). C4 plants have enhanced light-, water-, and nitrogen-use efficiency, outperforming C3 plants under conditions that reduce CO2 availability in the leaf (Ehleringer et al., 1997; Sage, 2004; Zhou et al., 2018). The first C4 plants appeared c. 30 million years ago (Ma), matching a global drop in atmospheric CO2 (Christin et al., 2008; Vicentini et al., 2008), but C4-dominated ecosystems expanded much later, c. 8–3 Ma, becoming prevalent in tropical and subtropical areas (Edwards et al., 2010). C4 photosynthesis evolved convergently many times in different lineages (Sinha & Kellogg, 1996; Sage et al., 2011), and each lineage followed a particular trajectory depending on its specific evolutionary background and environmental context (Kadereit et al., 2012; Spriggs et al., 2014; Dunning et al., 2017; Heyduk et al., 2019; Bianconi et al., 2020a,b). Over the long term, C4 photosynthesis generally acted as a niche expander, since most C4 plants can tolerate a broader set of climatic conditions than non-C4 plants (Aagesen et al., 2016; Watcharamongkol et al., 2018). However, whether and how the acquisition of C4 photosynthesis was beneficial to populations in the first place remains unclear, and A. semialata offers an ideal system for tackling this question.

The photosynthetic types of A. semialata correspond to distinct genetic lineages that probably evolved during the Plio-Pleistocene (Lundgren et al., 2015; Bianconi et al., 2020a) from a common ancestor that used a weak C4 cycle (Dunning et al., 2017; Fig. 1). The diversification of the system has been studied in terms of anatomy, physiology, ecology, and genomics (reviewed in Pereira et al., 2023). Nonetheless, the demographic history of these lineages has not yet been explored. Understanding how their effective population sizes fluctuated through time and to what extent these fluctuations correlate with the C4–non-C4 split and with past climatic changes could provide direct insights into the role of the photosynthetic innovation in the relative success of the A. semialata lineages. Accessing that information is the main goal of this study.

Fig. 1.
Hypothesis of photosynthetic diversification in Alloteropsis semialata (following Dunning et al., 2017) along the proposed nuclear phylogeny of the species (Bianconi et al., 2020a). The most recent common ancestor likely used a weak C4 cycle (similar to a C3 + C4 state, in dark grey), although different from that of the current intermediate lineage (C3 + C4, in light grey). The full C4 cycle (in orange) was acquired after the initial split between non-C4 and C4 lineages (filled asterisk). A reversal to the ancestral C3 cycle (in blue) occurred after the split between non-C4 lineages (open asterisk). The tips of the tree represent the four main nuclear lineages currently recognized (I, II, III, and IV), with corresponding photosynthetic types indicated above.

The initial divergence between A. semialata C4 and non-C4 types most likely happened in the Central Zambezian miombo woodlands of Africa, where the species originated c. 3 Ma (Lundgren et al., 2015; Bianconi et al., 2020a). The C3 lineage (clade I) later migrated to Southern Africa, and a single C4 lineage (clade IV) spread across Africa, Madagascar, Southeast Asia, and Oceania. The Central Zambezian region remained occupied by another C4 lineage (clade III) and by C3 + C4 populations (clade II). The lineages evolved largely in isolation, but repeated episodes of genetic exchange might have contributed to the expansion of the different photosynthetic types (Lundgren et al., 2015; Olofsson et al., 2016, 2021; Bianconi et al., 2020a). Currently, C4 plants overlap with C3 plants in Southern Africa and with C3 + C4 plants in the Central Zambezian region, but where they grow intermixed (close to each other), the C4 plants are polyploid and the non-C4 plants are diploid, and this ploidy difference probably prevents gene flow between them (Olofsson et al., 2021).

Taking advantage of the uniqueness of this system, here we integrate demographic modelling with ecological niche modelling projected into paleoclimatic layers for the last 5 million years (Myr) aiming to test the following hypotheses: (1) major changes in effective population size occurred immediately after the divergence between C4 and non-C4 lineages; (2) C4 lineages experienced larger increases in effective population size than non-C4 lineages; (3) these larger demographic increases were due to a wider C4 climatic niche resulting in a broader potential geographic range. All in all, this would tell us how the effective population size of the A. semialata lineages fluctuated through time and whether the C4 trait acquisition was associated with immediate demographic advantages under changing climatic conditions.

Materials and Methods

Population sampling and data collection

Based on previous studies (Bianconi et al., 2020a; Olofsson et al., 2021), we used four well-defined groups of A. semialata (R. Br.) Hitchc. samples that represent the three photosynthetic types and the four main nuclear lineages described for the species. These samples are a subset of those studied by Olofsson et al. (2019, 2021). Specifically, we included C3 samples from Zimbabwe and South Africa assigned to clade I, C3 + C4 samples from Zambia and Tanzania assigned to clade II, C4 samples from Zambia and Tanzania assigned to clade IIIa, and C4 samples from Australia that belong to clade IV. We excluded polyploids, such as C4 samples from clade IIIb (which together with IIIa form clade III), given that polyploidy has been shown to affect population success in different ways (e.g. Monnahan et al., 2019; Padilla-García et al., 2023) and might confound our results. Briefly, these samples were collected during field trips between 2012 and 2019. Populations were spotted while driving or through stop-and-walk searches. GPS coordinates were recorded at each sampling site, and individuals were genotyped using a double-digested restriction-associated DNA sequencing (ddRADseq) approach, as described previously (Olofsson et al., 2019, 2021). Both the genomic data and geographic coordinates used in this work were retrieved from the previous studies, accessible via the NCBI SRA projects PRJNA560360 and PRJNA649872. This dataset consisted of 475 individuals from 100 localities (Fig. 2; Supporting Information Table S1). For ecological niche modelling, we added 35 individuals that provide a more comprehensive coverage of the current geographic distribution of A. semialata (Fig. S1; Table S1). They were previously confirmed as diploids and assigned to photosynthetic type and nuclear lineage (Bianconi et al., 2020a; Olofsson et al., 2021).

Fig. 2.
Distribution of Alloteropsis semialata samples included in this study. Symbols correspond to photosynthetic types and colours, to clades. For the Australian samples, codes indicate geographic regions: WA, Western Australia; NT, Northern Territory; FNQ, Far North Queensland; and SEQ, South East Queensland. The Central Zambezian region, where clades II and IIIa overlap, is shown in more detail in the inset panel. DRC, Democratic Republic of Congo. See Supporting Information Fig. S1 for additional samples used for niche analyses.

Ecological niche modelling

Niche envelope and species distribution model

We used mean annual temperature and total annual precipitation as predictive variables for niche modelling. They have been identified as fundamental drivers of C4 evolution (Edwards & Smith, 2010; Christin & Osborne, 2014), and predictions of annual trends for past climatic scenarios are more robust to the underlying climatic model than those of seasonal or extreme trends (Varela et al., 2015). We obtained preindustrial values for these variables from the high-resolution climate emulator Paleo-Pgem (Holden et al., 2019) as a proxy for present-day conditions. We estimated the environmental niche of each clade using the classic ‘climate-envelope model’ Bioclim (Busby, 1991) implemented in the R package dismo v.1.3-9 (Hijmans et al., 2022). Bioclim uses presence-only data to define a multidimensional environmental space where a species occurs (Busby, 1991; Varela et al., 2014). The resulting environmental space is represented as a box defined by the minimum and maximum values of all variables across the localities where the species was sampled or reported. To mitigate the impact of outliers, we only considered values within the 5th to 95th percentile range. The potential geographic range of each clade was then calculated as the areas where the values of temperature and precipitation fell within the estimated climatic requirements for the clade.
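As a rough illustration of the envelope logic described above, the following Python sketch builds a percentile box from occurrence values and flags grid cells that fall inside it on every variable. It is a simplified stand-in, not the dismo implementation: the function names, the array layout (one column per variable, here temperature and precipitation) and the use of NumPy's default linear percentile interpolation are assumptions made for this illustration.

```python
import numpy as np


def bioclim_envelope(occurrences, lower_pct=5.0, upper_pct=95.0):
    """Per-variable (low, high) bounds of a Bioclim-style climate box.

    occurrences: array of shape (n_sites, n_vars), e.g. columns for mean
    annual temperature (deg C) and total annual precipitation (mm).
    Trimming to the 5th-95th percentiles limits the effect of outliers.
    """
    occ = np.asarray(occurrences, dtype=float)
    low = np.percentile(occ, lower_pct, axis=0)
    high = np.percentile(occ, upper_pct, axis=0)
    return low, high


def suitable_cells(climate_grid, low, high):
    """Boolean mask of cells whose every variable lies inside the box.

    climate_grid: array of shape (..., n_vars) with the same variables,
    in the same order, as the occurrence data.
    """
    grid = np.asarray(climate_grid, dtype=float)
    inside = (grid >= low) & (grid <= high)
    return inside.all(axis=-1)
```

For example, with five hypothetical occurrences spanning 20–28 °C and 800–1200 mm, the 5th–95th percentile box is roughly 20.4–27.6 °C by 820–1180 mm, so a cell at 24 °C and 1000 mm is classified as suitable while one at 20 °C is not.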

Niche overlap

We quantified pairwise niche overlap using Schoener's D statistic (Schoener, 1968; Warren et al., 2008) as implemented in the R package enmtools v.1.0.5 (Warren et al., 2021). The D statistic ranges from 0 (no overlap) to 1 (full overlap). To assess the significance of the overlap, we applied equivalency and similarity tests under the null hypothesis that each pair of clades occupies the same environmental space. The tests were performed with 1000 permutations and a significance level of 0.05. For the equivalency test, the null distribution was generated by pooling the occurrences (i.e. localities) of the corresponding pair of clades, randomly splitting these occurrences into two new sets with the same numbers of observations as the originals, and calculating D for each permutation. The similarity tests were conducted in an analogous way but pooling the backgrounds of the clades instead of the occurrences, to account for the environmental conditions that are geographically available to the clades. The backgrounds were built by adding a buffer with a 200 km radius around each occurrence. For both tests, rejecting the null hypothesis indicated that niches overlap less than expected by chance.
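Schoener's D compares two normalized distributions p and q over the same cells as D = 1 − ½ Σ|p_i − q_i|. The sketch below computes D and an equivalency-style permutation test that pools and relabels occurrences, as described above. It is a simplified illustration, not the enmtools procedure: here each clade's "niche" is just a two-dimensional histogram of occurrences over hypothetical temperature and precipitation bins, whereas enmtools fits a niche model to each set; the function names and the binning are assumptions.

```python
import numpy as np


def schoener_d(p, q):
    """Schoener's D between two non-negative vectors over the same cells."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return 1.0 - 0.5 * np.abs(p - q).sum()


def occupancy(env, edges):
    """Histogram of occurrences over binned (temperature, precipitation).

    env: array of shape (n, 2); edges: [temperature_edges, precip_edges].
    Occurrences outside the edges are ignored by np.histogram2d.
    """
    hist, _, _ = np.histogram2d(env[:, 0], env[:, 1], bins=edges)
    return hist.ravel()


def equivalency_test(env_a, env_b, edges, n_perm=1000, seed=0):
    """Observed D and a one-sided permutation p-value for lower-than-chance overlap."""
    rng = np.random.default_rng(seed)
    observed = schoener_d(occupancy(env_a, edges), occupancy(env_b, edges))
    pooled = np.vstack([env_a, env_b])
    n_a = len(env_a)
    null = np.empty(n_perm)
    for k in range(n_perm):
        # relabel the pooled occurrences into two sets of the original sizes
        idx = rng.permutation(len(pooled))
        a, b = pooled[idx[:n_a]], pooled[idx[n_a:]]
        null[k] = schoener_d(occupancy(a, edges), occupancy(b, edges))
    p_value = (np.sum(null <= observed) + 1) / (n_perm + 1)
    return observed, p_value
```

Passing two clusters of occurrences that fall in disjoint bins gives an observed D of 0 and a p-value close to 1/(n_perm + 1), i.e. less overlap than expected by chance.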

Suitable areas through time

We mapped the areas with suitable climatic conditions for each clade through time under the assumption of niche conservatism. We used Paleo-Pgem to extract monthly data of temperature (°C) and precipitation (mm) for the last 5 Myr with a spatial resolution of 0.5° and a temporal resolution of 1000 yr, and then used the R package landscapemetrics v.1.5.4 (Hesselbarth et al., 2019) to sum the suitable areas at each 1000-yr layer under the equal-area projection ‘Equal Earth’ (Šavrič et al., 2019). We calculated these areas (1) at the global scale, excluding the Americas, where A. semialata has not been reported, (2) considering just Africa for every clade, and (3) considering just Australia for clade IV. Additionally, we estimated the areas that remained suitable for each clade over time, which could act as refugia. We assessed the extent of suitable areas through time and the number and/or connectivity of stable areas as potential predictors of the demographic success of the clades.
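The bookkeeping behind these area summaries can be sketched with boolean suitability masks: summing cell areas per time slice gives the suitable area through time, and cells that are suitable in every slice approximate stable areas (potential refugia). This NumPy sketch is only an analogue of the landscapemetrics workflow; it assumes masks already computed on an equal-area grid, so that each cell has a known area in km², and the function names are hypothetical.

```python
import numpy as np


def suitable_area_series(masks, cell_area_km2):
    """Total suitable area (km^2) for each time slice.

    masks: boolean array of shape (n_times, n_rows, n_cols), one
    suitability layer per 1000-yr step; cell_area_km2: (n_rows, n_cols).
    """
    masks = np.asarray(masks, dtype=bool)
    return (masks * np.asarray(cell_area_km2, dtype=float)).sum(axis=(1, 2))


def stable_cells(masks):
    """Cells that remain suitable in every time slice."""
    return np.asarray(masks, dtype=bool).all(axis=0)
```

The area of the stable cells can then be obtained as `(stable_cells(masks) * cell_area_km2).sum()`; connectivity among these cells would need a separate patch analysis, which this sketch does not attempt.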

Demographic modelling

For demographic inference, we applied two composite likelihood methods based on the coalescent that use the site frequency spectrum (SFS) derived from genomic data: Stairway Plot v.2.1.1 (Liu & Fu, 2015, 2020), which estimates effective population size (Ne) trajectories for single populations without a predefined model, and fastsimcoal2 v.2.709 (Excoffier et al., 2013, 2021), which can evaluate more complex models specified by the user for one or more populations.

These methods assume that the genomic sites analysed (single nucleotide polymorphisms (SNPs)) are independent and evolve under neutrality. Because the inferred demographic parameters are scaled by the mutation rate, they can be translated into absolute units if a mutation rate and a generation time are provided and the total length of the sequence analysed is known. Here, we used a mutation rate of 1 × 10−8 per site per generation based on available estimates for other grasses (maize and rice; Clark et al., 2005; Jiao et al., 2012; Yang et al., 2015, 2017) and a tentative generation time of 5 yr based on glasshouse observations, while the sequence length was estimated as detailed below.
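To make the unit conversion concrete, the sketch below uses the mutation rate (1 × 10−8 per site per generation) and generation time (5 yr) stated above. It assumes a diploid population, for which θ = 4Neμ per site, and uses Watterson's estimator only as a simple example of obtaining θ from the number of segregating sites and the sequence length; Stairway Plot and fastsimcoal2 perform their own scaling internally, and the helper names are hypothetical.

```python
MU = 1e-8        # mutation rate per site per generation (value from the text)
GEN_TIME = 5.0   # tentative generation time in years (value from the text)


def generations_to_years(generations, gen_time=GEN_TIME):
    """Convert a time in generations into years."""
    return generations * gen_time


def ne_from_theta(theta_per_site, mu=MU):
    """Effective population size from per-site theta, assuming diploidy (theta = 4 Ne mu)."""
    return theta_per_site / (4.0 * mu)


def watterson_theta_per_site(n_seg_sites, n_genomes, seq_length):
    """Watterson's theta per site: S / a_n / L, with a_n = sum_{i=1}^{n-1} 1/i.

    seq_length must include monomorphic sites, which is why the total
    sequence length analysed has to be known.
    """
    a_n = sum(1.0 / i for i in range(1, n_genomes))
    return n_seg_sites / a_n / seq_length
```

For instance, `generations_to_years(634_000)` gives 3 170 000 yr, in line with the c. 3 Ma quoted for 634 thousand generations, and a per-site θ of 0.004 corresponds to an Ne of about 100 000.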

Another assumption of these methods is the absence of sub-structure in the data. The A. semialata clades behave as distinct genomic groups but not as panmictic units, since the populations within each clade (localities hereafter) show a strong pattern of isolation by distance (Olofsson et al., 2021). This means that each clade is better described as a metapopulation, in which localities exchange migrants depending on their geographic distance. Such structure can lead to spurious signatures of Ne change through time (Städler et al., 2009; Mazet et al., 2016; Maisano Delser et al., 2019; Lesturgie et al., 2022). Thus, we first assessed the Ne trajectories for each clade as a deme and for each locality as a smaller (near-)panmictic unit to verify that the results were consistent and similar at both scales. Then, we considered the Ne changes inferred for each clade to define models of their joint demographic history, to minimize potential biases due to unaccounted Ne changes in ancestral/descendant populations (Momigliano et al., 2021).

Site frequency spectrum estimation

We mapped clean ddRADseq reads for each individual to the reference genome of A. semialata (ASEM_AUS1_v1.0; GenBank accession QPGU01000000; Dunning et al., 2019b) using Bowtie2 v.2.4.4 (Langmead & Salzberg, 2012) with the default setting for paired-end reads. We retained properly paired reads with a unique mapping, and sorted and indexed them with SAMtools v.1.10 (Li et al., 2009).

To perform the analyses at the clade level, we derived folded SFSs from genotype calls. We generated a variant call format (VCF) file for each dataset (i.e. each single clade, pairs of clades, and the four clades together) as follows. We called sites with minimum mapping and quality scores of 20 using the mpileup and call functions from BCFtools v.1.10.2 (Danecek et al., 2021), and extracted genotypes from biallelic SNPs aligning to the nuclear genome with VCFtools v.0.1.16 (Danecek et al., 2011). We treated a genotype as missing if its depth of coverage was below 7 in the corresponding individual (following the recommendations of Peterson et al., 2012), and we discarded sites with > 70% missing data across individuals. We further discarded individual genotype calls with abnormally low or high coverage (< 0.5× or > 2× mean coverage for that individual), sites with heterozygosity excess across individuals, and individuals with > 70% missing data. We built a folded SFS from each VCF file with easySFS (https://github.com/isaacovercast/easySFS), projecting the data down to a smaller sample size to remove the remaining missing values. We set the sample size per clade to 30 individuals (60 genomes), except for the dataset including the four clades, where it was limited to 10 individuals (20 genomes) so as not to exceed the maximum SFS size that fastsimcoal2 can handle.
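The projection and folding steps can be illustrated with a small sketch. Each site's alternative-allele count k among n called gene copies is projected to a target of m copies with a hypergeometric draw (the approach easySFS is assumed here to inherit from ∂a∂i), sites with fewer than m called copies are dropped, and the spectrum is folded by merging bins i and m − i. The input layout and function names are hypothetical; a real pipeline would read these counts from the filtered VCF.

```python
from math import comb

import numpy as np


def project_site(k, n, m):
    """Expected unfolded SFS contribution of one site.

    k: alternative-allele copies among n called gene copies; the site is
    projected down to m <= n copies by sampling without replacement.
    """
    out = np.zeros(m + 1)
    total = comb(n, m)
    for j in range(max(0, m - (n - k)), min(k, m) + 1):
        out[j] = comb(k, j) * comb(n - k, m - j) / total
    return out


def folded_sfs(alt_counts, called_copies, m):
    """Folded SFS (bins 0..m//2) after hypergeometric projection to m copies."""
    sfs = np.zeros(m + 1)
    for k, n in zip(alt_counts, called_copies):
        if n < m:
            continue  # too much missing data to project this site
        sfs += project_site(k, n, m)
    folded = np.zeros(m // 2 + 1)
    for i in range(m + 1):
        # minor-allele folding: bins i and m - i are merged
        folded[min(i, m - i)] += sfs[i]
    return folded
```

With m = 4, a site with 2 alternative copies out of 6 called contributes 1/15, 8/15, and 6/15 to folded bins 0, 1, and 2, so projection turns sites with some missing data into fractional counts instead of discarding them.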

We calculated the total sequence length as the sum of monomorphic and polymorphic (biallelic) sites in each SFS. To get the number of monomorphic sites in the data, we called all sites (not only variants) with BCFtools and filtered them with VCFtools by setting the minimum and maximum number of alleles to 1. We then adjusted the number of monomorphic sites in the SFS based on the proportion of monomorphic to polymorphic sites that passed the first filters on coverage (≥ 7) and missingness (< 70%), and the proportion of polymorphic sites in the original VCF file that were finally included in the SFS file.
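One possible reading of this adjustment, written as a sketch: monomorphic sites that passed the coverage and missingness filters are scaled by the fraction of filter-passing polymorphic sites that ended up in the SFS, and the result is added to the polymorphic count. The exact formula is not given in the text, so this interpretation, the argument names, and the example numbers are all hypothetical.

```python
def total_sequence_length(mono_pass, poly_pass, poly_in_sfs):
    """Hypothetical estimate of the total sequence length represented by an SFS.

    mono_pass: monomorphic sites passing the coverage/missingness filters.
    poly_pass: polymorphic (biallelic) sites passing the same filters.
    poly_in_sfs: polymorphic sites that finally entered the SFS.
    Assumption: monomorphic sites survive the later SNP-level steps in the
    same proportion as polymorphic sites.
    """
    mono_in_sfs = mono_pass * (poly_in_sfs / poly_pass)
    return mono_in_sfs + poly_in_sfs
```

With hypothetical inputs of 1 000 000 monomorphic and 100 000 polymorphic filter-passing sites, of which 80 000 reached the SFS, the sketch returns a total length of 880 000 sites.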

To perform the analyses at the locality level, we estimated a folded SFS for each locality with at least four sampled individuals from genotype likelihoods, as implemented in Angsd v.0.938 (Korneliussen et al., 2014). The filtering strategy was analogous to that used for clades: we only considered sites mapping to the nuclear genome, with mapping and quality scores over 20, with no more than two alleles, and present in at least 70% of the samples. We evaluated the depth of coverage per site as the total sequencing depth across individuals per locality, setting minimum and maximum values at half and twice the sum of the mean coverage per individual. We also discarded sites with heterozygosity excess. Because monomorphic sites were retained throughout the pipeline (no threshold was specified for the ‘-SNP_pval’ option), the number of sites defining the total sequence length was already included in the SFS file (without further adjustment).
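The depth and missingness thresholds described above reduce to simple arithmetic per locality. The sketch below computes them and formats them as ANGSD filter options (-minMapQ, -minQ, -minInd, -setMinDepth, -setMaxDepth). It only produces the filtering fragment of a command, not a complete ANGSD call; the score threshold of 20, the rounding of depth bounds to integers, and the helper name are assumptions.

```python
import math


def angsd_filter_args(mean_cov_per_ind, min_frac_present=0.7, min_q=20):
    """Hypothetical helper: ANGSD filter options for one locality.

    mean_cov_per_ind: mean sequencing coverage of each individual.
    """
    n_ind = len(mean_cov_per_ind)
    total_cov = sum(mean_cov_per_ind)
    # sites must be present in at least 70% of the individuals;
    # round() guards against float error such as 0.7 * 10 = 7.000000000000001
    min_ind = math.ceil(round(min_frac_present * n_ind, 6))
    # total depth across individuals: between half and twice the summed mean coverage
    min_depth = round(0.5 * total_cov)
    max_depth = round(2.0 * total_cov)
    return [
        "-minMapQ", str(min_q), "-minQ", str(min_q),
        "-minInd", str(min_ind),
        "-setMinDepth", str(min_depth), "-setMaxDepth", str(max_depth),
    ]
```

For a hypothetical locality of four individuals with mean coverages of 10, 12, 8, and 10×, this yields -minInd 3, -setMinDepth 20, and -setMaxDepth 80.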

One-population models

To get an overview of Ne trajectories in each C4 and non-C4 clade/locality, we applied the multi-epoch model implemented in Stairway Plot 2, with default options: including singletons, using four numbers of random breakpoints, two thirds of the sites as training set and 200 replicate runs. The observed trajectories were contrasted with explicit models in fastsimcoal2 (Methods S1, S2; Fig. S2a,b).

Four-population models

To infer the joint demographic history of the four A. semialata clades, we defined four-population models in fastsimcoal2. Based on the phylogenetic relationships inferred from the nuclear genomes (Bianconi et al., 2020a), the models consisted of a first split of the most recent common ancestor of the four clades into a non-C4 ancestral population and a C4 ancestral population, which later split into clades I and II and clades IIIa and IV (respectively) at independent time points (Fig. S2c). Based on the results of one- and two-population models (Methods S1, S2), we included a sudden Ne change in each clade after the most recent splits. The models differed in the set of migration rates specified: the base model included migration between the non-C4 and C4 ancestral populations (IM1); next, we added migration between the Central Zambezian clades II and IIIa (IM2), then between the non-C4 clades I and II (IM3), and last between the C4 clades IIIa and IV (IM4). A fifth model included a sudden Ne change in non-C4 and C4 ancestral populations on top of the latter settings (IM4ac). All migration rates were specified as constant and allowed to be asymmetric. Each model was run 100 times, with 40 optimization cycles and 100 000 coalescent simulations per cycle. The models were compared with the Akaike information criterion, considering the run with the highest likelihood in each case.
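For model comparison, the AIC is 2k − 2 ln L, where k is the number of estimated parameters and L the maximized likelihood. The sketch below assumes that likelihoods are reported on a log10 scale, as in fastsimcoal2 output files, and therefore converts them to natural logarithms first; it also computes Akaike weights to express relative support. Model names, likelihoods, and parameter counts in the example are hypothetical, not values from Table S7.

```python
import math


def aic_from_log10_lik(log10_lik, n_params):
    """AIC = 2k - 2 ln L, converting a log10 likelihood to natural log."""
    ln_lik = log10_lik * math.log(10)
    return 2 * n_params - 2 * ln_lik


def akaike_weights(aics):
    """Relative support for each model from a dict {model_name: AIC}."""
    best = min(aics.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


if __name__ == "__main__":
    # hypothetical log10 likelihoods and parameter counts, for illustration only
    aics = {
        "model_a": aic_from_log10_lik(-1500.0, 15),
        "model_b": aic_from_log10_lik(-1490.0, 17),
    }
    print(akaike_weights(aics))
```

The conversion matters because a difference of 1 log10 unit corresponds to about 2.3 natural-log units, so comparing raw log10 likelihoods against a 2k penalty would understate differences in fit.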

To obtain confidence intervals for the parameter estimates under the best model, we used a nonparametric block-bootstrap approach. We divided the original dataset (i.e. the VCF for the four clades together) into blocks of 1000 SNPs and generated 20 bootstrapped datasets by sampling 85 blocks with replacement, to match the size of the original dataset (85 398 SNPs). For each bootstrap replicate, we obtained the SFS and ran the demographic model as for the original dataset, and used the estimates from the best run to calculate maximum and minimum values for each parameter.
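The resampling scheme can be sketched as follows: SNPs are grouped into consecutive blocks of 1000, each replicate draws as many blocks as the data contain, with replacement, and the parameter range is taken as the minimum and maximum across replicates. Function names are hypothetical, and dropping the incomplete trailing block is an assumption; building the SFS and rerunning fastsimcoal2 for each replicate is not shown.

```python
import numpy as np


def block_bootstrap_indices(n_snps, block_size=1000, n_reps=20, seed=1):
    """SNP index arrays for block-bootstrap replicates.

    Only complete blocks are used (an assumption), so any trailing
    SNPs that do not fill a block are ignored.
    """
    rng = np.random.default_rng(seed)
    n_blocks = n_snps // block_size
    starts = np.arange(n_blocks) * block_size
    reps = []
    for _ in range(n_reps):
        chosen = rng.choice(starts, size=n_blocks, replace=True)
        reps.append(np.concatenate([np.arange(s, s + block_size) for s in chosen]))
    return reps


def bootstrap_range(estimates):
    """Per-parameter (min, max) across replicates; rows = replicates."""
    est = np.asarray(estimates, dtype=float)
    return est.min(axis=0), est.max(axis=0)
```

With the 85 398 SNPs mentioned above, each replicate resamples 85 complete blocks (85 000 SNPs); how the remaining 398 SNPs were handled is not stated, and this sketch simply ignores them. Resampling whole blocks rather than single SNPs keeps linked sites together, so the spread across replicates reflects the non-independence of nearby SNPs.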

Results

Ecological niche models

The environmental niche of the C4 clade IV was the broadest, nearly encompassing the niches of all other clades except C3 clade I, while the niche of the C4 clade IIIa was the most restricted (Fig. S3a). All pairwise tests supported the nonequivalency of the niches, indicating that niches overlap less than expected by chance, although their backgrounds do not differ significantly (Table S2). The extensive niche of clade IV was reflected in the extent and distribution of predicted suitable areas for present-day conditions (Figs 3, S4): the predicted range largely resembles the putative distribution of A. semialata across Africa, Madagascar, Southeast Asia, and Oceania based on currently available records (Lundgren et al., 2015, 2016). It also approximates the overall distribution of the C4 type when polyploid C4 populations, such as those reported in South Africa, are taken into account, which suggests that diploid and polyploid C4 populations might share the same environmental niche. Further suitable areas were inferred in tropical and temperate regions of the Americas and around the Mediterranean basin in Southern Europe, although these regions likely remained inaccessible to A. semialata throughout its history. The predicted range of clade IV was mostly continuous within each continent, in contrast to the more fragmented ranges estimated for the other clades (Figs 3, S4).

Fig. 3.
Distribution of suitable areas estimated for each Alloteropsis semialata clade at the global scale for the present (preindustrial) time. Colours correspond to clades. See Supporting Information Fig. S4 for the distribution based on the dataset used for demographic analyses, with clade IV samples just from Australia.

The paleoclimate reconstructions revealed a cyclical variation in the extent of suitable areas for each clade over the last 5 Myr, which was in line with global changes in temperature during this period (fig. 2 in Holden et al., 2019). Clade IV maintained the largest suitable areas both at the global scale and within Africa throughout the whole period. Within Australia, the suitable areas for clade IV were in the same order of magnitude as the areas of the other clades within Africa (Fig. S5; Table S3). In general, the climatic conditions became less favourable for A. semialata c. 2.5 Ma. Since then, despite the continuous fluctuations, the lower bound of suitable areas was higher than before for every clade. Within the last 2.5 Myr, maximum areas in Africa were estimated at c. 1.8, 1.5, 1.2, and 0.4 Ma for clades II, IV, IIIa, and I, respectively; and in Australia, c. 2.3 and 1.3 Ma for clade IV (Fig. S5; Table S3). Over time, no stable areas were identified for clade IIIa and only a few for clades I and II, while a network of suitable areas was identified for clade IV within each continent at every time step evaluated (Fig. S6; Table S4). The patterns observed across the paleoclimate reconstructions remained consistent when using the reduced dataset with clade IV samples restricted to Australia (data not shown).

Demographic models

One-population models

The Stairway Plot analysis inferred an ancestral expansion for each clade (Figs 4a, S7), which started c. 300 thousand generations ago (kga) (c. 1.5 Ma) for clades II and IIIa and c. 150 kga (c. 0.8 Ma) for clades IV and I. Further steep increases in Ne started c. 50 kga (c. 0.3 Ma). Clade IV attained the largest Ne and experienced the largest increase with respect to the ancestral Ne (> 60-fold vs < 20-fold in the other three clades; Fig. 4a).

Fig. 4.
Changes in effective population size (Ne) through time inferred with Stairway Plot 2 for Alloteropsis semialata. (a) Changes inferred by clade, based on folded site frequency spectrum (SFS) obtained after downsampling the data to 30 individuals per clade. Shaded areas represent 95% confidence intervals. Dotted vertical lines mark the approximate start of expansions (see main text). (b) Changes inferred for each locality with at least four sampled individuals. African localities are grouped by clade (upper panels) and Australian localities, by region (bottom panels: NT, Northern Territory; WA, Western Australia; FNQ, Far North Queensland; SEQ, South East Queensland). Four localities with profiles distinct from the others in the same clade are not shown: TAN2, ZAM1716 (clade II), and TAN1603, ZAM1505 (clade IIIa), but see Supporting Information Fig. S8 for separate plots with 95% confidence intervals for each locality. (a, b) Colours correspond to clades; y-axis is shown in log scale.

The analysis by locality detected the initial expansions but also subsequent declines for multiple localities (Fig. 4b). These declines were supported by both the Ne trajectory and the shape of the normalized SFS (i.e. a reduction in singletons relative to the adjacent allele frequency class) in a few C3 + C4 localities from clade II and in most C4 localities from clade IIIa and from clade IV in the easternmost region of Australia (South East Queensland; Figs S8, S9). Within clade IV, the timing of the demographic changes and the estimates of Ne towards the present showed a decreasing trend from northern to western and eastern localities, consistent with founder events following the species' dispersal through Australia. Among all analysed samples, the largest recent Ne values corresponded to clade IV localities from the Northern Territory, in line with the clade analysis. Likewise, recent Ne values remained relatively large for almost all clade I localities, which did not show evidence of recent declines (Figs 4b, S8, S9).

Four localities from the Central Zambezian region in Africa exhibited distinct profiles compared with the other localities within the same clade: ZAM1716 and TAN2 from C3 + C4 clade II, and ZAM1505 and TAN1603 from C4 clade IIIa (Fig. S8). ZAM1716 showed a further increase in Ne after the recent decline, and TAN2 displayed a longer period of constant Ne before the decline. ZAM1505 did not show signs of decline and TAN1603 presented a recent steep increase in Ne instead. Notably, TAN1603 currently extends over a large area (LTD, field observation) and was identified as particularly introgressed by Olofsson et al. (2021); thus, the large change in Ne could reflect an increase in both population size and gene flow from non-C4 clades. Signatures of admixture were also detected in TAN2 (although its profile is quite the opposite) but not in ZAM1716 or ZAM1505 (Olofsson et al., 2021), implying that other sources of heterogeneity cannot be ruled out (Notes S1).

In general, both Stairway Plot and fastsimcoal2 models (Methods S1, S2; Fig. S10; Tables S5, S6) suggest that the ancestral populations of the clades expanded, consistent with the evolution of clades II and IIIa in the Central Zambezian region and the spread of clades I and IV out of this centre of origin, whereas some localities have likely experienced recent declines.

Four-population models

The best supported scenario for the joint demographic history of the four A. semialata clades corresponded to the isolation with migration model IM4, which involved one sudden Ne change in each current clade and gene flow between the two ancestral populations and between current clades (Fig. 5; Table S7). The split of the most recent common ancestor into ancestral C4 and non-C4 populations was estimated at c. 634 kga (c. 3 Ma) and was followed by the expansion of both populations, although the expansion was larger in the ancestral C4 population. The split of non-C4 clades I and II was estimated at c. 228 kga (c. 1 Ma) and the split of C4 clades IIIa and IV at c. 66 kga (c. 0.3 Ma). A bottleneck was inferred at the origin of each clade, but it was stronger in the C4 clades. Subsequent increases in Ne were inferred at roughly 140 kga (c. 0.7 Ma) for clade II, 72 kga (c. 0.4 Ma) for clade I, and 60 kga (c. 0.3 Ma) for clades IIIa and IV. The C4 clades recovered Ne values close to those of their ancestral population shortly after they split, in relative terms, and overall clade IV experienced the largest increase in Ne towards the present. Moreover, specifying further Ne changes in the ancestral C4 and non-C4 populations did not significantly improve the fit of the model (IM4ac, Table S7). Under the IM4ac model, the estimated times of ancestral Ne changes shift towards the split time between the ancestral populations estimated under IM4, and the split time shifts towards 1 million generations ago, while all other estimates remain similar (Table S7).

Fig. 5.
Scheme of the joint demographic history of the four Alloteropsis semialata clades inferred with fastsimcoal2 under the best fitted model (IM4, see Supporting Information Table S7 for parameter estimates and model comparisons). Results are based on the joint site frequency spectrum (SFS) obtained for a sample size of 10 individuals per clade. Column width is proportional to effective population size (N) and height, to time (T). Current effective population sizes: N1 – clade I, N2 – clade II, N3 – clade IIIa, N4 – clade IV. Effective population sizes of each clade before sudden expansions: N1TC to N4TC, respectively. Ancestral effective population sizes: NA12 – ancestral non-C4, NA34 – ancestral C4, NA – most recent common ancestor. Time of split events: TS – between non-C4 and C4 ancestral populations, TS12 – between non-C4 clades I and II, TS34 – between C4 clades IIIa and IV. Time of sudden expansions for each clade: TC1 to TC4, respectively. Horizontal arrow size is proportional to migration rates between populations, forward in time. The vertical arrow represents time increasing from the present to the past.

Although all migration rates were low, the highest was estimated from the ancestral C4 population to the ancestral non-C4 population (forward in time). This rate was one order of magnitude higher than migration from clade IIIa to clade II and from clade II to clade I, and two orders of magnitude higher than all other rates (Fig. 5; Table S7). Our models therefore support the ancient hybridization between C4 and non-C4 populations inferred from the incongruence between chloroplast and nuclear phylogenies and among nuclear gene trees (Bianconi et al., 2020a; Raimondeau et al., 2023). They also support the previously proposed contribution of gene flow from C4 to non-C4 backgrounds to the diversification of the photosynthetic types (Olofsson et al., 2016; Dunning et al., 2017). Although our models point to a scenario of constant migration between types (Methods S2; Tables S6, S7), the very low migration rate estimates (< 1 migrant gene copy per generation) and the geographic context would be more consistent with recurrent episodes of genetic exchange (Bianconi et al., 2020a; Olofsson et al., 2021).
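A per-generation migration rate m becomes a number of migrants only once it is multiplied by the size of the receiving population. This is why rates that differ by orders of magnitude can all correspond to fewer than one migrant gene copy per generation. The sketch below assumes that population sizes are counted in gene copies and that m is parameterized backward in time, as is common in coalescent models. It is a hedged illustration: the function name is hypothetical, and the size and rates are invented values, not the estimates in Table S7.

```python
def migrant_copies_per_generation(receiving_size: float,
                                  backward_rate: float) -> float:
    """Expected number of immigrant gene copies entering a deme per generation.

    receiving_size: effective size of the receiving deme, in gene copies.
    backward_rate: per-generation probability that a gene copy sampled in the
        receiving deme descends from a copy in the source deme.
    """
    if receiving_size < 0:
        raise ValueError("population size must be non-negative")
    if not 0.0 <= backward_rate <= 1.0:
        raise ValueError("migration rate must lie in [0, 1]")
    return receiving_size * backward_rate


if __name__ == "__main__":
    # Hypothetical illustration only (not the estimates in Table S7):
    # rates differing by successive orders of magnitude into the same deme.
    size = 50_000
    for rate in (1e-5, 1e-6, 1e-7):
        n_mig = migrant_copies_per_generation(size, rate)
        print(f"m = {rate:.0e}: {n_mig:.3f} migrant gene copies per generation")
```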

Discussion

Here, we combined niche models projected into paleoclimate layers spanning the last 5 Myr with demographic models based on genomic data to capture the impact of the emergence of a key trait, photosynthetic innovation, on the fate of different populations of the grass A. semialata. We analysed a total of 510 samples representing the three photosynthetic types (C3, C3 + C4, and C4) and the four main nuclear lineages (clades I, II, III, and IV) of this species (Fig. 1). Niche breadth, geographic range, and effective population size were generally coupled in these four clades. All four clades were successful in the sense that they occupied distinct niches and experienced demographic expansions. Overall, however, the acquisition of C4 photosynthesis was followed by a larger expansion of C4 populations (Figs 4, 5; Table S7), linked to their ability to occupy a broader climatic space (Figs 3, S3, S5).

The ancestral C4 population underwent higher population growth than the ancestral non-C4 population in the region of origin after they split (Fig. 5). This points to an advantage conferred by the C4 trait, under otherwise equal conditions for both ancestors. Such an advantage is further reflected in the unparalleled expansion of C4 clade IV (Fig. 4). This clade dispersed from Africa to Oceania within the last million years and attained the highest population growth among current clades. The rapid spread of clade IV is consistent with a large potential range over time derived from a wider climatic niche (Figs S3a, S5), together with long-distance dispersal events. Such events probably enabled dispersal across the Indian Ocean, since we did not identify potential land corridors from Africa to Asia. Transoceanic dispersal has also been frequent in other angiosperm groups (de Queiroz, 2005; Linder et al., 2018). The Australian localities lie at the edge of the expansion range and accordingly show footprints of serial founder effects. Population growth was more recent and weaker with increasing distance from the northern region, the potential point of entry into the continent (Olofsson et al., 2019), and recent declines were detected towards the easternmost localities (Figs 4b, S8, S9). Moreover, the Australian localities largely capture the climatic flexibility of the A. semialata C4 type, as they span most of the temperature and precipitation conditions that C4 populations experience across the range (Figs S3b, S4).
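One simple way to summarize a "potential range over time" from niche models is to binarize each projected suitability layer at a threshold. The suitable cells can then be counted per time slice and across all slices. The sketch below is an illustration under stated assumptions, not necessarily the procedure used in this study. It assumes gridded suitability scores in [0, 1], an arbitrary threshold of 0.5, and equal-area cells; real latitude/longitude grids would need area weighting. The function names and toy layers are hypothetical.

```python
Grid = list[list[float]]


def suitable_cells(grid: Grid, threshold: float) -> set[tuple[int, int]]:
    """Return the (row, col) cells whose suitability is at or above threshold."""
    return {(r, c)
            for r, row in enumerate(grid)
            for c, value in enumerate(row)
            if value >= threshold}


def potential_range_over_time(slices: dict[str, Grid],
                              threshold: float = 0.5) -> tuple[dict[str, int], int]:
    """Summarize binarized suitability projections across time slices.

    Returns the number of suitable cells in each slice and the cumulative
    potential range, i.e. the number of cells suitable in at least one slice.
    """
    per_slice: dict[str, int] = {}
    union: set[tuple[int, int]] = set()
    for label, grid in slices.items():
        cells = suitable_cells(grid, threshold)
        per_slice[label] = len(cells)
        union |= cells
    return per_slice, len(union)


if __name__ == "__main__":
    # Toy 2 x 2 suitability layers for two hypothetical time slices.
    toy_slices = {
        "present": [[0.9, 0.2], [0.6, 0.1]],
        "3 Ma": [[0.4, 0.7], [0.8, 0.3]],
    }
    per_slice, cumulative = potential_range_over_time(toy_slices)
    print(per_slice, cumulative)
```

A clade with a broader climatic niche would tend to have more suitable cells in each slice and a larger cumulative range. That is the kind of comparison summarized in Fig. S5.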

By contrast, the other C4 clade analysed, clade IIIa, showed the lowest population growth (Fig. 4) and the narrowest potential range over time (Fig. S5). It remained almost confined to its region of origin in Central Zambezia. This region lies at higher elevation than the surrounding lowlands and is associated with miombo forest, features that may have acted as dispersal barriers (Bianconi et al., 2020a; Olofsson et al., 2021). The extent of the miombo forest also varied during the glacial cycles (Ivory et al., 2018), and these fluctuations could be related to the recent declines observed in most clade IIIa localities (Fig. 4b). The trajectory of clade IIIa could thus result from geographic contingencies. This hypothesis is in line with the dependence of the effect of the C4 trait on the particular context of each lineage (Christin & Osborne, 2014), as shown over the long-term evolution of C4 grasses (Spriggs et al., 2014; Aagesen et al., 2016). The two A. semialata C4 clades (III and IV) do not differ in photosynthetic performance, although the genetic variation underlying the C4 pathway does vary between them (Lundgren et al., 2016, 2019; Dunning et al., 2019a,b). Accordingly, the success of clade IV could be driven by other adaptations not necessarily related to the C4 pathway (see Christin & Osborne, 2014), such as enhanced dispersal linked to changes in seed morphology or germination control (see Linder et al., 2018). However, despite the substantial phenotypic diversity within A. semialata, there is so far no evidence of consistent trait differences among the clades beyond the photosynthetic pathway that would support this hypothesis (see Pereira et al., 2023). The clades were found to be similar in plant height, general morphology, flowering phenology, and seed size (Lundgren et al., 2015), although a comprehensive assessment of trait variation across the species range is still lacking (see Pereira et al., 2023).
We also note that we did not analyse the C4 polyploids that prevail in Central Zambezia and belong to clade IIIb, the other component of C4 clade III (Olofsson et al., 2021). Polyploidy may improve establishment and ecological flexibility in grasses (Linder et al., 2018), but the role of polyploids in the evolution and success of the A. semialata types (and clades) is not yet understood and deserves further investigation.

The C3 + C4 clade II was also distributed in Central Zambezia, and its local declines are compatible with the geographic constraints of the region. Nonetheless, clade II showed larger population growth than clade IIIa (Fig. 4). Interestingly, its recent expansion (c. 0.7 Ma; Fig. 5; Table S7) did not coincide with any major change in geographic or climatic conditions (Fig. S5). This suggests that some change in the C3 + C4 plants themselves made them more successful within the region they already inhabited. In this regard, previous studies demonstrated that genes encoding one of the core enzymes of the C4 cycle, the decarboxylating enzyme phosphoenolpyruvate carboxykinase (PCK), were transferred from C4 to C3 + C4 lineages after they diverged (Dunning et al., 2017). The transfer of functional genes, between closely or distantly related taxa, is now recognized as an important source of adaptive variation in plant evolution (Christin et al., 2012; Dunning et al., 2019b; Olofsson et al., 2019; Bianconi et al., 2020b; Wickell & Li, 2020; Hibdige et al., 2021; Raimondeau et al., 2023). It is therefore tempting to suggest that the acquisition of this or any other C4-related character might have promoted the demographic expansion of clade II in the absence of evident external triggers. Consistent with this, our demographic models confirm that the photosynthetic types diversified in the face of gene flow, both between the ancestral populations and between the current clades, and that gene flow occurred preferentially from C4 to non-C4 groups (Fig. 5; Table S7).

Lastly, the C3 clade I also underwent rapid population growth (Fig. 4). This is consistent with dispersal out of Central Zambezia, as in clade IV, but into colder habitats. The niche of clade I was broader than that of clade II (Fig. S3), but its potential range remained smaller and patchier (Figs 3, S5). The range of clade I is currently limited to South African mountains, where the C3 plants seem to be well adapted, as reflected in the local demographic trajectories (Fig. 4). The success of clade I was probably mediated by the evolution of freezing tolerance to cope with low-temperature extremes (Osborne et al., 2008). In this respect, the evolution of cold tolerance across different C3 lineages is considered another key factor shaping the global distribution of grasses (Edwards & Smith, 2010; Linder et al., 2018).

Overall, our inferences from the niche models projected into past climates and from the demographic models were concordant with each other. They were also concordant with the spatial and temporal framework previously proposed for the photosynthetic diversification within A. semialata (Lundgren et al., 2015; Olofsson et al., 2016, 2019, 2021; Dunning et al., 2017; Bianconi et al., 2020a). Despite some limitations, our approach provided new insights into the relative success of the clades after the acquisition of the C4 trait. First, we assumed niche conservatism to estimate the distribution of the clades over space and time, yet niche evolution cannot be ruled out, especially for clade IV given its remarkable range expansion. Second, the mutation rate (1 × 10⁻⁸ per site per generation) and generation time (5 yr) used for demographic modelling are uncertain. Third, clade IV was represented only by Australian samples in our demographic models, so the non-Australian populations act as 'ghost' populations that were not accounted for. This could bias the demographic inferences to some extent (see Methods S2; Excoffier et al., 2013; Maisano Delser et al., 2019; Momigliano et al., 2021). Despite these caveats, our results indicate that the A. semialata C4 clades have a history of larger effective population size than the non-C4 clades. Furthermore, niche breadth and predicted geographic extent were generally coupled with estimated effective population size in these clades. This supports an immediate beneficial effect of the C4 trait on the demography of the populations, driven by the broader climatic niche of C4 plants.
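The uncertainty in the mutation rate and generation time propagates directly into the absolute estimates. In SFS-based inference, the data inform mutation-scaled quantities (θ = 4Neμ for diploids, and times in units of Ne). Under that standard assumption, for a fixed fit, halving μ doubles both Ne and the times in generations, and the generation time then converts generations to years linearly. The sketch below shows this rescaling. The function name and the Ne value are hypothetical. Only the split time (634 kga), μ (1 × 10⁻⁸ per site per generation) and generation time (5 yr) come from the text.

```python
def rescale_estimates(ne: float, t_generations: float,
                      mu_used: float, mu_alt: float,
                      generation_time_yr: float) -> tuple[float, float, float]:
    """Rescale SFS-based estimates to an alternative per-site mutation rate.

    Assumes the data constrain only mutation-scaled quantities, so Ne and
    times measured in generations both scale as 1 / mu.
    Returns (Ne, time in generations, time in Ma).
    """
    if mu_used <= 0 or mu_alt <= 0 or generation_time_yr <= 0:
        raise ValueError("rates and generation time must be positive")
    factor = mu_used / mu_alt
    t_gen = t_generations * factor
    return ne * factor, t_gen, t_gen * generation_time_yr / 1e6


if __name__ == "__main__":
    t_split = 634_000     # C4 / non-C4 split time quoted in the Results (generations)
    ne_example = 100_000  # hypothetical Ne, for illustration only
    for mu_alt, g in ((1e-8, 5.0), (5e-9, 5.0), (2e-8, 5.0), (1e-8, 3.0)):
        ne, t_gen, t_ma = rescale_estimates(ne_example, t_split, 1e-8, mu_alt, g)
        print(f"mu={mu_alt:.0e}, g={g:g} yr: Ne={ne:,.0f}, "
              f"T={t_gen:,.0f} generations = {t_ma:.2f} Ma")
```

Because Ne and all times rescale by the same factor, relative comparisons among clades are unaffected by the choice of μ. Absolute sizes and dates are affected.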

Conclusion

By integrating niche and demographic modelling, our study supports the conclusion that C4 photosynthesis was immediately beneficial to the A. semialata populations in which it emerged. The C4 trait probably enabled them to thrive under a broader range of temperature and precipitation conditions. This would have allowed them to extend their geographic range and to increase their population size beyond those of other (non-C4) populations in the face of recurrent environmental changes. Furthermore, the subsequent transfer of some C4-related characters to non-C4 populations might have promoted their recent expansion in the region of origin. Yet, the varying trajectories observed across C4 localities and clades highlight that the effect of the photosynthetic trait might be masked by geographic contingencies. Overall, our insights into the demographic history of A. semialata C4 and non-C4 populations represent a step towards disentangling how the emergence of a key trait can influence evolutionary trajectories over different timescales.

Acknowledgements

We thank Paolo Momigliano for advice on demographic analyses, and Lewis A. Jones and Sofía Galván for suggestions on data visualization. This study was funded by the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement 947921) as part of the MAPAS project. Computational resources were provided by the Galicia Supercomputing Center (CESGA). Open access charge was covered by Universidade de Vigo and Consorcio Interuniversitario do Sistema Universitario de Galicia (CISUG). SG was funded by the Ministry of Universities and the Next Generation European Union programme through a Margarita Salas Grant from Universidad Complutense de Madrid (CT31/21). LTD was supported by a Natural Environment Research Council Independent Research Fellowship (grant no.: NE/T011025/1).

Competing interests

None declared.

Author contributions

GS, P-AC and SV conceived and designed the study. GS performed the demographic analysis. SG performed the niche analysis. GS interpreted the results and wrote the manuscript with help from SG, LTD, P-AC and SV.

Data availability

All ddRADseq data were retrieved from NCBI SRA BioProjects PRJNA560360 and PRJNA649872. The reference genome of A. semialata ASEM_AUS1_v1.0 was accessed through GenBank no. QPGU01000000. Sample information is provided in Table S1. The scripts used in this study are available from GitHub: https://github.com/MAPASlab/A_semialata_demography.