Volume 233, Issue 1 p. 479-495
Full paper

Pathogen-driven coevolution across the CBP60 plant immune regulator subfamilies confers resilience on the regulator module

Qi Zheng

Department of Plant and Microbial Biology, Microbial and Plant Genomics Institute, University of Minnesota, St Paul, MN, 55108 USA

Key Laboratory of Biology and Genetic Improvement of Maize in Southwest Region, Maize Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, 611130 China

Kristina Majsec

Department of Plant and Microbial Biology, Microbial and Plant Genomics Institute, University of Minnesota, St Paul, MN, 55108 USA

Fumiaki Katagiri

Corresponding Author

Department of Plant and Microbial Biology, Microbial and Plant Genomics Institute, University of Minnesota, St Paul, MN, 55108 USA

Author for correspondence:

Fumiaki Katagiri

Email:[email protected]

First published: 05 October 2021

Summary

  • Components of the plant immune signaling network need mechanisms that confer resilience against fast-evolving pathogen effectors that target them. Among eight Arabidopsis CaM-Binding Protein (CBP) 60 family members, AtCBP60g and AtSARD1 are partially functionally redundant, major positive immune regulators, and AtCBP60a is a negative immune regulator. We investigated possible resilience-conferring evolutionary mechanisms among the CBP60a, CBP60g and SARD1 immune regulatory subfamilies.
  • Phylogenetic analysis was used to investigate the times of CBP60 subfamily neofunctionalization. Then, using the pairwise distance rank based on the newly developed analytical platform Protein Evolution Analysis in a Euclidean Space (PEAES), hypotheses of specific coevolutionary mechanisms that could confer resilience on the regulator module were tested.
  • The immune regulator subfamilies diversified around the time of angiosperm divergence and have been evolving very quickly. We detected significant coevolutionary interactions across the immune regulator subfamilies in all of 12 diverse core eudicot species lineages tested. The coevolutionary interactions were consistent with the hypothesized coevolution mechanisms.
  • Despite their unusually fast evolution, members across the CBP60 immune regulator subfamilies have influenced the evolution of each other long after their diversification in a way that could confer resilience on the immune regulator module against fast-evolving pathogen effectors.

Introduction

Pathogen effectors, typically proteinaceous, are delivered from pathogens into plant cells and manipulate plant functions to improve pathogen environments in planta, including compromising plant immune signaling (Toruño et al., 2016). The plant immune signaling network characterized in Arabidopsis has a remarkable amount of resilience as different parts of the network have compensating functions that buffer the disabling impacts of mutations or pathogen effectors (Tsuda et al., 2009; Hillmer et al., 2017; Katagiri, 2018). We are interested in how such network resilience evolved in biological systems. To gain insights into this topic, we have been studying evolution of important components of the immune signaling network. Members of the Calmodulin (CaM)-Binding Protein (CBP) 60 family are such network components (Wang et al., 2009, 2011; Zhang et al., 2010; Truman et al., 2013).

Arabidopsis has eight CBP60 family members (AtCBP60a-g and AtSARD1) (Reddy et al., 2002; Zhang et al., 2010; Wang et al., 2011). The CBP60 protein family was defined by the domain conserved among known CBP60 proteins (Pfam: ‘calmodulin_bind’, PF07887). Despite the Pfam domain name, the actual CaM-binding domains are outside the conserved domain (Fig. 1) (Reddy et al., 2002; Wang et al., 2009, 2011). Thus, we call the conserved domain the CBP60-conserved domain. AtCBP60g and AtSARD1 are major positive regulators of immunity controlling activation of many immune responses, including synthesis of the important immune hormone salicylic acid (SA) (Wang et al., 2009, 2011; Zhang et al., 2010). They are transcriptionally induced during immune responses. Their functions as positive immune regulators are partially redundant because an atcbp60g atsard1 double mutant has a more severe immune deficiency than either of the single mutants (Zhang et al., 2010; Wang et al., 2011). As the fungal effector VdSCP41 targets AtCBP60g (Qin et al., 2018), CBP60g could be under selective pressure from pathogen effectors, including VdSCP41. By contrast, AtCBP60a is a negative regulator of SA signaling and immunity (Truman et al., 2013). In an atcbp60a mutant, the basal concentration of SA is higher (Truman et al., 2013) and decrease of the post-induction level of AtCBP60g mRNA is slowed (Lu et al., 2018). The functions of the other AtCBP60 members (AtCBP60b–f) are unknown as mutants lacking any one of them did not show substantial changes in immunity or other obvious phenotypes (Truman et al., 2013). AtCBP60g and AtSARD1 are DNA-binding proteins (Zhang et al., 2010; Qin et al., 2018). As the DNA-binding activity resides within the CBP60-conserved domain, it is likely that all CBP60 proteins are DNA-binding proteins. 
Phylogenetic analysis of AtCBP60 members with a CBP60 from the moss Physcomitrella patens as the outgroup suggested that the immune regulators AtCBP60a, AtCBP60g and AtSARD1 form one clade and AtCBP60b–f form a separate clade (Wang et al., 2011).

Fig. 1
Domain structures of Arabidopsis CBP60 proteins. The scale shown below the figure indicates the amino acid number. The amino acid lengths between the CBP60-conserved domain and the C-terminal region of the C-calmodulin-binding domain (CaMBD) in AtCBP60a–f vary from 133 to 261: in this figure, the length for AtCBP60b was used.

Gene duplication events are very common in land plants, often as a result of frequent whole-genome duplication during their evolution (Clark & Donoghue, 2018). One common outcome is that one of the duplicated genes becomes pseudogenized or deleted. Otherwise, gene duplication can contribute to resilience of the organisms or provide sources of innovations (Crow & Wagner, 2006). Innovations involve functional specialization of duplicated genes, including neofunctionalization of one of the duplicated genes.

We applied phylogenetic analysis to the CBP60 protein family members in 247 diverse land plant species and found that the immune-related clade diversified from the highly conserved, prototypical group around the time of seed plant divergence. The CBP60a, CBP60g and SARD1 immune regulator subfamilies diversified within the immune-related clade around the time of angiosperm divergence. The immune regulator subfamilies have been evolving very quickly, which suggests strong selection imposed by fast-evolving pathogen effectors. Among 12 diverse core eudicot species, the rapid evolution of the immune regulator subfamilies was caused by both higher proportions of polymorphic amino acid sites and fast evolution rates per polymorphic site.

We further investigated possible coevolutionary interactions across these fast-evolving CBP60 immune regulator subfamilies which were suggested in a previous work (Truman et al., 2013). We define coevolution of subfamilies by nonindependence (i.e. interactions) across the courses of evolution after their diversification. We developed a geometric analysis platform for the physical-chemical characteristics of amino acids, Protein Evolution Analysis in a Euclidean Space (PEAES). The pairwise distance rank on PEAES (PEAES-PDR) was used to detect signatures of possible coevolutionary interactions among the immune regulator subfamilies in the CBP60-conserved domain in the 12 core eudicot species lineages. We detected significant coevolutionary interactions, which were consistent with two coevolutionary mechanism hypotheses. First, two positive regulators evolve to be as different as possible, which makes it difficult for a single pathogen effector to target both at once. Second, the negative regulator evolves to retain similarity to the positive regulators at the sites where the positive regulators cannot be very different. If a pathogen effector targets both positive and negative regulators, the negative impact of the effector on immunity as a result of targeting the positive regulators is moderated by its simultaneous targeting of the negative regulator. Our discoveries strongly suggest that functionally related protein subfamilies influence the evolution of each other so that high degrees of functional resilience can be achieved under strong and rapidly varying selection imposed by fast-evolving pathogen effectors.

Materials and Methods

The CBP60 protein sequence set

A protein sequence database consisting of sequences from 271 species was constructed. The sequence data sources (Gonzales et al., 2005; Sato et al., 2008, 2011; Al-Dous et al., 2011; van Bakel et al., 2011; Bombarely et al., 2012; Goodstein et al., 2012; Ming et al., 2013; Singh et al., 2013; Kitashiba et al., 2014; Yagi et al., 2014; Proost et al., 2015; Unver et al., 2017; Li et al., 2018; Leebens-Mack et al., 2019; Zheng et al., 2019) are listed in Supporting Information Table S1. The database was searched by Blastp (Camacho et al., 2009) using each of the Arabidopsis CBP60 members as a query and the union of the subject sequences that yielded bit scores higher than the Arabidopsis CBP60 that yielded the lowest bit score was identified as the CBP60 protein sequence set, consisting of 1432 sequences. The identified CBP60 protein sequence set did not include some distantly related CBP60 homologs (Notes S1). We left them out of the set because they do not allow functional interpretations of CBP60 evolution, as our knowledge of CBP60 functions is limited to the Arabidopsis members and their orthologs. The CBP60 protein sequence set was subjected to quality controls to remove possible products of assembly or annotation errors and to limit sequence representation to one species per genus for balanced overall representation. These quality controls resulted in the final set consisting of 1024 sequences from 247 species (Table S2). To ease identification of plant species, their common names or genus names are used.

Phylogenetic analysis of CBP60 protein sequences

Generally, ClustalW (Thompson et al., 1994), a maximum likelihood (ML) method (Guindon & Gascuel, 2003), and iTOL (Letunic & Bork, 2019) were used for multiple sequence alignment, tree inference, and tree visualization, respectively. Maximum likelihood was also used for inference of the most probable ancestor sequences at branch points in trees. MEGA7 (Kumar et al., 2016) was used for ClustalW and ML with the default parameters, except for 80% site coverage in ML (unless stated otherwise). We used the most recent common ancestor (MRCA)-anchored, clade-by-clade tree inference (MACCTI) approach, in which the aligned sequence region used for tree inference is locally adjusted. Briefly: (1) an overview tree for relationships among all major clades and MRCAs on the main CBP60 lineages was inferred using only the sequences with alignable C-terminal regions; (2) the sequences of each clade and its flanking MRCAs on the main CBP60 lineages were used to infer the subtree for the clade; (3) the MRCAs of the clades were used to infer the backbone tree showing the relationships among the clade MRCAs; and (4) the clade MRCAs in the backbone tree were replaced with the corresponding clade trees to reconstruct the entire tree. As species phylogeny, duplication and deletion (SPDD rule) generally explained our tree of 1024 CBP60 sequences at high levels, we concluded that the tree was sufficiently accurate to support the analysis presented here. Detailed methods, and discoveries made based on the phylogenetic relationships but not described in the main text, are given in Notes S2.

CaM-binding site prediction

CaMELS (Abbasi et al., 2017) was used for CaM-binding site prediction. The DECIPHER R/Bioconductor package (Wright, 2016) was used to predict alpha-helical propensity. A Python script, hydrophobic_moment.py (https://gist.github.com/JoaoRodrigues/568c845915aea3efa3578babfd72423c), was used for calculation of the hydrophobic moment.

Benchmark protein sequences and amino acid sites

Johnson et al. (2019) selected 353 genes that are single-copy in angiosperm species for phylogenetic studies. We selected 298 genes that had orthologs in Arabidopsis, rice and the basal angiosperm Amborella. The protein sequence database for the 12 core eudicot species we chose for the study was searched by Blastp with default parameters (Camacho et al., 2009), except that the E-value cutoff was 1 × 10⁻⁵⁰, using the Arabidopsis protein sequences for the 298 genes. We further selected 225 proteins that had high scoring pairs (HSPs) in each of the 12 species. The HSP with the highest score was chosen if a single query resulted in multiple HSPs from a single species. The HSP subject protein sequences of the 12 species were subjected to multiple sequence alignment using ClustalW (Thompson et al., 1994) with default parameters in the msa package from the Bioconductor project (Bodenhofer et al., 2015). High-quality aligned amino acid sites were selected for the 12 core eudicot species (‘benchmark.aa.sites.12sp.fa’ in Dataset S1; 12 species × 70 022 sites). The high-quality sites or polymorphic sites were sampled with replication with the appropriate sample size for a statistic of interest 50 000 times to estimate the median and the 95% and 99% confidence intervals of the benchmark.
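The resampling step can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes that the sites are drawn with replacement and that the statistic of interest is the proportion of polymorphic sites. The function names, the sample size of 296 and the reduced number of resamples are illustrative assumptions; only the benchmark totals (70 022 sites, of which 39 561 are polymorphic) come from the text.

```python
import random
from statistics import median


def percentile(sorted_vals, p):
    """Return the value at fraction p (0..1) of an already sorted list."""
    idx = round(p * (len(sorted_vals) - 1))
    return sorted_vals[idx]


def bootstrap_polymorphic_proportion(is_polymorphic, sample_size,
                                     n_resamples, seed=0):
    """Resample benchmark sites (assumed: with replacement) and summarise
    the proportion of polymorphic sites across the resamples."""
    rng = random.Random(seed)
    props = []
    for _ in range(n_resamples):
        sample = rng.choices(is_polymorphic, k=sample_size)
        props.append(sum(sample) / sample_size)
    props.sort()
    return {
        "median": median(props),
        "ci95": (percentile(props, 0.025), percentile(props, 0.975)),
        "ci99": (percentile(props, 0.005), percentile(props, 0.995)),
    }


# Benchmark totals reported in the text: 70 022 sites, 39 561 polymorphic.
benchmark = [True] * 39561 + [False] * (70022 - 39561)

# sample_size=296 (the domain alignment length) and n_resamples=2000 are
# illustrative assumptions; the text reports 50 000 resamples.
result = bootstrap_polymorphic_proportion(benchmark, sample_size=296,
                                          n_resamples=2000)
print(result)
```

An observed subfamily value would then be judged against the returned 99% interval.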

Parsimonious substitution count

The parsimonious substitution count for a particular site in a protein sequence alignment is the minimum number of amino acid substitutions required to explain the amino acids observed at the leaves for that site, given the species phylogenetic tree structure. See Notes S2 for details.
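One standard way to obtain such a minimum count is Fitch's small-parsimony algorithm, sketched below in Python. The text does not name the algorithm used (details are in Notes S2), so this is an illustration under assumptions: the tree is rooted and strictly binary, it is encoded as nested 2-tuples, and the species labels `sp1` to `sp6` are hypothetical.

```python
def fitch_count(tree, leaf_state):
    """Return (candidate_states, substitutions) for a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    leaf_state: dict mapping each leaf name to its amino acid at one site.
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0
    left, right = tree
    left_set, left_count = fitch_count(left, leaf_state)
    right_set, right_count = fitch_count(right, leaf_state)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra substitution.
        return shared, left_count + right_count
    # The children disagree: one substitution is needed on this split.
    return left_set | right_set, left_count + right_count + 1


# Hypothetical six-species tree (not the 12-species tree of the study).
toy_tree = ((("sp1", "sp2"), ("sp3", "sp4")), ("sp5", "sp6"))

one_event = dict(sp1="D", sp2="D", sp3="N", sp4="N", sp5="D", sp6="D")
two_events = dict(sp1="D", sp2="N", sp3="D", sp4="N", sp5="D", sp6="D")
three_events = dict(sp1="D", sp2="N", sp3="D", sp4="N", sp5="D", sp6="N")

for states in (one_event, two_events, three_events):
    print(fitch_count(toy_tree, states)[1])
```

A species tree containing polytomies would need to be resolved, or handled by a generalised algorithm, before this sketch applies.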

PEAES-PDR analysis

Protein Evolution Analysis in a Euclidean Space is a geometric description of the physical-chemical characteristics of 20 amino acids and a gap in a Euclidean space consisting of the five dimensions determined by Venkatarajan & Braun (2001) and one gap dimension (Table S3). The dissimilarity between two amino acids (or between an amino acid and a gap) is defined by the Euclidean distance between them (Table S4). Twelve eudicot species with high-quality genome sequences were selected to represent 12 diverse taxonomic orders (Fig. S1). One sequence for each of the CBP60a, CBP60bcd, CBP60g and SARD1 subfamilies was selected for each species (48 sequences). As some species had more than one paralog, two sets of 48 sequences were made by selecting different paralogs from these species. Each of these two sets was analyzed separately. The CBP60-conserved domains of 48 sequences were aligned by ClustalW (Thompson et al., 1994), and then the multiple sequence alignment was manually edited to remove insertional polymorphisms represented by only one or two sequences and to make the alignments consistent between the two sets. The alignments for the two sets are provided in Dataset S1. To quantify the degree of diversity at each site, the mean pairwise PEAES distance was calculated for each site for each subfamily. The PEAES-PDR analysis was conducted with 265 sites from the 296-site CBP60-conserved domain multiple sequence alignment, which had amino acid variation across the 12 eudicot species in at least one of the CBP60a, CBP60g or SARD1 subfamilies. The PDR was calculated at each site between two CBP60 subfamilies for each of the 12 eudicot species based on the PEAES metric according to the procedure depicted in Fig. 2.
For enrichment of the sites in the first and third quadrants over the second and the fourth quadrants in each of the CBP60g-, SARD1-, and CBP60a-common comparisons, the odds ratio of the site numbers, (first quadrant) × (third quadrant)/{(second quadrant) × (fourth quadrant)}, was examined. A higher odds ratio was predicted by Hypotheses I1 and I2 (see later). A single set of null sequences with 265 sites for 12 species in each of three subfamilies was simulated by randomly substituting each site with a site in the benchmark protein sequence alignment with the same amino acid variant number across 12 species. The null odds ratio distribution for each comparison was estimated by calculating the PEAES-PDR odds ratios for each of 5000 simulated null sequence sets. The obtained null odds ratio distribution was used to calculate the P-values for the actual odds ratios (one-sided). The P-value was corrected for multiple tests by Benjamini–Hochberg’s false discovery rate (FDR) across the species in each comparison. Details of the PEAES-PDR method and its simulation analysis are given in Notes S3. R scripts used for PEAES-PDR analysis and for the simulation are ‘peaes.pdr.figures.rev.f.for.pub.r’ and ‘sim.peaes.pdr.v12.for.pub.r’ in Dataset S1.
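The statistical steps in this paragraph (quadrant odds ratio, one-sided P-value from a simulated null distribution, and Benjamini–Hochberg correction) can be sketched in Python as follows. This is not the authors' R code from Dataset S1. The function names are hypothetical, the "+1" correction in the empirical P-value is an assumption not stated in the text, and the numbers in the usage lines are made up for illustration.

```python
def quadrant_odds_ratio(q1, q2, q3, q4):
    """(first x third) / (second x fourth) quadrant site counts.
    Raises ZeroDivisionError if the second or fourth quadrant is empty."""
    return (q1 * q3) / (q2 * q4)


def one_sided_p(observed, null_values):
    """Fraction of null odds ratios at least as large as the observed one.
    The +1 in numerator and denominator is an assumed convention."""
    at_least = sum(1 for v in null_values if v >= observed)
    return (at_least + 1) / (len(null_values) + 1)


def benjamini_hochberg(pvals):
    """Return BH-adjusted P-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for offset, idx in enumerate(reversed(order)):
        rank = m - offset
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up numbers, for illustration only.
print(quadrant_odds_ratio(80, 50, 75, 60))
print(one_sided_p(3.0, [1.0, 2.0, 3.0, 4.0]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In the analysis described above, the correction would be applied across the 12 species within each comparison.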

Fig. 2
Pairwise distance ranks (PDRs) based on the Protein Evolution Analysis in a Euclidean Space (PEAES) distance metric. First, for each of the 296 amino acid sites in the multiple sequence alignment of the CBP60-conserved domain, a pairwise PEAES-distance matrix was made between members of two subfamilies (CBP60g and SARD1 in the figure as an example) in the 12 core eudicot species. Second, in the pairwise PEAES-distance matrix at a particular site (site #x), for each species (carrot in the figure as an example) the pairwise distance between the members of the two subfamilies of the species (red square in the matrix) was compared with the pairwise distance between the member of one subfamily of this species (carrot) and the member of the other subfamily of a different species (pink squares). The rank of the within-species comparison was determined among these 23 pairwise distances. This rank (21.0 for carrot) is called the PDR at each site in each species.
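The ranking step described in this caption can be sketched in Python. This is a hedged illustration, not the authors' implementation. It assumes that ranks ascend with distance (rank 1 is the smallest distance) and that ties receive the average rank; neither detail is stated here. The function name and the toy matrix are hypothetical.

```python
def pairwise_distance_rank(dist, i):
    """PDR for species i at one site.

    dist[a][b] is the distance between the first-subfamily member of
    species a and the second-subfamily member of species b.
    The within-species distance dist[i][i] is ranked among itself plus the
    other entries of row i and column i (23 values for 12 species).
    """
    n = len(dist)
    within = dist[i][i]
    pool = [within]
    pool += [dist[i][j] for j in range(n) if j != i]  # same row
    pool += [dist[j][i] for j in range(n) if j != i]  # same column
    smaller = sum(1 for d in pool if d < within)
    tied = sum(1 for d in pool if d == within)  # includes the value itself
    # Average rank of the tied group (assumed tie handling).
    return smaller + (tied + 1) / 2


# Toy 3-species matrix; a 12 x 12 matrix is used in the same way.
toy = [[5, 1, 2],
       [3, 0, 4],
       [6, 7, 8]]
print(pairwise_distance_rank(toy, 0))
print(pairwise_distance_rank(toy, 1))
```

Under these assumptions, a high PDR means that the two subfamily members of the same species are more dissimilar at that site than the cross-species pairs.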

Results

The immune-related CBP60 clade neofunctionalized from the prototypical CBP60 group

We inferred phylogenetic relationships among 1024 CBP60 protein sequences from 247 diverse land plants: one liverwort, 35 moss, nine lycophyte, 44 fern, 69 gymnosperm and 89 angiosperm species (Figs 3, S2; the latter includes sequence names). Most plant species had at least one member of the highly conserved, ‘prototypical’ CBP60 group that includes AtCBP60b–f (salmon color arc of the middle ring in Fig. 3). The tree topology indicates that a major clade diversified from the prototypical group around the time of gymnosperm divergence (blue arc of the middle ring in Fig. 3). This clade further diversified into the three immune regulator CBP60a, CBP60g and SARD1 subfamilies (pink, cyan and green arcs in the outermost ring in Fig. 3) around the time of angiosperm divergence. Hence, we call this clade the immune-related clade. It is evident from the branch lengths that the immune regulator subfamilies have been evolving much faster than the prototypical group.

Fig. 3
The immune-related clade diversified around the time of seed plant divergence and has been evolving very quickly. The phylogenetic tree of 1024 CBP60 protein sequences from 247 land plant species was inferred using RAxML (Stamatakis, 2014) based on ClustalW alignment (Thompson et al., 1994) and visualized using iTOL (Letunic & Bork, 2019). Colored dots at the leaves of the tree represent the land plant group (from liverwort to angiosperm) from which the CBP60 member originated. The inner ring with the yellow (low) to red (high) color range shows the calmodulin (CaM)-binding score predicted by CaMELS (Abbasi et al., 2017). The middle ring shows the prototypical group (salmon) and the immune-related clade (blue). The outer ring shows the CBP60 subfamilies according to AtCBP60 names. The AtCBP60 leaf positions are indicated in the outermost layer. The branch length scale of 0.1 (substitution per site) is shown at the top left.

None of the six basal angiosperm species had SARD1 subfamily members. This could be a result of SARD1 deletion in basal angiosperms or of SARD1 subfamily diversification from the CBP60g subfamily after basal angiosperm divergence. A likelihood ratio test of the tree topologies for these two possible cases favored the tree structure with SARD1 deletion in basal angiosperms (P < 0.01; Fig. S3a,b). We identified multiple SARD1 subfamily members in recently published genome sequences of a basal angiosperm water lily order (Zhang et al., 2020) (Fig. S3c). This new information strongly supports our conclusion that diversification of all three immune regulator subfamilies occurred around the time of angiosperm divergence.

The common ancestor of the immune regulator subfamilies appears to have lost CaM-binding ability

AtCBP60a–f have CaM-binding domains (CaMBDs) in their C-terminal regions (C-CaMBD), AtCBP60g has one in its N-terminal region (N-CaMBD), but AtSARD1 lacks CaM-binding ability (Reddy et al., 2002; Wang et al., 2009, 2011) (Fig. 1). We investigated when C-CaMBD was lost during evolution of the immune regulator subfamilies using CaMELS (Abbasi et al., 2017), which was the only algorithm among those we tested that correctly predicted C-CaMBD and N-CaMBD among AtCBP60 members. Generally, C-CaMBD was conserved among CBP60 proteins, except for some specific clades, including the immune regulator subfamilies (Fig. 3, innermost ring). C-CaMBD in AtCBP60a and N-CaMBD in AtCBP60g appear to have been reacquired relatively recently. These observations suggest that CaM-binding activity may not be essential for the CBP60 immune regulator functions, or, alternatively, loss of C-CaMBD may have coincided with evolution of a CaM-binding adaptor protein that works together with the immune regulators. The latter could explain reacquisition of CaMBD in some of the immune regulators.

There were two forms of C-CaMBD loss (Fig. 4). One form is deletion of the C-terminal region corresponding to C-CaMBD, such as in the SARD1 subfamily members (Fig. 4a). The other form is mutations in C-CaMBD with weakly to moderately alignable sequences remaining in the C-terminal region (Wang et al., 2009). The secondary-structure property of an amphipathic helix may be important for CaMBDs (Degrado et al., 1987). When the CaM-binding prediction scores at the C-CaMBD region were compared with the α-helix length and the hydrophobic moment (as an indication of a potential amphipathic α-helix), a long α-helix appeared important while its amphipathicity did not (Fig. 4b). Although we cannot exclude a possible high false-negative rate in CaMELS predictions, mutations that led to loss of CaMBD might have disrupted the CaMBDs through disruption of α-helices.
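For readers unfamiliar with the hydrophobic moment, the following Python sketch shows the usual helical form of the calculation. It is not the hydrophobic_moment.py script cited in the Methods. It assumes a rotation of 100° per residue (an ideal α-helix) and uses the Kyte–Doolittle hydropathy scale for illustration; the scale used in the actual analysis is not stated here and may differ. The example sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (illustrative choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment of seq, assuming each residue is rotated
    by angle_deg around the helix axis relative to the previous one."""
    angle = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(k * angle)
                  for k, aa in enumerate(seq))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(k * angle)
                  for k, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# Hypothetical 18-residue examples: a uniform and an alternating pattern.
uniform = "L" * 18
patterned = "LKKLLKKLLKKLLKKLLK"
print(hydrophobic_moment(uniform))
print(hydrophobic_moment(patterned))
```

A uniform sequence has no preferred hydrophobic face, so its moment is close to zero, whereas a sequence whose hydrophobic residues recur roughly every three to four positions yields a larger moment.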

Fig. 4
Loss of calmodulin (CaM)-binding ability is caused by loss of the C-terminal region of the C-calmodulin-binding domain (C-CaMBD) or mutations in C-CaMBD. (a) Distribution of the length of the region C-terminal to the CBP60-conserved domain in each subfamily in angiosperms. (b) Contributions of the length of α-helix and its hydrophobic moment to the CaM-binding prediction score according to the CaMELS algorithm (Abbasi et al., 2017). AA, amino acid.

Fast evolution of the CBP60 immune regulator subfamilies among core eudicots was caused by both high proportions of polymorphic sites and high evolution rates per site

The branch lengths of the CBP60 protein phylogenetic tree (Fig. 3) strongly suggested fast evolution of the immune regulator subfamilies, especially the CBP60g subfamily. We closely investigated their fast evolution among core eudicot species. We chose core eudicots because genome sequences (vs transcriptome sequences) of many diverse core eudicot species were available. To reduce the influence of conservation as a result of recently shared ancestry, a single species from each of 12 taxonomic orders was chosen for the analysis (Figs 5a, S1). To simplify the analysis, when one species has more than one member per subfamily, only one member was selected. This subfamily member selection should not introduce strong bias: in all such cases, the members in a single subfamily were paralogs diversified after divergence of the corresponding taxonomic orders. For example, AtCBP60b, AtCBP60c and AtCBP60d are paralogs in this timescale, and one of them was selected as the Arabidopsis sequence for the CBP60bcd subfamily. We also conducted the subsequent analyses with an alternative set of selected CBP60 members and obtained similar observations (indicated as ‘the alternative set’ for the corresponding figures in the Supporting Information), confirming the absence of strong bias owing to CBP60 member selection.

Fig. 5
The immune regulator subfamilies have been evolving very quickly among 12 diverse core eudicot species. (a) Phylogeny of the 12 core eudicot species used. (b) Proportions of the polymorphic sites of the CBP60 subfamilies are compared with the polymorphic site proportion distribution of the samples of the same sizes from the benchmark proteins. (c) Examples of the parsimonious substitution count. Two amino acid site examples with two variants, D and N, for parsimonious substitution count = 1, 2 or 3 are shown. While a red ‘X’ in a tree shows the particular substitution event for the example amino acid variation, an orange ‘X’ shows a possible substitution event as multiple sets of substitution event positions are possible in the latter case. (d) Cumulative distributions of the parsimonious substitution count of the CBP60 subfamilies are compared with those of samples from the benchmark proteins. Significant right shifts of the cumulative distributions for the immune regulator subfamilies indicate significantly higher parsimonious substitution counts than the benchmark proteins. The CBP60 subfamilies are color-coded in (b) and (d): salmon, CBP60a; cyan, CBP60g; green, SARD1; orange, CBP60bcd. Horizontal thick solid line, median; thin solid line, 95% confidence interval; thin dashed line, 99% confidence interval (b, d). In (d), the cumulative distributions for the CBP60 subfamilies are slightly positively offset for better visualization.

We compared the CBP60 subfamily members with benchmark protein sequences in the 12 core eudicot species. The benchmark protein sequences were selected for high-quality multiple sequence alignments from protein sequences encoded by a set of 353 ‘single copy’ genes for angiosperm phylogeny studies (Johnson et al., 2019). Their single-copy status suggests that the function of each gene has been conserved during angiosperm evolution. The selected benchmark protein sequence alignment across the 12 species had 70 022 amino acid sites, including 39 561 polymorphic sites. Random sampling from the benchmark sites was used to estimate the median and the 95% and 99% confidence intervals of the benchmark. We judge an observed value to be significantly different when it is outside the 99% confidence interval.

Outcomes of fast evolution could be detected as an increased proportion of polymorphic sites or an increased evolution rate per polymorphic site, or both. First, we examined the proportion of polymorphic sites among the amino acid sites of the high-quality alignments between the CBP60 subfamilies and the benchmark proteins. The proportions of polymorphic sites were significantly higher with the immune regulator subfamilies, especially with the CBP60g subfamily, while no significant difference was observed with the CBP60bcd subfamily (see Figs 5b, S4a for the alternative set).

Second, we examined the evolution rate per polymorphic site using the parsimonious substitution counts per amino acid site among the 12 core eudicot species given their species phylogeny (examples of the parsimonious substitution count in Fig. 5c). The protein sequences from 12 diverse species represent sparse data, and the actual substitution count is probably higher than the parsimonious count. As we are interested in whether the parsimonious substitution count distribution of a CBP60 subfamily is shifted higher or lower compared with that of the benchmark, the cumulative distributions of the parsimonious substitution counts were compared (Figs 5d, S4b for the alternative set). Note that when the parsimonious substitution count distribution is shifted higher, the cumulative distribution curve is shifted toward the right in the plot. Significantly higher parsimonious substitution counts, which indicate significantly higher evolution rates per site, were observed with all immune regulator subfamilies, especially with the CBP60g subfamily. No significant difference was observed with the CBP60bcd subfamily. Thus, fast evolution of the CBP60 immune regulator subfamilies is evident in both higher proportions of polymorphic sites and higher evolution rates per polymorphic site, compared with the benchmark.

Fast evolution could be caused by release from negative selection, including pseudogenization, as well as by strong and varying selection. Despite their very fast evolution relative to the CBP60bcd prototypical subfamily, pseudogenization is not the case for the CBP60 immune regulator subfamilies as their Arabidopsis members are all functional and the vast majority of diverse angiosperm species have maintained members of all three immune regulator subfamilies. Thus, their very fast evolution is probably a result of strong and varying selection, such as pressure from fast-evolving pathogen effectors targeting CBP60 immune regulator members. A fungal effector is known to target AtCBP60g (Qin et al., 2018). It is conceivable that multiple effectors target these major positive immune regulator subfamily members at a variety of sites.

Protein evolution analysis in a Euclidean space

Phylogenetic analysis of sequences using standard evolution models assumes stationary, reversible and homogeneous evolution (Naser-Khdour et al., 2019). These conditions are clearly violated when evolution is under strong and varying selection. We postulated that when a site is under strong selection, the amino acid at that site is selected based mainly on physical-chemical characteristics of amino acids and is relatively independent of the lineage ancestry when the lineage divergence is sufficiently old. Divergence of the 12 core eudicot species lineages is probably sufficiently old for evolution of many amino acid sites in the CBP60 immune regulator subfamilies, judging by their high parsimonious substitution counts. Thus, we decided to use a metric of physical-chemical characteristics of amino acids to evaluate relatedness of selection at each site. Venkatarajan & Braun (2001) described physical-chemical characteristics of 20 amino acids by five-dimensional Euclidean coordinates after applying linear dimensionality reduction to 237 physical-chemical properties. We added one more dimension to include absence of an amino acid (i.e. a gap in an alignment; Table S3). This physical-chemical description of amino acids in a six-dimensional Euclidean space allows geometrical tracking of evolution at each site in a protein sequence alignment. For example, the pairwise Euclidean distance between amino acids can be defined as the amino acid dissimilarity measure (Table S4). We call this geometric analysis platform PEAES.
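The geometry of PEAES can be sketched in a few lines of Python. The coordinates below are placeholders invented for illustration; they are NOT the values of Table S3 or of Venkatarajan & Braun (2001). The sketch also assumes that amino acids have a zero gap coordinate and that a gap lies only on the sixth axis. The function names are hypothetical. The second function computes the mean of all pairwise distances in one alignment column, the site diversity measure used with PEAES.

```python
import math
from itertools import combinations

# PLACEHOLDER six-dimensional coordinates (five property axes + gap axis).
# These numbers are made up; real values would come from Table S3.
PLACEHOLDER_COORDS = {
    "D": (0.9, 0.1, 0.0, 0.2, 0.0, 0.0),
    "N": (0.7, 0.2, 0.1, 0.1, 0.0, 0.0),
    "L": (-0.8, 0.3, 0.0, 0.1, 0.1, 0.0),
    "-": (0.0, 0.0, 0.0, 0.0, 0.0, 1.0),  # gap: assumed to use only axis 6
}


def peaes_distance(a, b, coords=PLACEHOLDER_COORDS):
    """Euclidean distance between two symbols (amino acids or a gap)."""
    return math.dist(coords[a], coords[b])


def site_diversity(column, coords=PLACEHOLDER_COORDS):
    """Mean of all pairwise distances among the symbols in one column."""
    pairs = list(combinations(column, 2))
    return sum(peaes_distance(a, b, coords) for a, b in pairs) / len(pairs)


print(peaes_distance("D", "N"))
print(site_diversity("DDN"))   # one sequence per species in practice
print(site_diversity("DDDD"))  # strictly conserved site
```

With the real coordinates, a strictly conserved site still gives a diversity of zero, and physically dissimilar substitutions raise the value more than conservative ones.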

In the subsequent section, we examine possible coevolutionary interactions among the CBP60 immune regulator subfamilies. We focused our analysis on the CBP60-conserved domain because it is impossible to obtain a reliable multiple sequence alignment across the subfamilies outside the conserved domain (Fig. S5). The CBP60-conserved domain sequence that is highly conserved in the prototypical group contains 293 amino acids. To align the domain sequences from the immune regulator subfamilies together, three gaps were added, resulting in the multiple sequence alignment consisting of 296 sites (Fig. S6). Site positions within this 296-site alignment are used subsequently.

With PEAES, the measure of diversity at each site in the aligned sequences was defined as the mean of all pairwise distances among amino acids at that site. Fig. 6(a) (see Fig. S7a for the alternative set) shows the conservation and diversity at each site in different CBP60 subfamilies. The CBP60bcd subfamily represents the prototypical group. It is evident that while many sites are highly conserved in the CBP60bcd subfamily, many sites in each immune regulator subfamily, particularly the CBP60g subfamily, are highly diverse. The mean of the diversity measure across the sites for the CBP60bcd subfamily is significantly lower than that for the benchmark proteins (the 99% confidence interval shown in the parentheses in Fig. 6b), indicating that the CBP60-conserved domain is extremely well conserved within the CBP60bcd subfamily in the core eudicots. However, the means of the diversity measures across the sites for the immune regulator subfamilies are significantly higher than that for the benchmark, indicating that the immune regulator subfamilies are highly diverse compared with the benchmark even within the domain conserved across all CBP60 subfamilies (see Figs 6b, S7b for the alternative set). This trend of highly diverse immune regulator subfamilies and highly conserved prototypical group is also evident using other measures, including the number of highly diverse sites (site diversity measure > 0.4415, which is the 90th percentile value among the benchmark proteins) and the number of strictly conserved sites (Fig. 6b; the number of highly diverse sites in the CBP60a subfamily is the only exception as it is not significant although it is higher than the benchmark mean value).
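As a minimal sketch of the site diversity measure, the function below averages a distance over all unordered pairs of residues in one alignment column. It assumes that pairs are taken over the sequences in the column (one per species) rather than over distinct amino acids, and `toy_distance` is a hypothetical stand-in for the PEAES distance, used only to keep the example self-contained.

```python
from itertools import combinations
from statistics import mean


def site_diversity(column, distance):
    """Mean distance over all unordered pairs of symbols in one column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    return mean(distance(a, b) for a, b in pairs)


def toy_distance(a, b):
    """Hypothetical stand-in for the PEAES distance: 0 if identical, else 1."""
    return 0.0 if a == b else 1.0


if __name__ == "__main__":
    print(site_diversity("AAAA", toy_distance))  # strictly conserved column
    print(site_diversity("AAAL", toy_distance))  # one variant among four
```

A strictly conserved column gives a diversity of zero under any distance, and the value grows as more dissimilar residues appear in the column.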

The CBP60-conserved domains of the immune regulator subfamilies are highly diverse among 12 core eudicot species. (a) The mean pairwise Protein Evolution Analysis in a Euclidean Space (PEAES) distance for each site across the species is shown as a line plot in each panel. One CBP60 member per subfamily was selected for each species. Strictly conserved sites are shown as segments at the bottom in each plot. The segments in black are the sites strictly conserved across all CBP60 members of the core eudicot species. The subfamily-specific sites are shown in salmon, cyan, green and orange for the CBP60a, CBP60g, SARD1 and CBP60bcd subfamilies, respectively. Short segments represent subfamily-conserved gaps in the alignment. Highly diverse sites (pairwise distance mean > 0.4415) are shown at the top in each plot as segments colored according to the subfamily color code. (b) Three measures of the diversity level, the mean of the PEAES diversity measure across the sites, the number of highly diverse sites, and the number of strictly conserved sites, for each subfamily and the benchmark protein are shown. For the benchmark (gray-shaded), the mean value and the 99% confidence interval (in parentheses) are shown. The 12 core eudicot species are quinoa, coffee, tomato, monkeyflower, carrot, sunflower, poplar, soybean, peach, orange, cotton and Arabidopsis. AA, amino acid.

Highly nonrandom, lineage-specific coevolutionary interactions across the immune regulator subfamilies are prevalent

As they share a common ancestor as recently as the time of angiosperm divergence and they have overlapping or opposing functions in immune signaling (Wang et al., 2009, 2011; Zhang et al., 2010; Truman et al., 2013), the immune regulators CBP60a, CBP60g and SARD1 probably form a regulatory module in the immune signaling network. We investigated the possibility that coevolution of the regulatory module components confers resilience against negative impacts of pathogen effectors targeting the module components. There may be pressure for the partially functionally redundant, positive immune regulators, CBP60g and SARD1, to be as dissimilar as possible at critical sites, which would reduce the probability that they are both targeted by a single pathogen effector (Hypothesis M1; ‘M’ for mechanistic). On the other hand, if both the negative regulator CBP60a and one of the positive regulators are targeted by a single pathogen effector, simultaneous impairment of the positive and negative regulators would moderate the negative impact of the effector on immunity (Truman et al., 2013). If this is the case, the negative immune regulator CBP60a may be under pressure to remain similar to one or both of the positive regulators at critical sites (Hypothesis M2). Such selection imposed by pathogen effectors could be strong. In addition, the direction of selection and the sites under selection could change over short time periods within each plant species lineage. This is because pathogen effectors evolve very quickly, and multiple effectors from multiple pathogens may target the regulatory module of a single plant species. To detect coevolutionary interactions under rapidly changing selection, the analysis used must be specific to the site and the species lineage.

We defined the PDR of a single site as the rank of the pairwise PEAES distance between the members of two subfamilies of a particular species lineage among all permutations of the species lineages for the member of one of the two subfamilies (PEAES-PDR; Fig. 2). Among the 296 sites in the CBP60-conserved domain alignment, 265 sites that had amino acid variation across 12 species in at least one of the CBP60a, CBP60g or SARD1 subfamilies were subjected to the following PEAES-PDR analysis (266 sites for the alternative set). To allow a species lineage-specific analysis, two hypotheses about the coevolutionary interactions among the three immune regulator subfamilies were derived from mechanistic hypotheses M1 and M2. If a site is not dissimilar between CBP60g and SARD1, CBP60a should have an amino acid similar to the amino acid in CBP60g or SARD1 to confer a protection effect (Hypothesis I1; ‘I’ for interactions). On the other hand, at a site where CBP60a cannot provide protection through similarity to CBP60g or SARD1 (e.g. because it would impair CBP60a function), CBP60g and SARD1 should be dissimilar to avoid both proteins being targeted by a single effector (Hypothesis I2). As we cannot know the null distribution of the PDR values at a particular site in a subfamily, we cannot test the hypotheses at the level of individual sites. Instead, we tested whether sites with particular PDR values are significantly enriched according to the hypotheses. To test Hypotheses I1 and I2, the PDR between CBP60g (or SARD1) and CBP60a was plotted against the PDR between CBP60g and SARD1 for all sites of a single species. The plot was divided into four quadrants at the median rank of 12 on both axes (Fig. 7a). Hypotheses I1 and I2 predict site enrichment in the third and first quadrants, respectively. Fig. 7(b) shows such plots of CBP60g:SARD1 PDR vs CBP60a:CBP60g PDR (CBP60g-common comparison), of SARD1:CBP60a PDR vs CBP60g:SARD1 PDR (SARD1-common comparison), and of CBP60a:CBP60g PDR vs SARD1:CBP60a PDR (CBP60a-common comparison) for three species, the last of which was included as a case with no particular expectation (see Fig. S8 for all species; and Fig. S9 for the alternative set). Then, we statistically tested whether the odds ratio of the site numbers for (first quadrant) × (third quadrant)/{(second quadrant) × (fourth quadrant)} was significantly higher than the null expectation. For the null distribution of the odds ratio, we randomly replaced each site in each subfamily with a site in the benchmark alignment that had the same number of amino acid variants across 12 species. The null distribution in each of the three comparisons was estimated from 5000 such randomly replaced sets (Fig. 7c) and used in a one-sided test.
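The PDR and the quadrant odds ratio can be sketched as follows. This is one reading of the definition, not a reproduction of the code used in this study: the observed distance for species s is ranked within a pool that also contains the distances obtained by swapping in every other species for either of the two members, which gives 23 values for 12 species and a median rank of 12. Ties receive mid-ranks, sites falling exactly on a median are skipped, and an empty denominator returns infinity; all three choices are assumptions. `toy_distance` is a hypothetical stand-in for the PEAES distance.

```python
import math


def toy_distance(a, b):
    """Hypothetical stand-in for the PEAES distance: 0 if identical, else 1."""
    return 0.0 if a == b else 1.0


def pdr(distance, a_col, b_col, s):
    """Rank of the same-species distance among species-swapped distances.

    a_col and b_col hold the residues of two subfamilies at one site, indexed
    by species. Ties receive the mid-rank (an assumption).
    """
    n = len(a_col)
    observed = distance(a_col[s], b_col[s])
    pool = [observed]
    pool += [distance(a_col[s], b_col[t]) for t in range(n) if t != s]
    pool += [distance(a_col[t], b_col[s]) for t in range(n) if t != s]
    smaller = sum(d < observed for d in pool)
    equal = sum(d == observed for d in pool)  # includes the observed value
    return smaller + (equal + 1) / 2


def quadrant_odds_ratio(points, median=12.0):
    """(Q1 x Q3) / (Q2 x Q4) for (x, y) PDR pairs split at the median rank."""
    q1 = q2 = q3 = q4 = 0
    for x, y in points:
        if x == median or y == median:
            continue  # assumption: sites on a dividing line are not counted
        if x > median and y > median:
            q1 += 1
        elif x < median and y > median:
            q2 += 1
        elif x < median and y < median:
            q3 += 1
        else:
            q4 += 1
    denominator = q2 * q4
    return math.inf if denominator == 0 else (q1 * q3) / denominator


if __name__ == "__main__":
    print(pdr(toy_distance, "AAL", "ALL", 0))
    pts = [(20, 20), (18, 15), (3, 4), (5, 2), (3, 20), (20, 3)]
    print(quadrant_odds_ratio(pts))
```

A low PDR means that the two members of the same species are more similar at that site than expected from cross-species pairings, and a high PDR means that they are more dissimilar.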

Nonrandom coevolutionary interactions that could protect the positive immune regulators are prevalent among the 12 core eudicot lineages. (a) Four quadrants in the SARD1:CBP60a pairwise distance rank (PDR) vs CBP60g:SARD1 PDR plot (SARD1-common comparison) as an example. The color codes for the first to fourth quadrants shown are used in Fig. 8. (b) Plots for CBP60g:SARD1 PDR vs CBP60a:CBP60g PDR (CBP60g-common comparison; cyan color), SARD1:CBP60a PDR vs CBP60g:SARD1 PDR (SARD1-common comparison; green color), and CBP60a:CBP60g PDR vs SARD1:CBP60a PDR (CBP60a-common comparison; salmon color) are shown for three plant species, poplar, soybean and peach. Similar plots for all 12 core eudicot species are shown in Supporting Information Fig. S8. The P-value (one-sided test with the null distribution in (c), Benjamini–Hochberg false discovery rate-corrected) is shown at the bottom right. When P < 0.05, the principal component 1 (PC1) axis is shown as a black solid line, indicating significant enrichment of sites in the first and third quadrants over the second and fourth quadrants. (c) Null distributions of the PDR odds ratios (blue curves) with color-coded vertical lines for the actual odds ratio for 12 species in each of three comparisons. (d) The corrected P-values for the comparisons are shown for the 12 core eudicot species. Yellow background, P < 0.05.

Significantly higher odds ratios (Benjamini–Hochberg FDR = 0.05) were observed in nine out of 12 species lineages in the CBP60g-common comparison and in eight out of 12 species lineages in the SARD1-common comparison (Fig. 7d). In the union of the CBP60g- and SARD1-common comparisons, every species lineage showed nonrandom coevolutionary interactions among the three immune regulator subfamilies, which is consistent with Hypotheses I1 and I2 (and consequently with Hypotheses M1 and M2). None of the 12 species lineages showed a significantly higher odds ratio in the CBP60a-common comparison (Fig. 7d), where no higher odds ratio was predicted by the hypotheses. The sites of CBP60g- and SARD1-common comparisons are shown along the CBP60-conserved domain in Fig. 8 (Fig. S10 for the alternative set), with the site-indicating bars color-coded according to the quadrants, as in Fig. 7(a). The distribution of the colors of site bars does not show strong similarity between more closely related species, consistent with our assumption that the species lineage divergences were sufficiently old and that observed selections at the sites are largely consequences of species lineage-specific selection after lineage divergence. The notion of species lineage-specific selection is further supported by the fact that the values obtained by projecting the centered data onto y = x in each plot in Fig. S8 show no obvious trend toward higher correlation between more closely related species pairs (Figs S11, S12 for the alternative set). This projection onto y = x signifies the enrichment in the first and third quadrants (Fig. S11a). We conclude that many sites in the CBP60-conserved domain of the three immune regulator subfamilies have probably been coevolving under strong and varying pressure from pathogen effectors in a species lineage-specific manner, conferring a high degree of resilience on the CBP60 immune regulatory module.
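The one-sided test against a permutation-type null distribution, followed by Benjamini–Hochberg correction, can be sketched as below. The add-one form of the empirical P-value and the set of tests corrected together are assumptions, as neither is specified in the text; the function names are hypothetical.

```python
def empirical_p(observed, null_values):
    """One-sided empirical P-value: share of null values >= observed.

    The +1 in numerator and denominator avoids a P-value of exactly zero
    (an assumption, not necessarily the form used in this study).
    """
    exceed = sum(v >= observed for v in null_values)
    return (exceed + 1) / (len(null_values) + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    null = [0.8, 1.0, 1.1, 1.3]  # toy null odds ratios
    raw = [empirical_p(obs, null) for obs in (2.0, 1.2, 0.5)]
    print(raw)
    print(benjamini_hochberg(raw))
```

In this form, a species lineage would be called significant when its adjusted P-value falls below 0.05.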

Site selections are largely specific to each of the core eudicot lineages. The sites in CBP60g-common and SARD1-common comparisons in each plant species (Supporting Information Fig. S8a,b) are shown along the CBP60-conserved domain with the color-scaled quadrant information (Fig. 7a): blue, quadrant 1; yellow, quadrant 2; red, quadrant 3; magenta, quadrant 4; the closer a site is to the median rank point, (12, 12), the fainter the color. The gray bar directly above the amino acid position scale at the bottom shows the position of the VdSCP41-binding region (Qin et al., 2018) within the CBP60-conserved domain. AA, amino acid.

Simulation of protein sequence evolution confirmed that PEAES-PDR can detect coevolution based on Hypotheses I1 and I2

As PEAES-PDR is a new approach to statistically detect coevolutionary interactions in protein sequence evolution, we performed simulation analysis for its validation. We simulated evolution of CBP60g, SARD1 and CBP60a protein sequences along the 12 species phylogeny with different levels of coevolutionary selection: selection on CBP60g and SARD1 sequences was according to Hypothesis I2 and selection on CBP60a sequences was according to Hypothesis I1. The simulated sequences for 12 species were analyzed for the PEAES-PDR odds ratio, and the distribution of the odds ratios from 100 rounds of simulation per parameter value set was examined.

For a protein sequence, it is not realistic to assume that any of 21 amino acids (including a gap) can occur at a site. We used two different amino acid variation sets that specify possible amino acids at a particular site among the 265 polymorphic sites for each subfamily: a site randomly selected from the ‘Benchmark’ protein sequence alignments while conserving the amino acid variant number per site across 12 species; or the corresponding site in all angiosperm members of each subfamily (‘Angio CBP60’; ‘CBP60a.Angio.align.fas’, ‘CBP60g.Angio.align.fas’, and ‘SARD1.Angio.align.fas’ for the sequence alignments are in Dataset S1), which may represent most of the possible amino acids at each site in each subfamily.

The strength of coevolutionary selection vs relatively neutral independent selection was tuned by the CBP60g, SARD1 and CBP60a threshold parameters (value range [0,1]). Smaller CBP60g, smaller SARD1 or larger CBP60a parameter values represent stronger coevolutionary selection on CBP60g, SARD1 or CBP60a sequences, respectively. The CBP60g, SARD1 and CBP60a threshold parameter values of 1, 1 and 0, respectively, represent no coevolutionary selection.

The distributions of the odds ratios together with the amino acid variation set and a few representative parameter value sets are shown in Fig. 9 (see Notes S3 for the results of all 48 parameter value sets). The ‘Random’ distributions were sampled from the null odds ratio distributions used in analysis of the actual CBP60g, SARD1 and CBP60a sequences (Fig. 7c). With no coevolutionary selection (CBP60g, SARD1 and CBP60a threshold values of 1, 1 and 0), the odds ratio distributions with the ‘Benchmark’ variation set were essentially the same as the ‘Random’ distributions, indicating that the ‘Benchmark’ variation set with no coevolutionary selection serves as the negative control. When strong coevolutionary selection was applied with lower CBP60g and SARD1 threshold parameter values (threshold values: 0, 0.1 and 0.5 or 0.1, 0 and 0.5), the odds ratios with the CBP60g- and SARD1-common comparisons were higher than those with ‘Random’, indicating that the PEAES-PDR analysis can detect a signature of such coevolutionary selection.

The pairwise distance rank (PDR) odds ratios for CBP60g- and SARD1-common comparisons increase when protein sequence evolution was simulated according to Hypotheses I1 and I2. Evolution of 265 amino acid sites for CBP60g, SARD1 and CBP60a was simulated according to the 12-species phylogeny with different simulation parameter values (left table), and the simulated 12-species sequences were subjected to Protein Evolution Analysis in a Euclidean Space (PEAES)-PDR analysis. The distributions of the simulated odds ratios (100 simulations × 12 species = 1200 odds ratios per box-and-whiskers) for three comparisons are shown in the box plot. Two amino acid variation sets based on the ‘Benchmark’ protein sequence alignment or members of each subfamily in angiosperm species (‘Angio CBP60’) were used in the simulation. The CBP60g and SARD1 threshold parameters can take values from 0 to 1: 1 represents no coevolutionary selection and 0 represents strong selection according to Hypothesis I2. The CBP60a threshold parameter can take values from 0 to 1: 0 represents no coevolutionary selection and 1 represents strong selection according to Hypothesis I1. The parameter value ‘Random’ represents 265 sites randomly sampled from the Benchmark amino acid variation set for each of 12 species for each subfamily (i.e. no simulation). The ‘Random’ distributions are the same as the null odds ratio distributions in Fig. 7(c), except that the sample size is 1200 in this figure. The median values of the ‘Random’ cases are shown as vertical lines with color-coding to aid comparisons. In the box plot, the left and right edges of a box show the lower and upper quartiles, the thick vertical line in the box shows the median, and the left and right ends of whiskers show the minimum and maximum, respectively. The odds ratio in the x-axis is log-scaled to make the distributions closer to symmetric.

With the ‘Angio CBP60’ variation set, the odds ratios with the CBP60g- and SARD1-common comparisons were higher than the ‘Random’ distributions even without coevolutionary selection (threshold values: 1, 1 and 0), suggesting that amino acid variations have already been selected according to Hypotheses I1 and I2. When strong coevolutionary selection was applied with lower CBP60g and SARD1 threshold parameter values (threshold values: 0, 0.1 and 0.5, or 0.1, 0 and 0.5), the odds ratios with the CBP60g- and SARD1-common comparisons substantially increased, further confirming that the PEAES-PDR analysis can detect a signature of coevolutionary selection according to Hypothesis I2.

Although the effects of the CBP60a threshold parameter on CBP60g- and SARD1-common comparisons were relatively small, they were detectable when the CBP60a threshold value was increased from 0 to 0.5, with either variation set (Notes S3). Thus, the PEAES-PDR analysis can also detect a signature of coevolutionary selection according to Hypothesis I1.

Discussion

One type of protein family evolution analysis focuses on the emergence of conserved domain structures (Domazet-Lošo & Tautz, 2010). However, emergence of particular conserved domains may not provide information about evolution of particular biological processes and functions underlying them as some domains, such as DNA-binding domains, have versatile molecular functions that can be easily repurposed for different biological processes (e.g. Gordân et al., 2011). As the CBP60-conserved domains of AtCBP60g and AtSARD1 have sequence-specific DNA-binding activity (Zhang et al., 2010; Qin et al., 2018), CBP60 proteins are probably DNA-binding transcription factors. A DNA-binding domain structure provides a backbone structure for physical interactions with DNA, and its binding specificity could vary in different proteins. It is possible that diversification of the immune-related clade from the prototypical group coincided with a substantial DNA-binding specificity change and consequently resulted in a large change in the gene set regulated by CBP60 transcription factors; that is, immune-related CBP60 transcription factors have probably been neofunctionalized for a different biological process. Thus, tracking neofunctionalization of particular subfamilies that are important for biological processes of interest within a larger protein family defined by the conserved domain structure provides insights into the evolution of biological processes.

AtCBP60a, AtCBP60g and AtSARD1 belong to the immune-related clade and regulate immune responses, including SA signaling. The fact that nonseed land plants, namely liverworts, mosses, lycophytes and ferns, do not have a member of the immune-related clade strongly suggests that SA signaling and SA-dependent immunity are regulated differently between angiosperms and nonseed plants. As SA is a major immune hormone in angiosperms (Vlot et al., 2009), the organization of immune signaling networks could be very different between angiosperms and nonseed plants. However, this notion of different network organization does not exclude the possibility that angiosperms and nonseed plants share some elemental immune signaling machineries (de Vries et al., 2018). Instead, the notion implies that some properties of the immune signaling networks, such as network resilience, are very different between angiosperms and nonseed plants.

We discovered significant coevolutionary interactions at multiple sites in the CBP60-conserved domain across the CBP60a, CBP60g and SARD1 immune regulator subfamilies in separate core eudicot species lineages, using PEAES-PDR (Figs 7, 8, S8–S12). These observations indicate that although the three immune regulator subfamilies diversified around the time of angiosperm divergence, the immune regulator subfamilies have been influencing fast evolution of one another in each plant species lineage. These coevolutionary interactions appear to have been maintained through the history of selection in the pathogen effector landscape that has been changing rapidly in a manner specific to plant species lineages.

Based on genome-wide analysis in yeast, Kuzmin et al. (2020) quantitatively generalized the evolutionary fate of duplicated genes: high functional entanglement (≈ overlapping functions) between duplicated genes leads to loss of one of the genes; intermediate entanglement leads to small divergence between duplicated genes; low entanglement leads to large divergence. This rule for the fate of duplicated genes does not apply to CBP60g and SARD1 as they have intermediate functional entanglement (Zhang et al., 2010; Wang et al., 2011) but are highly divergent. Our simulation analysis demonstrated that this discrepancy can be explained by strong diversifying selection (Hypothesis I2), probably imposed by pathogen effectors. The third gene with the opposing function, CBP60a, coevolutionarily interacts and further complicates evolution of the triplicated genes. Thus, our discovery presents an uncommon pattern of gene evolution after duplication (or triplication), probably as a result of strong and varying, lineage-specific selection.

The observed coevolutionary interactions are consistent with our hypotheses of coevolution mechanisms that promote functional resilience of the positive immune regulators, CBP60g and SARD1, against pathogen effectors. Hypothesis M1 states that coevolution occurs via selection for amino acid dissimilarity between functionally overlapping, positive immune regulators to prevent simultaneous targeting by a single effector. Hypothesis M2 states that coevolution occurs via selection for amino acid similarity between functionally opposing, positive and negative immune regulators to ensure simultaneous targeting by a single effector. There are other important immune signaling network components in Arabidopsis that contain multiple regulator subfamilies within larger protein families, such as AtMPK3 and AtMPK6 (Xu et al., 2014), AtEDS1, AtPAD4, and AtSAG101 (Feys et al., 2005), and AtNPR1, AtNPR3 and AtNPR4 (Fu et al., 2012; Ding et al., 2018). It will be interesting to apply the PEAES-PDR approach to these protein subfamilies to test whether their evolutionary patterns are also consistent with these hypotheses about coevolution mechanisms that lead to increased resilience of the regulatory modules against pathogen effectors.

The results of the PEAES-PDR analysis can also generate specific, mechanistic hypotheses at the molecular level. First, Qin et al. (2018) reported that AtCBP60g is targeted by the fungal effector VdSCP41. Although AtSARD1 can coimmunoprecipitate VdSCP41 after overexpression in Arabidopsis protoplasts, this AtSARD1–VdSCP41 interaction appears to be much weaker than the AtCBP60g–VdSCP41 interaction (Figs 3a, 4c, and fig. 3 and supplementary fig. 2A in Qin et al., 2018). These observations may suggest that AtSARD1 mostly evades being targeted by VdSCP41. VdSCP41 binds a C-terminal region of AtCBP60g (corresponding to the region C-terminal to site 130 of the CBP60-conserved domain, shown by a gray bar in Fig. 8) (Qin et al., 2018). Our PEAES-PDR analysis detected multiple AtCBP60g sites C-terminal to site 130 potentially protected by dissimilarity between AtCBP60g and AtSARD1 (blue and magenta color bars on the line for Arabidopsis; CBP60g in Fig. 8). If VdSCP41 or a very similar effector is a major driver for the dissimilarity between AtCBP60g and AtSARD1, substituting the amino acids at these sites in AtCBP60g with those at the sites in AtSARD1 may substantially weaken the interaction between AtCBP60g and VdSCP41.

Second, the sites indicated by yellow-colored bars in Fig. 8 (i.e. sites in the second quadrant in Fig. 7a) are those that are not protected either by dissimilarity between two positive regulators or by similarity between a positive regulator and the negative regulator. In some cases, these coevolutionarily unprotected sites coincide in both CBP60g- and SARD1-common comparisons in a single species. One possibility is that such sites may not have been targeted by any effectors in a recent lineage history, and hence there is no signature of protective selection. Another possibility is that they may be protected by some other molecular mechanisms, such as immunity mediated by specific resistance (R) proteins (Khan et al., 2016). A guard-type R protein detects some biochemical change caused by a pathogen effector in the cognate host protein and triggers immune responses. If there is protection by an R protein, keeping the amino acids at the coevolutionarily unprotected sites that are similar between CBP60g and SARD1 may be important for function of the R protein-based protection mechanism. For example, the Arabidopsis R protein SNC1 may guard AtCBP60g and AtSARD1 (Sun et al., 2018) and provide an opportunity to test this hypothesis. Furthermore, identification of proteins that interact with CBP60g and SARD1 whose interactions are sensitive to substitutions of the amino acids at the coevolutionarily unprotected sites with those at the same sites in other species may lead to the discovery of such alternative protection mechanisms.

We demonstrated that coevolutionary interactions across the CBP60 immune regulator subfamilies of overlapping or opposing functions are consistent with the mechanistic hypotheses for increased resilience of this particular regulatory module in the immune signaling network. It is conceivable that this concept can be generalized among other immune signaling network components. Some pathogen effectors target multiple components of the immune signaling network (Toruño et al., 2016), suggesting the existence of many other multiple-targeting effectors. At the same time, immune signaling network components are highly interconnected (Sato et al., 2010), and it is common for them to have overlapping or opposing effects on immunity. These two conditions, multiply targeting pathogen effectors and potential functional interactions among the effector targets, are exactly the conditions that appear to have been driving coevolution for increased resilience of the CBP60 immune regulator module. We developed a PEAES-PDR approach, which was powerful in detecting this type of coevolutionary interaction among alignable sequences and in generating specific and mechanistic hypotheses. Expanding coevolutionary interaction investigations to unalignable sequences of multiply targeted proteins will further advance our understanding of how the immune signaling network has been evolving to maintain or improve resilience in a strongly selecting and rapidly changing pathogenic landscape. This will be an exciting future direction in the study of biological network evolution.

Notes added

While this work was under review, a report suggesting a functional role of AtCBP60b in immunity was published (Huang et al., 2021). The function was specific to AtCBP60b but not to AtCBP60c or AtCBP60d. The diversification among AtCBP60b, AtCBP60c and AtCBP60d occurred after the Brassicales order divergence (Fig. S2 and ‘CBP60.final.tree.nwk’ in Dataset S1). Therefore, with the evolutionary timescale of our interest (divergence of the 12 core eudicot species), acquisition of an AtCBP60b-specific function appears to be a consequence of a recent adaptation. Thus, this report of the AtCBP60b role does not affect the major conclusions of this work.

Acknowledgements

We thank Gane Ka-Shu Wong and Eric Carpenter for allowing us access to the 1000 Plants transcriptomic sequence database and Fay-Wei Li for allowing us access to the Waterfern and Floating fern genome sequences, before their publication. We also thank Yaniv Brandvain, Ya Yang, and Emma Goldberg for technical advice and discussion and Jane Glazebrook for editing of the manuscript. QZ was supported by a scholarship from the China Scholarship Council and funds from the Zhiming Zhang Laboratory. This work was supported by the National Science Foundation (grant nos. MCB-1518058 and IOS-1645460 to FK).

Author contributions

FK conceived the project. QZ assembled the protein sequence database. QZ, KM and FK analyzed the data. KM and FK wrote the manuscript.

Data availability

No new data were generated in support of this research. The data used in the study are publicly available, and their sources are described in the Materials and Methods section and in Table S1.