Genome-wide analysis of MIKC-type MADS-box genes in wheat: pervasive duplications, functional conservation and putative neofunctionalization
Summary
- Wheat (Triticum aestivum) is one of the most important crops worldwide. Given a growing global population coupled with increasingly challenging cultivation conditions, facilitating wheat breeding by fine-tuning important traits is of great importance. MADS-box genes are prime candidates for this, as they are involved in virtually all aspects of plant development.
- Here, we present a detailed overview of phylogeny and expression of 201 wheat MIKC-type MADS-box genes. Homoeolog retention is significantly above the average genome-wide retention rate for wheat genes, indicating that many MIKC-type homoeologs are functionally important and not redundant. Gene expression is generally in agreement with the expected subfamily-specific expression pattern, indicating broad conservation of function of MIKC-type genes during wheat evolution.
- We also found extensive expansion of some MIKC-type subfamilies, especially those potentially involved in adaptation to different environmental conditions, such as flowering time genes. Duplications are especially prominent in distal telomeric regions. A number of MIKC-type genes show novel expression patterns and respond, for example, to biotic stress, pointing towards neofunctionalization.
- We speculate that conserved, duplicated and neofunctionalized MIKC-type genes may have played an important role in the adaptation of wheat to a diversity of conditions, hence contributing to the importance of wheat as a global staple food.
Introduction
Bread wheat (Triticum aestivum) is one of the most important crops worldwide, contributing a significant amount of calories and proteins to the global human diet (Veraverbeke & Delcour, 2002; Shiferaw et al., 2013). Bread wheat is hexaploid and was first domesticated some 8000–10 000 yr ago in the region we now call the Middle East (Smith & Nesbitt, 1995; Allaby et al., 2017). Wheat originated from three diploid progenitor species: Triticum urartu (A-genome donor), an Aegilops speltoides-related grass (B-genome donor) and Aegilops tauschii (D-genome donor) (Shewry, 2009). Because of its hexaploidy and an abundance of repetitive and transposable elements, bread wheat has one of the largest crop plant genomes (c. 16 Gbp), making it challenging to work with from a genetics, genomics and breeding perspective (Borrill et al., 2015). However, recent advances in sequencing technology have led to a high-quality genome assembly and annotation by the International Wheat Genome Sequencing Consortium (IWGSC, 2018). Further, large-scale RNA-seq analyses provided insights into expression patterns of homoeologous genes in different developmental stages and under a variety of stress conditions, building a rich resource for more detailed analyses (Ramírez-González et al., 2018).
Transcription factors (TFs) are a major driver in evolution as well as in domestication and bear the potential for crop improvement and trait fine-tuning (Martínez-Ainsworth & Tenaillon, 2016). MADS-box genes constitute one of the largest families of plant TFs (Riechmann et al., 2000). They can be divided into two phylogenetically distinct groups: type I and type II (Alvarez-Buylla et al., 2000). Although the function of most type I MADS-box genes remains to be illuminated, several type II genes are key domestication genes in different eudicot and monocot crops (reviewed in Schilling et al., 2018). Plant type II MADS-domain proteins possess a typical domain structure, which is composed of the MADS, I, K and C-terminal domains (Kaufmann et al., 2005). The MADS domain enables the DNA binding, nuclear localization and dimerization of the TF (Riechmann et al., 1996; Immink et al., 2002), while the I and K domains facilitate dimerization and higher-order complex formation of two or more MADS-domain proteins (Yang & Jack, 2004; Melzer et al., 2008; Theißen et al., 2016). The C-terminal domain allows for transcriptional activation of some MADS-domain proteins (Honma & Goto, 2001). Because of this characteristic domain structure, type II genes are also referred to as MIKC-type MADS-box genes (Kaufmann et al., 2005). MIKC-type MADS-box genes are involved in virtually all aspects of plant development, including root, flower, seed and embryo development (Gramzow & Theissen, 2010). They have also been reported to be involved in different stress responses (Arora et al., 2007; Jia et al., 2018; Wei et al., 2018). Thus, understanding MIKC-type MADS-box genes is important for understanding plant development, which is in turn crucial for plant breeding and crop improvement (Boden & Østergaard, 2019).
MIKC-type MADS-box genes have been phylogenetically and functionally characterized in a variety of model systems (Arabidopsis (Arabidopsis thaliana) and Brachypodium distachyon (Parenicova et al., 2003; Wei et al., 2014)) as well as important crop plants (banana, rice, brassica, cotton; Arora et al., 2007; Duan et al., 2015; Liu et al., 2017; Nardeli et al., 2018). Individual wheat MIKC-type MADS-box genes have also been studied for almost two decades. The most prominent example is probably VERNALIZATION1 (VRN1), an APETALA1 (AP1)-like key regulator of flowering time as well as floral meristem determination (Murai et al., 2003; Trevaskis et al., 2003; Yan et al., 2003; Harris et al., 2017; Li et al., 2019) and one of the most important loci that distinguishes spring from winter wheat varieties (Yan et al., 2003). Other wheat MADS-box genes have been implicated in the control of flowering time, ovule development and pistilloidy (Meguro et al., 2003; X. Y. Zhao et al., 2006; Yamada et al., 2009; Yang et al., 2017). Although overviews on MADS-box genes in wheat are available (T. Zhao et al., 2006; Paolacci et al., 2007; Ma et al., 2017), a detailed genome-wide phylogenetic and functional characterization of wheat MIKC-type MADS-box genes is still missing.
To better understand the dynamics of MIKC-type gene evolution in wheat and to facilitate future research on this important TF family, we provide a detailed overview of the number, phylogeny and expression of MIKC-type MADS-box genes in the recently released genome of T. aestivum (IWGSC, 2018). We found a number of wheat MIKC-type subfamilies to be significantly larger than expected and hypothesize that extensive sub- and neofunctionalization in those subfamilies contributed to the global distribution of wheat.
Materials and Methods
Sequence search and annotation of MIKC-type MADS-box genes
Wheat coding sequence (CDS) predictions, refmap comparison between IWGSC and The Genome Analysis Centre (TGAC) and functional annotations of both high and low confidence (HC and LC) wheat genes were downloaded from the IWGSC archive v.1.0 (https://urgi.versailles.inra.fr/download/iwgsc/IWGSC_RefSeq_Annotations/v1.0/) (IWGSC, 2018) and the CSS–TGAC comparison was downloaded from https://opendata.earlham.ac.uk/opendata/data/Triticum_aestivum/TGAC/v1/annotation/ (Clavijo et al., 2017). Functional annotations were filtered for Protein family database (Pfam) identifiers of the MADS and K domains (PF00319 and PF01486, respectively) (El-Gebali et al., 2018). A total of 439 sequences were identified (see a list of all gene IDs in Supporting Information Table S1). Of these, 188 sequences had a MADS box and a K box (181 HC plus seven LC), 230 sequences (159 HC plus 71 LC) had only a MADS box, and 21 had only a K box (16 HC plus five LC) (Table S1). Splice variants were excluded and only the first variant was kept for further analysis, with three exceptions (see Table S2).
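The domain-based filtering described above can be illustrated with a minimal sketch. It assumes that the functional annotation has already been parsed into a mapping from gene ID to Pfam identifiers; the function name, the toy gene IDs and the input layout are hypothetical and do not reflect the format of the IWGSC annotation files.

```python
MADS_PFAM = "PF00319"  # Pfam identifier of the MADS domain
K_PFAM = "PF01486"     # Pfam identifier of the K domain


def classify_by_domains(annotations):
    """Group gene IDs by which of the two Pfam domains they carry.

    `annotations` maps a gene ID to an iterable of Pfam identifiers.
    Genes carrying neither domain are dropped.
    """
    groups = {"MADS+K": [], "MADS only": [], "K only": []}
    for gene_id, pfam_ids in annotations.items():
        pfam_ids = set(pfam_ids)
        has_mads = MADS_PFAM in pfam_ids
        has_k = K_PFAM in pfam_ids
        if has_mads and has_k:
            groups["MADS+K"].append(gene_id)
        elif has_mads:
            groups["MADS only"].append(gene_id)
        elif has_k:
            groups["K only"].append(gene_id)
    return groups


# Toy input with made-up gene IDs, for illustration only.
toy = {
    "geneA": ["PF00319", "PF01486"],
    "geneB": ["PF00319"],
    "geneC": ["PF01486", "PF00069"],
    "geneD": ["PF00069"],
}
result = classify_by_domains(toy)
```

Applied to the full annotation, the sizes of the three groups would correspond to the MADS plus K, MADS-only and K-only counts reported above.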
All CDSs were translated into amino acid sequences and aligned with all MADS-domain protein sequences of rice (Oryza sativa (Arora et al., 2007)) with MAFFT (L-INS-i algorithm) (Katoh & Standley, 2013; Katoh et al., 2017) using only the MADS domain of each sequence. Subsequently, a phylogeny was generated using Iq-Tree (Nguyen et al., 2014) and ModelFinder (Kalyaanamoorthy et al., 2017). This allowed type I and type II (MIKC-type) MADS-domain proteins to be distinguished (Table S1). All type I MADS-box CDSs were excluded from subsequent analyses.
We evaluated the predicted gene structure of all genes, assuming a canonical M-I-K-C domain structure. In cases where either MADS or K boxes were absent from gene predictions, we compared the sequences with closely related rice and Arabidopsis genes and TGAC gene predictions, and screened the surrounding genomic regions using the NCBI conserved domain database (Marchler-Bauer & Bryant, 2004; Marchler-Bauer et al., 2010, 2014, 2016). In 10 cases, gene prediction was repeated with FGENESH+ or FGENESH_C (Solovyev et al., 2006) using rice MADS-box genes of the same clade (Table S3). In one case (TaAGL17-A2-1, TraesCSU01G209900), the TGAC CDS was used instead of IWGSC prediction for the subsequent phylogeny, because it encoded for a canonical MIKC structure, as compared with the IWGSC prediction, which comprised only a MADS box. This approach yielded 193 wheat MIKC-type MADS-box sequences.
In parallel, a Blast search was carried out by which we identified eight additional MIKC-type genes (Table S3) (https://urgi.versailles.inra.fr/blast_iwgsc/) (Alaux et al., 2018; IWGSC, 2018).
Altogether, a total of 201 wheat MIKC-type MADS-box genes were identified (Tables 1, S2).
| MIKC-type subclade | Wheat triad | Not categorized (a) | Rice orthologs | n : n : n | Gene number: total | Gene number: full length | Gene number: M (b) | Gene number: K (c) | Gene location: Chr (d) | Gene location: genomes (e) | Named by | Alternative gene names from previous publications (f) | Reference for alternative name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MIKC* | TaMIKC-1 | | OsMADS68 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 4 | ABD | This study | – | – |
| | TaMIKC-2 | | OsMADS63/OsMADS62 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 4 | ABD | This study | – | – |
| OsMADS32 | TaMADS32 | | OsMADS32 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 3 | ABD | This study | TaWM16 | Paolacci et al. (2007) |
| AGL12 | TaAGL12-1 | | OsMADS26 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 7 | ABD | Paolacci et al. (2007) | TaAGL32 | T. Zhao et al. (2006) |
| | TaAGL12-2 | | OsMADS33 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 2 | ABD | This study | – | – |
| AP1 (SQUA) | TaAP1-1 | | OsMADS14 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 5 | ABD | Paolacci et al. (2007) | VRN1; TaVRT-1 | Danyluk et al. (2003); Yan et al. (2003); Fu et al. (2005) |
| | TaAP1-2 | | OsMADS18/OsMADS20 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 2 | ABD | Paolacci et al. (2007) | – | – |
| | TaAP1-3 | | OsMADS15 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 2 | ABD | Paolacci et al. (2007) | FUL2 | Li et al. (2019) |
| SVP (StMADS11) | TaSVP-1 | | OsMADS22 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 6 | ABD | Paolacci et al. (2007) | – | – |
| | TaSVP-2 | | OsMADS55 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 7 | ABD | Paolacci et al. (2007) | TaVRT-2 | Kane et al. (2005) |
| | TaSVP-3 | | OsMADS47 | 1 : 1 : 1 | 3 | 2 | 0 | 1 | 4 | (A)BD | Paolacci et al. (2007) | – | – |
| SEP1 | TaSEP1-1 | | OsMADS1 | 1 : 1 : 1 | 3 | 2 | 0 | 1 | 4 | (A)BD | This study | TaSEP-1 | Paolacci et al. (2007) |
| | TaSEP1-2 | | | 1 : 1 : 1 | 3 | 2 | 1 | 0 | 4 | (A)BD | This study | WLHS1; TaSEP-2 | Paolacci et al. (2007); Shitsukawa et al. (2007a) |
| | TaSEP1-3 | | | 2 : 1 : 1 | 4 | 4 | 0 | 0 | 4 | AABD | This study | – | – |
| | TaSEP1-4 | | OsMADS5 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 7 | ABD | This study | TaSEP-6 | Paolacci et al. (2007) |
| | TaSEP1-5 | | | 1 : 2 : 1 | 4 | 4 | 0 | 0 | 7 | ABBD | This study | – | – |
| | TaSEP1-6 | | OsMADS34 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 5 | ABD | This study | TaSEP-5 | Paolacci et al. (2007) |
| SEP3 | TaSEP3-1 | | OsMADS7/45 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 7 | ABD | This study | WSEP; TaSEP-4 | Shitsukawa et al. (2007a); Paolacci et al. (2007) |
| | TaSEP3-2 | | OsMADS8/24 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 5 | ABD | This study | TaMADS1; TaSEP-3 | X. Y. Zhao et al. (2006); Paolacci et al. (2007) |
| AGL6 | TaAGL6 | | OsMADS6/OsMADS17 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 6 | ABD | This study | – | – |
| AG/STK | TaAG-1 | | OsMADS58/OsMADS66 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 1 | ABD | Paolacci et al. (2007) | WAG-1 | Hirabayashi & Murai (2009) |
| | TaAG-2 | | OsMADS3 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 3 | ABD | Paolacci et al. (2007) | WAG-2 | Hirabayashi & Murai (2009) |
| | TaSTK-1 | | OsMADS13 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 5 | ABD | This study | TaAG-3; WSTK | Paolacci et al. (2007); Yamada et al. (2009) |
| | TaSTK-2 | | OsMADS21 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 1 | ABD | This study | TaAG-4 | Paolacci et al. (2007) |
| SOC1 (TM3) | TaSOC1-1 | | OsMADS56 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 1 | ABD | Paolacci et al. (2007) | – | – |
| | TaSOC1-5 | | | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 1 | ABD | This study | – | – |
| | TaSOC1-3 | | OsMADS50 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 4(5) | ABD | Paolacci et al. (2007) | WSOC1 | Shitsukawa et al. (2007b) |
| | TaSOC1-4 | | | 1 : 1 : 1 | 3 | 1 | 2 | 0 | 7(4) | A(BD) | This study | – | – |
| | TaSOC1-2 | | – | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 6 | ABD | Paolacci et al. (2007) | – | – |
| AP3 (DEF) | TaAP3-1 | | OsMADS16 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 7 | ABD | This study | WAP3 | Hama et al. (2004) |
| | TaAP3-2 | | | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 6 | ABD | This study | – | – |
| PI (GLO) | TaPI-1 | | OsMADS4 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 1 | ABD | This study | WPI-1 | Hama et al. (2004) |
| | TaPI-2 | | OsMADS2 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 3 | ABD | This study | WPI-2 | Hama et al. (2004) |
| ABS (GGM13) | TaBS-1 | | OsMADS29 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 6 | ABD | This study | TaGGM13; WBSIS | Paolacci et al. (2007); Yamada et al. (2009) |
| | TaBS-2 | | OsMADS31 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 2 | ABD | This study | – | – |
| | TaBS-3 | | OsMADS30 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 7 | ABD | This study | – | – |
| | TaBS-4 | | | 2 : 1 : 2 | 5 | 5 | 0 | 0 | 3 | AABDD | This study | – | – |
| | TaBS-5 | # | | – | 6 | 2 | 0 | 4 | x | A(AAAA)D | This study | – | – |
| | TaBS-6 | # | | – | 4 | 0 | 4 | 0 | 2(6) | (AABB) | This study | – | – |
| | TaBS-7 | # | | – | 3 | 2 | 1 | 0 | 1 | (B)DD | This study | – | – |
| | TaBS-8 | # | | – | 3 | 0 | 3 | 0 | x | (ADD) | This study | – | – |
| | TaBS-9 | # | | 1 : 1 : 1 | 3 | 1 | 2 | 0 | 7(3) | (AB)D | This study | – | – |
| AGL17 | TaAGL17-1 | | OsMADS27/OsMADS61 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 2 | ABD | Paolacci et al. (2007) | – | – |
| | TaAGL17-2 | | OsMADS25 | – | 5 | 2 | 2 | 1 | 5 | (AA)BD(U) | Paolacci et al. (2007) | – | – |
| | TaAGL17-4 | | | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 7 | ABD | This study | – | – |
| | TaAGL17-5 | | | 2 : 1 : 2 | 5 | 5 | 0 | 0 | 7 | AABDD | This study | – | – |
| | TaAGL17-3 | | OsMADS57/OsMADS23 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 6 | ABD | Paolacci et al. (2007) | – | – |
| | TaAGL17-6 | # | OsMADS59 | – | 6 | 3 | 3 | 0 | 7 | (B)BBU(UU) | This study | – | – |
| | TaAGL17-7 | | | 0 : 2 : 1 | 3 | 2 | 1 | 0 | 1 | (B)BD | This study | – | – |
| | TaAGL17-8 | | | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 1 | ABD | This study | – | – |
| | TaAGL17-9 | | | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 1 | ABD | This study | – | – |
| | TaAGL17-10 | | – | – | 3 | 2 | 1 | 0 | 5 | (B)DU | This study | – | – |
| | TaAGL17-11 | # | | – | 10 | 4 | 6 | 0 | 7(5) | (AA)AAB(B)D(DDD) | This study | – | – |
| FLC | TaFLC-1 | | OsMADS37 | 1 : 1 : 3 | 5 | 3 | 2 | 0 | 7(3) | ABD(DD) | This study | TaAGL12 | Ruelens et al. (2013) |
| | TaFLC-2 | | OsMADS51 | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 4(5) | ABD | This study | – | – |
| | TaFLC-3 | | | 1 : 1 : 1 | 3 | 3 | 0 | 0 | 3 | ABD | This study | – | – |
| | TaFLC-4 | | | 2 : 1 : 2 | 4 | 3 | 1 | 0 | 3 | (A)ABD(D) | This study | TaAGL22, TaAGL33 | IWGSC (2018); Sharma et al. (2017) |
| | TaFLC-5 | | | 1 : 0 : 1 | 2 | 2 | 0 | 0 | 3 | AD | This study | – | – |
| | TaFLC-6 | # | | 1 : 1 : 1 | 3 | 2 | 1 | 0 | 3 | A(B)D | This study | – | – |
| Total count | | | | | 201 | 164 | 30 | 7 | | | | | |
- A complete list of all wheat MIKC-type genes can be found in Supporting Information Table S2.
- a # indicates that the number of homoeologs could not be determined owing to insufficient phylogenetic resolution; these genes were not included in the homoeolog count (Table 2). For details see Fig. S1.
- b Gene encodes for the MADS but not the K domain.
- c Gene encodes for the K but not the MADS domain.
- d Number in parentheses indicates a different chromosome for one of the genes; x, genes located on more than two different chromosomes.
- e Parentheses indicate truncated genes, encoding for either the MADS or the K domain.
- f For more gene names and NCBI accession numbers, see Table S2.
Maximum likelihood phylogeny of MIKC-type MADS-box genes
Based on the first phylogeny, MIKC-type sequences were sorted into the major grass MIKC-type subfamilies (Table S2) (Gramzow & Theissen, 2015). Afterwards, subfamily alignments of MIKC-type protein sequences were created using wheat, rice and Arabidopsis protein sequences (Parenicova et al., 2003; Arora et al., 2007; Verelst et al., 2007) using Mafft (E-INS-i algorithm) (Katoh & Standley, 2013; Katoh et al., 2017). Subfamily alignments were then merged using Mafft (E-INS-i algorithm) (Katoh & Standley, 2013; Katoh et al., 2017). The full-length alignment of all MIKC-type MADS-domain proteins was analyzed with Homo v.1.3 (Jermiin, 2017) to confirm that the sequences met the phylogenetic assumption of evolution under reversible conditions. Individual residues were subsequently masked with Alistat v.1.3 (Wong et al., 2014), leaving only sites with a completeness score (Cc, defined as the number of unambiguous characters in the column/number of sequences) > 0.5 (a total of 207 sites) (alignment in Notes S1).
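The site-masking criterion (completeness score Cc > 0.5) can be sketched as follows. This is an illustrative reimplementation of the column filter only, not the Alistat code itself; it assumes that 'unambiguous characters' are the 20 standard amino acids, so that gaps and ambiguity codes such as X count as incomplete. The function names and the toy alignment are hypothetical.

```python
UNAMBIGUOUS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: the 20 standard amino acids


def completeness(column):
    """Fraction of unambiguous characters in one alignment column (Cc)."""
    return sum(ch.upper() in UNAMBIGUOUS for ch in column) / len(column)


def mask_alignment(seqs, threshold=0.5):
    """Keep only the columns whose completeness score exceeds `threshold`."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i, col in enumerate(zip(*seqs)) if completeness(col) > threshold]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment of four sequences and five columns.
toy_alignment = ["MGR-K", "MG--K", "M-X-K", "MGR--"]
masked = mask_alignment(toy_alignment)
```

In the toy alignment, the third column has Cc = 0.5 and is therefore dropped, because the criterion is a strict inequality.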
Using the masked protein alignment, a phylogenetic tree was inferred under maximum likelihood with Iq-Tree (Nguyen et al., 2014). The substitution model was calculated with ModelFinder (integrated in Iq-Tree; best-fit model: JTT + R5 chosen according to the Bayesian information criterion) (Kalyaanamoorthy et al., 2017). Consistency of the phylogenetic estimate was evaluated with Ultrafast bootstraps as well as a Shimodaira–Hasegawa approximate likelihood ratio test (SH-aLRT) (1000 replicates each) (Guindon et al., 2010; Minh et al., 2013; Hoang et al., 2018). The resulting tree file was visualized with Geneious v.11.1 (https://www.geneious.com) (Figs 1, S1) and FigTree v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/) (Fig. S2).
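The exact command lines are not given here, so the following is only a sketch of how the alignment and tree inference steps might be invoked. It assumes the standard MAFFT options for the L-INS-i and E-INS-i strategies and IQ-TREE 1.x options for ModelFinder, ultrafast bootstrap and SH-aLRT; the executable names and file names are placeholders, and the helper functions are hypothetical.

```python
def mafft_cmd(fasta_in, algorithm="L-INS-i"):
    """Return an argv list for MAFFT; the alignment is written to stdout."""
    pair_flag = {"L-INS-i": "--localpair", "E-INS-i": "--genafpair"}[algorithm]
    return ["mafft", pair_flag, "--maxiterate", "1000", fasta_in]


def iqtree_cmd(alignment, replicates=1000):
    """Return an argv list for IQ-TREE 1.x with model selection and both support measures."""
    return [
        "iqtree",
        "-s", alignment,           # input alignment
        "-m", "MFP",               # ModelFinder, then inference under the best-fit model
        "-bb", str(replicates),    # ultrafast bootstrap replicates
        "-alrt", str(replicates),  # SH-aLRT replicates
    ]


# Placeholder file names, for illustration only.
align_step = mafft_cmd("mikc_proteins.fasta", algorithm="E-INS-i")
tree_step = iqtree_cmd("mikc_masked.fasta")
```

The returned lists can be passed to `subprocess.run`; keeping the command construction separate from execution makes the chosen options easy to record alongside the results.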

Naming of MIKC-type MADS-box genes
We suggest a consistent naming pattern for all MIKC-type wheat MADS-box genes, taking into account their subfamily association, phylogenetic relationships as well as their subgenome location (A, B or D). Each gene name starts with an abbreviation for the species name Triticum aestivum (Ta), followed by the name of the most prominent Arabidopsis gene from this subfamily (e.g. ‘SEP1’ for SEPALLATA1-like genes, ‘AGL6’ for AGL6-like genes, or ‘BS’ for Bsister genes). Exceptions are OsMADS32-like genes, which are not found in Arabidopsis and have been named after the rice gene. The gene names include an A, B or D, indicating the subgenome they are located in, for example TaAGL6-B1. Putative homoeologs have identical gene names except for the subgenome identifier (e.g. TaAGL6-A1, TaAGL6-B1, TaAGL6-D1). Genes belonging to one subfamily but different triads within the same genome were consecutively numbered (e.g. TaAGL12-A1 and TaAGL12-A2). Inparalogs (e.g. as a result of tandem duplications or transposition) were distinguished by consecutive numbers separated by a dash. Hence, the name of the gene with the ID TraesCS7B01G020900 is TaSEP1-B5-1 as it is a SEP1-like gene, and more precisely one of two inparalogs of the B genome (Fig. 1; Table 1; all gene names are listed in Table S2). In the case of SEP1- and SEP3-like subfamilies, our phylogenies did not provide clear orthologous relationships between SEP1-like and SEP3-like genes from wheat and Arabidopsis. The SEP1–SEP3 subfamilies have proved difficult to resolve in other large phylogeny reconstructions (Gramzow & Theissen, 2015), but synteny studies and dedicated SEP1–SEP3 phylogenies provide evidence for the SEP1–SEP3 distinction and for the orthologous relationships between rice and Arabidopsis genes (Zahn et al., 2005; Ruelens et al., 2013; Gramzow & Theissen, 2015; T. Zhao et al., 2017). 
As we did detect well-supported associations between previously identified rice SEP1-like (OsMADS1, OsMADS5 and OsMADS34) and SEP3-like (OsMADS7, OsMADS8) genes and putative wheat orthologs, we used those associations to identify wheat SEP1- and SEP3-like genes. We used a similar strategy to identify AG- and STK-like genes from wheat (Kramer et al., 2004; Dreni et al., 2013; Gramzow & Theissen, 2015).
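The naming pattern can be made explicit with a small parser. The regular expression below is a sketch derived from the examples given in this section (TaAGL6-B1, TaAGL12-A2, TaSEP1-B5-1); it is an assumption that every gene name follows exactly this shape, and the function names are hypothetical. The triad label is assumed to correspond to the 'Wheat triad' column of Table 1.

```python
import re

GENE_NAME = re.compile(
    r"^Ta(?P<subfamily>[A-Za-z0-9]+)"        # species prefix and subfamily label
    r"-(?P<subgenome>[ABD])(?P<triad>\d+)"   # subgenome letter and triad number
    r"(?:-(?P<inparalog>\d+))?$"             # optional inparalog number
)


def parse_gene_name(name):
    """Split a gene name into subfamily, subgenome, triad and inparalog number."""
    match = GENE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised gene name: {name!r}")
    parts = match.groupdict()
    return {
        "subfamily": parts["subfamily"],
        "subgenome": parts["subgenome"],
        "triad": int(parts["triad"]),
        "inparalog": int(parts["inparalog"]) if parts["inparalog"] else None,
    }


def triad_label(name):
    """Label shared by all putative homoeologs and inparalogs of one group."""
    parts = parse_gene_name(name)
    return f"Ta{parts['subfamily']}-{parts['triad']}"


example = parse_gene_name("TaSEP1-B5-1")
```

Under these assumptions, TaSEP1-B5-1 is read as a SEP1-like gene on the B subgenome, belonging to group 5 and carrying inparalog number 1.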
Identification of homoeologs
Homoeologous genes were identified by phylogeny (Fig. 1). Separate subphylogenies were calculated for the four largest subfamilies in order to resolve the relationships between the genes more clearly (Fig. S1a–d). In some cases, where Ultrafast bootstraps and SH-aLRT were not high enough to support a clade (> 90 and 75, respectively), synteny and previous classifications were considered (Ramírez-González et al., 2018). Thirty-eight genes belonging to the FLC-, Bsister and AGL17-subclades could not be assigned to a category, because their homoeolog status could not definitely be determined (Tables 1, 2; Fig. S1).
| Homoeologous group (A : B : D) | All wheat genes (a) | Wheat MIKC MADS (all): number of groups | Wheat MIKC MADS (all): number of genes | Wheat MIKC MADS (all): % of genes (b) | MIKC MADS full-length only: number of groups | MIKC MADS full-length only: number of genes | MIKC MADS full-length only: % of genes (c) |
|---|---|---|---|---|---|---|---|
| 1 : 1 : 1 | 35.8% | 42 | 126 | 62.7 | 40 | 120 | 73.2 |
| n : 1 : 1/1 : n : 1/1 : 1 : n (d) | 5.7% | 4 | 17 | 8.5 | 2 | 8 | 4.9 |
| 1 : 1 : 0/1 : 0 : 1/0 : 1 : 1 | 13.2% | 1 | 2 | 1.0 | 6 | 12 | 7.3 |
| Other ratios (e) | 8.0% | 4 | 18 | 9.0 | 3 | 10 | 6.1 |
| Orphans/singletons | 37.1% | 0 | 0 | 0.0 | 1 | 1 | 0.6 |
| Not categorized (f) | – | – | 38 | 18.9 | – | 13 | 7.9 |
| Total | 99.8% | | 201 | 100.0 | | 164 | 100.0 |
Plant cultivation, RNA isolation and RT-PCR
Triticum turgidum cv Kronos (tetraploid) and T. aestivum cv Cadenza (hexaploid) were used for reverse transcription polymerase chain reaction (RT-PCR) analysis. Plants were grown in a glasshouse with no additional lighting at an ambient temperature of 20–24°C until heading stage. RNA was extracted from the frozen flag leaf tip using the RNeasy Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. cDNA was generated with a Superscript IV Reverse Transcriptase Kit (Invitrogen/Thermo Fisher Scientific, Waltham, MA, USA) using an oligo dT primer. RT-PCR was carried out with a Phusion High-Fidelity DNA Polymerase (Thermo Scientific™, Thermo Scientific/Thermo Fisher Scientific, Waltham, MA, USA). Primer location and sequences are shown in Fig. S3 and Table S4, respectively.
Expression analysis of MIKC-type MADS-box genes using RNA-seq
RNA-seq data of 193 wheat MADS-box genes were downloaded from www.wheat-expression.com and http://bar.utoronto.ca/efp_wheat/ (Ramírez-González et al., 2018). For the remaining eight genes, which were identified by Blast, no expression data were available. Developmental stages refer to 70 tissues/time points from spring wheat cv Azhurnaya (Ramírez-González et al., 2018). Expression levels were downloaded from www.wheat-expression.com as log2(transcripts per million) (log2tpm) and a heatmap was generated with Morpheus (https://software.broadinstitute.org/morpheus/). Genes were clustered according to their expression using K-means (K = 15; metric, Euclidean; method, complete) (Fig. S4). All genes, modules and tissues are listed in Tables S5 and S6.
The triad expression analysis (Fig. S5) was carried out essentially as described previously (Ramírez-González et al., 2018). Expression data from spring wheat cv Azhurnaya were downloaded from www.wheat-expression.com as tpm for root, leaf and shoot, spike and grain for all triads with a 1 : 1 : 1 ratio. Triads with a total expression below 0.5 tpm were excluded. Expression data were normalized within the triads and triads were assigned balanced, A/B/D suppressed or A/B/D dominant profiles according to Ramírez-González et al. (2018) (Table S7). Triangular plots were generated using the R package ggtern (Hamilton & Ferry, 2018).
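The normalization and category assignment can be sketched as follows. The sketch reflects our understanding of the approach of Ramírez-González et al. (2018), in which the relative expression of the three homoeologs is compared with seven idealized profiles and the closest profile (by Euclidean distance) is assigned; the exact ideal values, the function name and the handling of ties are assumptions.

```python
import math

# Idealized relative expression (A, B, D) for the seven categories (assumed values).
IDEAL = {
    "balanced": (1 / 3, 1 / 3, 1 / 3),
    "A dominant": (1.0, 0.0, 0.0),
    "B dominant": (0.0, 1.0, 0.0),
    "D dominant": (0.0, 0.0, 1.0),
    "A suppressed": (0.0, 0.5, 0.5),
    "B suppressed": (0.5, 0.0, 0.5),
    "D suppressed": (0.5, 0.5, 0.0),
}


def classify_triad(a, b, d, min_total=0.5):
    """Assign a triad to the closest ideal profile, or None if it is excluded.

    `a`, `b` and `d` are tpm values of the three homoeologs in one tissue.
    Triads with a summed expression below `min_total` tpm are excluded.
    """
    total = a + b + d
    if total < min_total or total == 0:
        return None
    relative = (a / total, b / total, d / total)  # normalization within the triad
    return min(IDEAL, key=lambda cat: math.dist(relative, IDEAL[cat]))


profile = classify_triad(12.0, 11.0, 13.0)
```

Because the relative values sum to one, each triad corresponds to a point in a ternary plot, which is why triangular plots are a natural way to display the result.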
Results
The wheat genome contains 201 MIKC-type MADS-box genes
A total of 439 coding sequences (including splice variants) were identified on the basis of the functional annotation (Pfam domains) in the recently released IWGSC wheat genome (Table S1) (IWGSC, 2018). This dataset was simplified by keeping only one splice variant from each genomic locus for further analyses (Table S2). MIKC-type MADS-box genes were differentiated from type I MADS-box genes using a phylogenetic approach (Table S1; see the Materials and Methods section for details). An additional eight MIKC-type genes were identified using Blast search. Altogether, we identified 201 MIKC-type (type II) MADS-box genes in wheat (Tables 1, S2). Because MIKC-type MADS-box gene nomenclature in wheat is currently not consistent, with several genes having several synonymous names, we renamed all genes according to their subfamily association (Tables 1, S2; see the Materials and Methods section for details).
The domain structure of MIKC-type MADS-domain proteins is crucial for their function, with MADS and K domains being indispensable for DNA binding and protein complex formation, respectively (Kaufmann et al., 2005), although genes encoding truncated proteins may act as dominant negative versions (Ferrario et al., 2004; Seo et al., 2012). A total of 164 out of 201 MIKC-type genes encoded for both domains, while 29 genes lacked a K box (14%) and seven lacked a MADS box (3%) (Tables 1, S2; Fig. 1). One gene encoded for a MADS and an SRP54 domain (TaBS-A8, TraesCS4A01G044400LC) (Fig. S1b; Table S2).
For a number of genes, the predicted gene structures were very long (30 kb and above; Table S2). We confirmed mRNA transcripts spanning over especially long introns (c. 29 kbp) for three genes from the FLC-clade using RT-PCR and Sanger sequencing (Fig. S3).
MIKC-type MADS-box genes belong to well-defined subfamilies
A maximum likelihood phylogenetic tree of all MIKC-type MADS-box genes from Arabidopsis thaliana, rice (Oryza sativa) and wheat shows that the wheat genome retained all 15 major grass MIKC-type MADS-box gene subfamilies: AP1, AP3, PISTILLATA (PI), AGAMOUS (AG)/SEEDSTICK (STK), AGAMOUS-LIKE6 (AGL6), AGL12, AGL17, BSISTER (BS), SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1 (SOC1), SHORT VEGETATIVE PHASE (SVP), MIKC*, OsMADS32, FLOWERING LOCUS C (FLC), SEPALLATA1 (SEP1) and SEP3 (Figs 1, S2) (Gramzow & Theissen, 2015).
In many subclades, the gene phylogeny roughly followed species phylogeny, with the Arabidopsis genes displaying a sister-group relationship to the grass genes, and one or more rice MADS-box genes closely related to a triad of three wheat homoeologs (e.g. the SVP, AGL12, OsMADS32, MIKC*, AP1 and AG subclades; Fig. 1). The topology in other subclades is more complex, suggesting multiple duplication events, before and/or after polyploidization of wheat (e.g. SOC1 and SEP1 subclades; Fig. S1a). In particular, Bsister, AGL17 and FLC subclades are significantly expanded in wheat compared with Arabidopsis and rice (Figs 1, 2, S1b–d).

Wheat MIKC-type genes exhibit a high rate of homoeolog retention and gene duplication
In many flowering plants, the number of MIKC-type MADS-box genes is between 40 and 70 (Gramzow & Theißen, 2013). Rice and Arabidopsis, for example, despite their phylogenetic distance, have a similar number of MIKC-type MADS-box genes (43 and 45, respectively) (Parenicova et al., 2003; Arora et al., 2007; Verelst et al., 2007). With 201 genes, the total number of MIKC-type MADS-box genes in wheat is among the highest of hitherto characterized flowering plant species (Gramzow & Theißen, 2013; Vining et al., 2015; Nardeli et al., 2018). This is partly a result of the hexaploid nature of wheat. However, even when corrected for ploidy level, the number of MIKC-type MADS-box genes in bread wheat was significantly higher than in rice (c. 1.5-fold higher; χ2 test, P = 0.028; Fig. 2a–d). Even when truncated genes, lacking either a MADS box or a K box, were excluded, the gene number was still higher than in rice (164/3 = 54.7 > 43). This increase in number is mainly a result of the gene count in four subfamilies (SEP1-, AGL17-, FLC- and Bsister-like genes) that are significantly larger than expected (χ2 test, P = 0.002, P ≪ 0.001, P ≪ 0.001, P ≪ 0.001, respectively; Fig. 2a–d). The numbers of genes in the remaining subclades were not significantly different from the expected 3 : 1 ratio; and some were below the expected ratio (Fig. 2d). In some of the latter cases, wheat orthologs of rice genes could not be identified, indicating gene loss in the lineage leading to Triticum (e.g. AP1- and AGL6-like genes; Figs 1, 2a–d). However, as the support values for the individual branches are sometimes quite low, we also cannot exclude the possibility that some of the rice genes originated by duplication after the lineages leading to wheat and rice separated.
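The ploidy correction and the comparison against a 3 : 1 expectation can be illustrated with a short calculation. The sketch below uses a χ2 goodness-of-fit test on the total wheat and rice gene numbers; the exact test design used for the reported P values is not specified here, so this formulation is an assumption and need not reproduce those values. The function names are hypothetical.

```python
import math


def chi_square_gof(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


def p_value_df1(stat):
    """Upper-tail probability of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Gene numbers per haploid genome equivalent (hexaploid wheat has three subgenomes).
per_genome_all = 201 / 3    # all MIKC-type genes
per_genome_full = 164 / 3   # full-length genes only

# Wheat vs rice totals against the 3 : 1 expectation.
stat, expected = chi_square_gof([201, 43], [3, 1])
p = p_value_df1(stat)
```

The same function can be applied per subfamily by passing the wheat and rice gene numbers of that subfamily instead of the totals.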
To better understand why MIKC-type MADS-box genes are so abundant in the wheat genome, we analyzed homoeologous groups in detail (Table 2). Approximately one-third of all wheat genes (i.e. all genes annotated in the current version of the wheat genome) are present in homoeologous groups of 3, also termed triads (1 : 1 : 1; 35.8% of genes) (IWGSC, 2018). By contrast, almost two-thirds of the 201 MIKC-type genes identified are present in triads (62.7%; Table 2). If only ‘full-length’ MADS-box genes (defined here as containing a MADS and K box) are considered, this ratio is even higher (73.2%; Table 2). Also, the percentage of genes with homoeolog-specific duplications is higher for MIKC-type genes than for all wheat genes (8.5% vs 5.7%; Table 2). Loss of one homoeolog, on the other hand, is less pronounced in MIKC-type genes (1% vs 13.2%; Table 2). Thirty-eight genes were not assigned to any of these groups (18.9%; 7.9% for full-length only), because the relationship of the genes could not be reliably resolved. Thus, the high homoeolog retention rate can partly explain the high number of wheat MIKC-type genes.
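The percentages quoted above follow directly from the gene numbers per homoeologous group in Table 2, as the following sketch shows. The dictionary keys are shorthand labels chosen for this illustration.

```python
def percentages(gene_counts):
    """Share of genes per category, in per cent, rounded to one decimal place."""
    total = sum(gene_counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in gene_counts.items()}


# Gene numbers per homoeologous group, taken from Table 2.
all_mikc = {
    "1:1:1": 126,
    "homoeolog-specific duplication": 17,
    "one homoeolog lost": 2,
    "other ratios": 18,
    "orphans/singletons": 0,
    "not categorized": 38,
}
full_length = {
    "1:1:1": 120,
    "homoeolog-specific duplication": 8,
    "one homoeolog lost": 12,
    "other ratios": 10,
    "orphans/singletons": 1,
    "not categorized": 13,
}

pct_all = percentages(all_mikc)
pct_full = percentages(full_length)
```

The two dictionaries sum to 201 and 164 genes, respectively, which matches the total and full-length gene numbers in Table 1.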
In addition to this, a variety of duplication patterns were observed in some subfamilies (Fig. S1; Notes S2). For example, the rice SEP1-like genes OsMADS1 and OsMADS5 are sister to 10 and seven wheat genes, respectively. In this case, the phylogenetic analysis suggests gene duplications in the lineage leading to Triticum but before the polyploidization of wheat (Fig. S1a). The Bsister gene OsMADS30 from rice is sister to 27 wheat genes, many of which lack a MADS or K box and are found in nonsyntenic regions, indicating that gene amplification occurred through transposable elements (Table 1; Fig. S1b). For the AGL17 and FLC subfamily from wheat, a number of triads can be assigned to a single rice gene (Fig. S1c, d). Several genes from these two subfamilies are found in close vicinity to each other, pointing towards tandem duplications as a mechanism for subfamily expansion (Table 1; Figs 2e, S2c,d).
It is noteworthy that several of the sequences, especially in the large clades, are supported only by low-confidence gene models or were not identified as gene sequences before (Tables S1, S4). Further research has to show whether they represent functional genes or constitute pseudogenes.
Gene duplications and truncated genes are prevalent among MIKC-type genes in distal telomeric segments
MIKC-type MADS-box genes were generally equally distributed among the chromosomes, the only exception being the three homoeologous chromosomes 7, which contained significantly more genes than expected from the chromosome lengths (χ² test, P ≪ 0.001; Fig. 2e). This is mainly a result of AGL17-like genes, the majority of which are located in tandem on the distal telomeric ends of chromosomes 7 (Fig. 2e; Table S2). Overall, MIKC-type genes were equally likely to be located in the more central segments of the chromosomes (R2a, R2b and C) and in the distal telomeric parts of the chromosomes (R1 and R3) (48% and 50% of genes, respectively). However, gene location varied greatly among subfamilies (Fig. 3). In general, genes belonging to smaller subfamilies tended to be located in more central segments of the chromosomes, whereas a larger percentage of genes belonging to more expanded subfamilies were located in distal telomeric segments (Fig. 4; Table S2). Further, full-length MIKC-type genes were approximately equally likely to be located within subtelomeric vs central segments (45% vs 54%, respectively), but truncated genes (MADS or K box only) were twice as prevalent in distal telomeric segments as in central segments (61% vs 31%; 8% of genes were not assigned to a chromosome) (Fig. 3a). Genes in distal telomeric segments were often found to be in close vicinity to each other, with over half of the genes in segment R1 being located < 1000 kb downstream of the closest MIKC-type gene (Table S8). By contrast, only 15% of genes in the central regions of the chromosome have a distance < 1000 kb to the next downstream MIKC-type gene (Table S8). This could be explained by the lower gene density in proximal vs distal chromosomal regions (IWGSC, 2018). However, in distal telomeric regions (R1 and R3), c. 65% of the MIKC-type genes are located next to an MIKC-type gene from the same subfamily, whereas this applies only to c. 30% of MIKC-type genes from more central regions of the chromosome (C, R2a, R2b) (Fig. 2e; Table S2). Together, this might indicate more frequent tandem duplication events in distal telomeric segments. These findings are in line with the observation that subtelomeric segments are targets of recombination events and that many fast-evolving genes lie within these segments (Glover et al., 2015; Chen et al., 2018; Ramírez-González et al., 2018).
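The chromosome-level comparison described above is a goodness-of-fit test in which the expected gene count per chromosome is proportional to chromosome length. A minimal sketch in Python is given here for illustration only. The function names, the gene counts and the chromosome lengths are hypothetical and are not the values underlying Fig. 2e or Table S2. The closed-form tail probability used in the sketch is exact only for an even number of degrees of freedom (which would hold for 21 chromosomes, i.e. 20 degrees of freedom).

```python
import math


def chi_square_gof(observed, lengths):
    """Return (statistic, degrees of freedom) for observed gene counts,
    with expected counts proportional to the chromosome lengths."""
    total_genes = sum(observed)
    total_length = sum(lengths)
    stat = 0.0
    for count, length in zip(observed, lengths):
        expected = total_genes * length / total_length
        stat += (count - expected) ** 2 / expected
    return stat, len(observed) - 1


def chi2_sf_even_df(stat, df):
    """Upper-tail probability of the chi-square distribution.
    Uses the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!,
    which is valid only when df is even."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical example: three equally long chromosomes, one gene-rich.
counts = [10, 10, 40]        # assumed gene counts
lengths = [100, 100, 100]    # assumed lengths (arbitrary units)
stat, df = chi_square_gof(counts, lengths)
p_value = chi2_sf_even_df(stat, df)
print(stat, df, p_value)
```

In practice, a statistics library would normally be used for the tail probability; the explicit formula is shown only to keep the sketch free of dependencies.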


Conserved and divergent patterns of MIKC-type MADS-box gene expression during wheat development
To characterize the expression of wheat MIKC-type MADS-box genes, we analyzed RNA-seq data of 193 wheat MADS-box genes (Borrill et al., 2016; Ramírez-González et al., 2018). Out of the 159 full-length genes, 83% were expressed in at least one developmental stage, with maximum expression values (tpmmax) ranging from 1 to 424 tpm (Fig. 5; Table S2). The remaining 17% of full-length genes showed very low expression (tpmmax < 1; Fig. 5a; Table S2) and were considered not expressed. Of the 33 truncated genes, encoding only either a K or a MADS domain, 30% were expressed (four and six genes, respectively; tpmmax 1–51; Fig. 5a; Table S2). One gene, encoding a MADS and an SRP54 domain, was expressed ubiquitously in the plant, albeit at a low level (TaBS-A8; tpmmax = 3.12; Table S2; Fig. S1).
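The expression call used above (a gene is considered expressed if its maximum transcript abundance across samples, tpmmax, is at least 1 tpm) can be sketched as follows. This is an illustrative sketch only: the gene identifiers, sample names and tpm values are hypothetical, and the function names are not taken from the original analysis.

```python
def tpm_max(profile):
    """Maximum tpm of one gene across all samples (tpmmax)."""
    return max(profile.values())


def call_expressed(profiles, threshold=1.0):
    """Return the list of genes with tpmmax >= threshold and the
    fraction of all genes that they represent."""
    expressed = [gene for gene, profile in profiles.items()
                 if tpm_max(profile) >= threshold]
    return expressed, len(expressed) / len(profiles)


# Hypothetical tpm values for four hypothetical genes in three samples.
profiles = {
    "gene_1": {"root": 0.2, "leaf": 0.1, "spike": 12.5},
    "gene_2": {"root": 3.1, "leaf": 2.8, "spike": 2.9},
    "gene_3": {"root": 0.0, "leaf": 0.4, "spike": 0.9},
    "gene_4": {"root": 45.0, "leaf": 0.0, "spike": 0.0},
}
expressed, fraction = call_expressed(profiles)
print(expressed, fraction)
```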

In general, MADS-box gene expression patterns are comparable with findings in rice (Arora et al., 2007; Yang et al., 2012; Liu et al., 2013) (Figs 5, S4). MIKC*-type genes are expressed in anthers (Fig. 5a,b; Table S6). Putative floral homeotic genes, such as PI/AP3-like and AG/STK-like genes, OsMADS29- and OsMADS31-like Bsister genes, AGL6-like genes and SEP1- and SEP3-like genes are expressed specifically during flower and seed development (Fig. 5). AGL17-like genes are expressed in roots. AP1-like, SVP-like and some SEP1-like genes are expressed ubiquitously throughout the plant life cycle in different tissues. The expression of AP1-like genes in vegetative and reproductive structures is in agreement with functional data demonstrating that several genes from this subfamily are involved in diverse aspects of wheat development (Li et al., 2019).
For further analysis, genes were hierarchically clustered according to expression similarities and then grouped into different expression modules (Figs 5b, S4; Table S5). This analysis showed that genes from one subfamily could differ considerably in their expression pattern. For example, members of the SEP1-like gene subfamily are grouped into five different modules and Bsister genes are found in four different modules (Fig. 5b). By contrast, AGL6-like and MIKC* genes show relatively little variation in their expression pattern (Fig. 5b). It is also noteworthy that 87 genes, including representatives from many different subclades, showed no expression or only low expression under very specific conditions during the developmental time course (45% of genes; module II; gray; Fig. 5b).
We also analyzed to what extent homoeologs differed in their transcript abundance. For this purpose, we used a framework previously described (Ramírez-González et al., 2018) and assigned, for example, balanced expression to homoeolog triads for which transcripts from every gene were similar in abundance. Suppressed and dominant categories were used if some transcripts were more abundant than others (Figs 5c, S5) (Ramírez-González et al., 2018). Expression data were from a developmental time course of the spring wheat cv Azhurnaya (Ramírez-González et al., 2018). In the tissues analyzed, the percentage of triads belonging to the balanced category was between 29.6% and 63.9%, with an average of 44.25% (Table S7), and tended to be lower than the values observed for transcripts from all wheat genes (c. 60–80%, depending on the tissue and analysis) (Ramírez-González et al., 2018). On average, triads in which the B homoeolog was suppressed relative to the A and D homoeologs were the second most frequent category after balanced triads (16.50%; Table S7).
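The assignment of triads to expression categories can be illustrated with a short sketch. It assumes, as our reading of the published framework, that the relative contribution of each homoeolog is compared with seven idealized profiles and that the triad is assigned to the nearest profile by Euclidean distance. The idealized profiles, the function name and the example tpm values are assumptions made for illustration, and the filter (summed triad expression > 0.5 tpm) follows the cutoff used in this study.

```python
import math

# Idealized relative contributions (A, B, D); assumed for illustration.
IDEAL_PROFILES = {
    "balanced": (1 / 3, 1 / 3, 1 / 3),
    "A dominant": (1.0, 0.0, 0.0),
    "B dominant": (0.0, 1.0, 0.0),
    "D dominant": (0.0, 0.0, 1.0),
    "A suppressed": (0.0, 0.5, 0.5),
    "B suppressed": (0.5, 0.0, 0.5),
    "D suppressed": (0.5, 0.5, 0.0),
}


def classify_triad(a_tpm, b_tpm, d_tpm, min_total=0.5):
    """Assign a homoeolog triad to the nearest idealized category.
    Returns None if the summed expression does not exceed min_total."""
    total = a_tpm + b_tpm + d_tpm
    if total <= min_total:
        return None
    relative = (a_tpm / total, b_tpm / total, d_tpm / total)
    return min(IDEAL_PROFILES,
               key=lambda name: math.dist(relative, IDEAL_PROFILES[name]))


# Hypothetical tpm values for four triads in one tissue.
for triad in [(10.0, 10.0, 10.0), (10.0, 0.5, 9.0),
              (1.0, 1.0, 20.0), (0.1, 0.1, 0.1)]:
    print(triad, classify_triad(*triad))
```

The percentage of balanced triads per tissue would then be the number of triads classified as balanced divided by the number of triads passing the expression filter.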
For some well characterized MADS-box genes, an unbalanced expression pattern is in agreement with previous data. For example, in the TaSEP1-2 triad, the only functional SEP1-like gene is hypothesized to be expressed from the D-genome (Shitsukawa et al., 2007a). In agreement with this, this triad is D-dominant in most tissues analyzed (Fig. S5). Likewise, the flowering time regulatory genes of the VRN1 (TaAP1-1) triad are in the D-suppressed or A-dominant category (Figs 5c, S5). This is in line with data showing that the VRN1 allele conferring spring growth habit and hence expected to be expressed most strongly (Loukoianov et al., 2005; Chen & Dubcovsky, 2012) is VRN-A1 (TaAP1-A1) in the cv Azhurnaya (Sharma et al., 2017).
Some AGL17-like and Bsister genes are expressed in response to stress conditions
AGL17- and OsMADS30-like Bsister clades have expanded during wheat evolution (Figs 1, 2). Many of the genes from these clades are not expressed or are expressed at a very low level during wheat development (Fig. 5). However, some of these genes do show expression in response to distinct stress conditions (Figs 6, S4).

Bsister genes form three distinct clades of OsMADS29-, OsMADS30- and OsMADS31-like genes in grasses (Schilling et al., 2015). Most Bsister genes have been described to be important for ovule and seed development with a specific expression pattern limited to female reproductive organs (Nesi et al., 2002; Becker & Theißen, 2003; Mizzotti et al., 2012). While OsMADS29-like and OsMADS31-like wheat Bsister genes follow this conserved expression pattern (TaBS-1 and TaBS-2; Fig. 6a), some OsMADS30-like wheat genes are expressed ubiquitously during the plant life cycle (TaBS-B6-1 and TaBS-A5-4; Fig. 6a). By contrast, five closely related OsMADS30-like genes showed low or no expression in any of the developmental stages. Instead, a specific upregulation in response to inoculation with the pathogen Fusarium graminearum was detected (TaBS-4; Figs 6a, S6).
AGL17-like genes are commonly expressed in roots and leaves (Puig et al., 2013). They have also been described to be involved in osmotic and saline stress responses in rice (Puig et al., 2013). Several wheat AGL17-like genes are not expressed at all during the developmental stages analyzed (Figs 5b, 6b). TaAGL17-B3 and TaAGL17-D3 are expressed in the root, but not in leaves under control conditions and show upregulation in leaf tissue after 1 h of heat stress (Fig. 6b). However, expression is not detectable after 6 h and there is no specific response to drought stress (Fig. 6b). TaAGL17-A1 is upregulated after 6 h of heat, but not drought stress (Fig. 6b). Additionally, TaAGL17-D3 is expressed in the root, and upregulated in leaves in response to infection with stripe rust 7 d after infection (Fig. 6c).
Discussion
Many wheat MIKC-type MADS-box genes have evolutionarily conserved functions
MIKC-type MADS-box genes play a central role in plant development. They are therefore promising targets for crop breeding and improvement (Schilling et al., 2018).
Here, we identified 201 wheat MIKC-type MADS-box genes, which we assigned to 15 conserved subfamilies (Table 1; Fig. 1). In all, 62.7% of wheat MIKC-type MADS-box genes could be assigned to 1 : 1 : 1 homoeologous groups (Table 2). This is considerably above the average homoeologous retention rate in wheat (35.8%; Table 2) (IWGSC, 2018). Many MADS-domain proteins act in multiprotein complexes (Theißen et al., 2016). The composition of those protein complexes changes dynamically during development, with many proteins being part of more than one complex (Immink et al., 2009, 2010). Changes in the gene dosage may result in changes in the stoichiometry of the protein complexes, which may in turn have phenotypic effects (Birchler et al., 2005). Thus, selection may act to retain homoeologs in all subgenomes. An example of this might be the homoeologous genes VRN-A1, VRN-B1 and VRN-D1 (named TaAP1-A1, TaAP1-B1 and TaAP1-D1 here) which are important regulators of flowering time in wheat (Xu & Chong, 2018). Different allelic combinations of those genes lead to different heading dates, demonstrating that all three homoeologs are of functional relevance (reviewed in Shcherban & Salina, 2017).
We also found that the expression pattern of many wheat MADS-box genes is similar to that of close homologs in rice and other model plants, indicating that gene functions are broadly conserved between wheat and rice. However, the number of MIKC-type MADS-box gene triads where transcript abundance from all three wheat homoeologs was balanced (average across all tissues = 44.25%) was relatively low when compared with a genome-wide assessment of all transcripts (c. 60–80%, depending on triad synteny and tissue analyzed) (Ramírez-González et al., 2018). Some of the unbalanced homoeolog expression patterns may be explained by different functions of different homoeolog alleles, as is well demonstrated for VRN1 (TaAP1-1), where the ‘spring alleles’ are in some tissues more strongly expressed than the ‘winter alleles’ (Loukoianov et al., 2005; Shcherban & Salina, 2017; Xu & Chong, 2018). The tpm cutoff we used in our analysis was also relatively low (summed expression across one triad > 0.5 tpm), and some of the unbalanced expression detected may be related to nonspecific ‘background’ expression with some degree of noise. Further studies are required to assess the functional relevance of unbalanced homoeolog expression and to see whether this is, for example, a general feature of specific transcription factor families.
Together, the conservation of all major subclades, the high homoeolog retention rate and the conservation of expression patterns underline the high biological importance of the MIKC-type MADS-box gene family in general and of distinct subclades in particular.
Subfamily-specific expansion of wheat MIKC-type MADS-box genes may contribute to the high adaptability of wheat
The hexaploid nature of wheat and the large size of the MADS-box gene family provide an ideal opportunity to study the evolutionary fate of genes after gene duplication and polyploidization.
With 201 genes, wheat has one of the largest MIKC-type MADS-box gene counts among flowering plants (Gramzow & Theißen, 2013; Vining et al., 2015; Nardeli et al., 2018). In total, wheat has c. 3.1 times as many TFs as rice, which can generally be explained by its hexaploidy (Borrill et al., 2017). However, the number of MIKC-type MADS-box genes is > 4.5 times higher in wheat than in rice (Fig. 2a–d). The strikingly high number of MIKC-type MADS-box genes observed in this study is mainly a result of – leaving hexaploidy aside – the significant expansion of four subclades: SEP1, Bsister, AGL17 and FLC (Fig. 2).
The expansion of MIKC-type subfamilies has been reported before in different plant species, such as SOC1-like genes in Eucalyptus as well as SVP-like and SOC1-like genes in cotton (Vining et al., 2015; Nardeli et al., 2018). Intriguingly, genes from FLC, SVP and SOC1 subfamilies are involved in the control of flowering time (Moon et al., 2003; Lee et al., 2007; Schilling et al., 2018). It has been hypothesized that the expansion (and contraction) of developmental control genes, more specifically eudicot FLC-like genes, facilitates the rapid adaptation to changes in environmental factors such as temperature (Theißen et al., 2018). In a similar manner, duplications of wheat FLC-like genes might have enabled the adaptation of wheat to different climatic conditions, therefore contributing to its global distribution. It will be interesting to see whether copy number variations of FLC-like genes can be detected in different wheat varieties.
The expansion of Bsister and AGL17-like genes may similarly be explained by an adaptive advantage. However, in those cases, neofunctionalization might be involved. Bsister-like genes form three subclades of OsMADS29-, OsMADS30- and OsMADS31-like genes in grasses. Expression pattern and evolutionary analyses suggest that grass OsMADS29- and OsMADS31-like genes retain a conserved role in ovule and fruit development (Yang et al., 2012; Yin & Xue, 2012), whereas OsMADS30-like genes may have functionally diverged (Schilling et al., 2015). For five wheat OsMADS30-like genes, upregulation was observed during infection with Fusarium head blight (Fusarium graminearum; Fig. 6a). MIKC-type MADS-box genes are not typically associated with biotic stress responses and it remains speculative whether and how they might be involved in responding to a Fusarium infection. However, Fusarium head blight is a floral disease and Bsister genes are expressed in the flower. This may have facilitated a co-option of these genes into a pathogen response network. Interestingly, the lack of synteny between OsMADS30-like genes might point towards transposable elements as a possible duplication mechanism, underlining the evolutionary importance of transposable elements (Dubin et al., 2018).
An upregulation in response to stresses was also observed for some AGL17-like genes, which form the largest wheat MIKC-type subfamily. While many wheat AGL17-like genes were not supported by transcriptional data (Fig. 5), a number of them are upregulated in late stages of stripe rust infection and in response to heat stress (Fig. 6b,c). Another three AGL17-like genes were found to be upregulated in some stages of anther and grain development, a pattern unusual for AGL17-like genes (Fig. 5a). This diversity of expression patterns and putative functions adds to the complex evolution of AGL17-like genes, as genes from this subfamily have been implicated in various different functions, including root development, flowering time control, tillering, stomata development and stress response (Kutter et al., 2007; Guo et al., 2013; Puig et al., 2013; Hu et al., 2014; Xu et al., 2018).
OsMADS30- and AGL17-like genes might be involved in other stress responses as well and might be good candidates to investigate cultivar-specific resistance to biotic and abiotic stresses.
Putative adaptive advantages notwithstanding, we also observed a relatively high percentage of truncated and/or very lowly expressed MIKC-type MADS-box genes. Some of those genes may constitute pseudogenes. Whether those sequences are of functional importance remains an open question. In some cases, pseudogenes contribute, for example, transcription factor binding sites for noncoding RNA expression (Xie et al., 2019).
Dynamic evolution of MIKC-type MADS-box genes in distal telomeric regions
The cause of the expansion of the FLC-, AGL17-, SEP1- and OsMADS30-like subfamilies might be the chromosomal position of their genes. Distal telomeric segments have previously been described as targets of recombination events, and many fast-evolving genes lie within these evolutionary hotspots (Glover et al., 2015; Chen et al., 2018). In wheat specifically, genes related to stress response and external stimuli, notably traits with a high requirement for adaptability, have been found to be located in distal chromosomal segments (IWGSC, 2018; Ramírez-González et al., 2018). By contrast, genes related to photosynthesis, cell cycle or translation (e.g. genes involved in highly conserved pathways) are enriched in proximal chromosomal segments (IWGSC, 2018; Ramírez-González et al., 2018).
This notion is supported by our findings: genes of the larger wheat MIKC-type subclades tend to be located in distal telomeric segments (Figs 3, 4). Remarkably, genes of these expanded clades do control traits that are important for adaptation to different environments. For example, AGL17-like genes are involved in root development and stress response and FLC-like genes determine flowering time (Xu & Chong, 2018; Xu et al., 2018). On the other hand, smaller MIKC-type subclades involved in highly conserved developmental functions, such as AP3/PI- or AG/STK-like genes, which control reproductive organ identity (Theißen et al., 2016), tend to be located more in central chromosomal segments (Fig. 3). The higher prevalence of duplication events in distal telomeric chromosomal segments might thus have caused the expansion of certain subclades, possibly facilitating rapid adaptation to different environmental conditions. On the other hand, there might be an evolutionary advantage for wheat MIKC-type subclades with highly conserved functions to be located in proximal chromosomal positions: in this way, for example, developmentally detrimental gene dosage variations are minimized.
The genome architecture among different plants varies considerably. Arabidopsis and rice, for example, have relatively small subtelomeric regions (The Arabidopsis Genome Initiative, 2000; Mizuno et al., 2006), whereas other plants show increased recombination rates towards the chromosome ends and relatively large distal telomeric regions with increased gene content, very similar to what is seen in wheat (Choulet et al., 2014; Glover et al., 2015; Li et al., 2015; Lambing et al., 2017; Grassa et al., 2018; Sun et al., 2018; Laverty et al., 2019). It will be interesting to see whether similar dynamics of MIKC-type MADS-box gene duplications can be observed in those species.
The high prevalence of gene duplications in distal telomeric segments most probably also led to a higher proportion of truncated genes lacking either the MADS or K box (Fig. 3a). This might render the encoded proteins functionally impaired, as suggested earlier for TaSEP1-A2 (WLHS1-A), which lacks a K domain and shows no protein–protein interactions in vivo (Fig. 1; Table 1) (Shitsukawa et al., 2007a). However, MIKC-type MADS-domain proteins without a K domain might theoretically still be able to bind DNA (Kaufmann et al., 2005) and compete with full-length MIKC-type proteins for target sites, thus functioning as transcriptional inhibitors. On the other hand, genes lacking the MADS box but encoding a K domain might act in a dominant-negative manner by binding and sequestering other MADS-domain proteins (Ferrario et al., 2004; Seo et al., 2012). Evidence that dominant-negative versions of transcription factors can be evolutionarily and developmentally important comes from basic helix–loop–helix proteins (Ledent & Vervoort, 2001; Jones, 2004). We found that OsMADS30-like wheat genes encoding only a MADS domain, only a K domain, or an unusual combination of MADS and SRP54 domains are expressed ubiquitously (Figs 6a, S3b), hence deviating from the canonical Bsister expression in the flower and again hinting towards a possible neofunctionalization of OsMADS30-like genes during wheat evolution.
Towards identifying the evolutionary origin and the functional relevance of MIKC-type MADS-box gene duplications in wheat
An important goal for future research projects will be to infer when during evolution the expansion of some wheat MADS-box gene subfamilies occurred. The evolutionary history of bread wheat is relatively well understood and the genomes of many of the descendants of the wheat ancestors have been sequenced (Jia et al., 2013; Ling et al., 2013, 2018; Luo et al., 2017; G. Zhao et al., 2017). The genome sequences of pasta wheat and the closely related wild emmer have also been determined (Avni et al., 2017; Maccaferri et al., 2019). This offers the opportunity to study the phylogenetic intricacies of MIKC-type MADS-box gene evolution in this important group of cereals in great detail. Similarly, pangenome studies of bread wheat will shed light on the dynamics of MIKC-type MADS-box genes within one species (Montenegro et al., 2017; Tao et al., 2019). The importance of those studies is illustrated by the observation that the MADS-box gene VRN-D4, a close relative of VRN-A1 (TaAP1-A1), is responsible for the spring growth phenotype in many wheat varieties from South Asia. However, VRN-D4 originated by a duplication from VRN-A1 and is not found, for example, in the Chinese Spring variety from which the reference genome was generated (Kippes et al., 2015). Genomic data, together with new tools that allow the functional characterization of genes in wheat (Wang et al., 2014; Krasileva et al., 2017; Liang et al., 2017), will help to decipher the function of MIKC-type MADS-box genes and will allow testing of, for example, whether neofunctionalization of MADS-box genes plays a role in wheat pathogen resistance.
In conclusion, MIKC-type MADS-box genes are of critical importance for wheat development and hence bear significant potential for the improvement of this economically highly relevant crop. Based on our data, we speculate that MADS-box gene duplications might have been crucial in increasing the adaptability of wheat to different environmental conditions as well as in fine-tuning quantitative traits. By thoroughly characterizing the entire complement of wheat MIKC-type MADS-box genes, we provide the basis for the development of markers for future breeding efforts as well as for the identification of gene-editing targets to improve wheat performance. Further, we frequently observe putative neofunctionalization, a requirement for understanding the emergence of new traits during evolution.
Acknowledgements
We are grateful for the support by the International Wheat Genome Sequencing Consortium and for early access to the genome data. Funding of SP by a China Scholarship Council PhD fellowship (CSC no. 201708300002) is gratefully acknowledged. We are grateful to the School of Biology and Environmental Science at the University College Dublin for general support. We thank four anonymous reviewers for very helpful comments on an earlier version of the manuscript.
Author contributions
RM and SS developed the analysis approach. SS and AK analyzed the data with supervision from RM and LSJ. SP performed the RT-PCR experiments. SS and RM wrote the manuscript. All authors read and approved the final manuscript.