Trait to gene analysis reveals that allelic variation in three genes determines seed vigour

Summary Predictable seedling establishment is essential for resource‐efficient and cost‐effective crop production; it is widely accepted as a critically important trait determining yield and profitability. Seed vigour is essential to this, but its genetic basis is not understood. We used natural variation and fine mapping in the crop Brassica oleracea to show that allelic variation at three loci influences the key vigour trait of rapid germination. Functional analysis in both B. oleracea and the model Arabidopsis identified and demonstrated activity of genes at these loci. Two candidate genes were identified at the principal Speed of Germination QTL (SOG1) in B. oleracea. One gene, BoLCVIG2, is a homologue of the alternative‐splicing regulator AtPTB1. The other gene, BoLCVIG1, was unknown, but different alleles had different splice forms that were coincident with altered abscisic acid (ABA) sensitivity. We identified a further QTL, Reduced ABscisic Acid 1 (RABA1), that influenced ABA content, and we provide evidence that this results from the activity of a homologue of the ABA catabolic gene AtCYP707A2 at this locus. Lines containing beneficial alleles of these three genes had greater seed vigour. We propose a mechanism in which both seed ABA content and sensitivity to it determine speed of germination.

B. oleracea seed germination and hydrothermal time analysis
The impact of temperature and water stress on germination can be modeled using hydrothermal time (HTT) approaches and can therefore be predicted. Germination of AGSL101, GDDH33 and A12DHd seeds was recorded at three water potentials (0, -0.41, -0.79 MPa) and three temperatures (5, 15, 20 °C) and the data were subjected to HTT analysis. In all three genotypes, germination rate decreased at lower temperatures and at lower water potentials. Linear regression lines were fitted to this response and in all three genotypes > 90% of the variance could be accounted for. The lines were compared by regression analysis. In the case of the response to water potential, lines could not be constrained to a single threshold; the fit was significantly (P<0.001) better when parallel lines were fitted, and it was not improved by allowing the slopes to differ from each other. In contrast, with the temperature response the fit was significantly (P<0.001) improved when the slopes of the lines were allowed to differ, and 96% of the variation could be explained when they were constrained to a single threshold. No more variance could be explained by allowing the threshold to differ. The data therefore conformed to the basic hydrothermal time model (i.e. a single temperature threshold, but a distribution of water potential thresholds), and so model parameters were fitted and optimized by minimizing the residual sum of squares of the differences between measured and modeled values. The greater effect of lower water potentials on A12DHd seed germination compared to AGSL101, illustrated in Fig. 1(c), is generalized in its higher Ψb(50) (base water potential of the 50th percentile). Ψb(50) is a measure of the sensitivity of germination to decreased water potential; the more negative Ψb(50) of AGSL101 and GDDH33 compared with that of A12DHd indicates their greater tolerance to water stress and their greater speed of germination.
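The basic hydrothermal time model referred to above can be sketched in a few lines. This is a minimal illustration only, assuming the commonly used formulation in which θHT = (Ψ − Ψb(g))(T − Tb)·tg and Ψb(g) is normally distributed with median Ψb(50) and standard deviation σΨb. The function names and every parameter value are hypothetical and are not the fitted values from this study.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def germination_fraction(t_hours, temp_c, psi_mpa,
                         theta_ht, t_base, psi_b50, sigma_psi_b):
    """Predicted cumulative germination fraction at time t_hours.

    theta_ht     hydrothermal time constant (MPa degC h), assumed units
    t_base       single base temperature shared by all seeds (degC)
    psi_b50      median base water potential (MPa)
    sigma_psi_b  standard deviation of base water potential (MPa)
    """
    if t_hours <= 0 or temp_c <= t_base:
        return 0.0
    # Base water potential of the seed fraction that germinates exactly now;
    # every seed with a lower (more negative) threshold has already germinated.
    psi_b_at_t = psi_mpa - theta_ht / ((temp_c - t_base) * t_hours)
    return normal_cdf((psi_b_at_t - psi_b50) / sigma_psi_b)


def residual_sum_of_squares(observations, **params):
    """Sum of squared differences between measured and modeled fractions.

    observations: iterable of (t_hours, temp_c, psi_mpa, measured_fraction).
    """
    return sum(
        (measured - germination_fraction(t, temp, psi, **params)) ** 2
        for t, temp, psi, measured in observations
    )
```

Minimizing `residual_sum_of_squares` over the four parameters with any general-purpose optimizer corresponds to the fitting step described in the text. In this formulation a more negative `psi_b50` gives a larger predicted fraction at any given time, which is the sense in which a more negative Ψb(50) indicates faster germination.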
This was followed by fine-mapping using B. oleracea bacterial artificial chromosome (BAC) identification, BAC tiling path construction and candidate gene identification, described below.

Confirmation of the synteny between B. oleracea chromosome 1 and the top arm of chromosome 3 of Arabidopsis:
More rapid germination of substitution line AGSL101 than the A12DHd recurrent parent line confirms that an introgressed region at the bottom telomeric end of chromosome 1 contains the QTL for SOG1 identified by Bettey et al. (2000). Previously synteny has been shown between a number of regions in the Brassica C genome and Arabidopsis using the RFLP markers pR85 and pN13 (Cogan et al., 2004). To develop further markers we designed primer pairs to 30 Arabidopsis gene models that were spaced across the region in Arabidopsis.
These primers were tested to determine whether they amplified a B. oleracea product and then whether there were any polymorphisms between AGSL101 and the parental lines A12DHd and GDDH33. A banding pattern that is the same in AGSL101 and GDDH33, but different from that in A12DHd, indicates that the marker lies at the SOG1 locus in B. oleracea and is therefore useful in refining the QTL. Primers for three gene models were identified as informative markers (At3g01190, At3g02420, At3g07130). In addition, public databases (Brassica.info-SSR exchange) were screened for BoLGC1 markers that might be informative. A number of SSRs were identified and tested for any polymorphism between the B. oleracea lines and appropriate locations on CAAAGGTGACCAAGAGGACATT.

Workflow figure (box labels): BAC sequenced and shown to contain 12 full-length genes (Table S1); seed produced from all KO mutants identified (Table S2); screen KO mutants for germination speed; germination phenotypes found in two genes; Arabidopsis resources; discovery that syntenic rearrangements had occurred.

(a) Schematic of the SOG1 region in selected BILs (F6) identified as having recombination in the genotype screen. This recombination is indicated by the genotype scores (A12DHd or GDDH33) from the five polymorphic markers (Fig. S2) used in the screen. (b) The lines were also subjected to a phenotype screen at the low temperature of 8 °C to enable more recordings to be made and thus an accurate cumulative germination curve to be obtained. Speed of germination was calculated as mean log time to germination, as used by Bettey et al. (2000) in the original analysis that identified the SOG QTL.
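The speed-of-germination statistic can be illustrated with a short sketch. It assumes that mean log time to germination is the mean of log10(time) over all seeds that germinated, with each recording time weighted by the number of seeds newly germinated at that recording. The log base, the function name and the example data are assumptions made for illustration, not details taken from Bettey et al. (2000).

```python
from math import log10


def mean_log_germination_time(times_h, cumulative_counts):
    """Mean log10 time to germination from a cumulative germination curve.

    times_h            recording times in hours (all > 0, increasing)
    cumulative_counts  cumulative number of germinated seeds at each time
    """
    weighted_sum = 0.0
    germinated = 0
    previous = 0
    for t, count in zip(times_h, cumulative_counts):
        newly = count - previous  # seeds that germinated since last recording
        if newly < 0:
            raise ValueError("cumulative counts must not decrease")
        weighted_sum += newly * log10(t)
        germinated += newly
        previous = count
    if germinated == 0:
        raise ValueError("no seeds germinated")
    return weighted_sum / germinated


# Hypothetical recordings for one line: hours and cumulative germinated seeds
example = mean_log_germination_time([24, 48, 72, 96], [2, 10, 14, 15])
```

More frequent recordings, as made possible by the slower germination at 8 °C, reduce the error introduced by assigning each seed the time of the recording at which it was first seen to have germinated.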
These data were used to refine SOG1 through statistical analysis with the genotype scores. The analysis considered each marker, or each pair of consecutive markers, separately and determined whether there was any associated difference. The analysis was performed using REML, taking the marker score or scores as fixed effects, and variability between lines with the same marker score(s) and between replicates as random effects. It was anticipated that for some groups of lines with the same marker score the QTL would not be segregating, and for others that it would, so the variance component estimating variance between lines with the same marker score was allowed to vary between marker scores. In this single-marker analysis, speed of germination differed significantly (P<0.05) when GDDH33 (G) alleles rather than A12DHd (A) alleles were present at markers 4 and 5, with greater significance at marker 5. (c) This result was substantiated in the between-marker analysis, with the only significant difference being observed between markers 4 and 5. The analysis was repeated using germination data from the F5 generation and this confirmed the result. The SOG1 QTL was therefore fine mapped to the region identified by markers 4 and 5.
It should be noted that in the lines analyzed, if a QTL is significant for GDDH33 at marker 5, the results will also show significance for the reverse effect, i.e. A12DHd at marker 1 (as in Fig. S3b). This opposite significance effect is an artifact of recombination occurring at the end of the chromosome (the chromosome is all A12DHd up to the start of the introgression), so that almost all lines with recombination will have GDDH33 at marker 5 and A12DHd at marker 1 (see schematic, Fig. S3a).
Thus lines with faster germination due to GDDH33 alleles at marker 5 will also almost all have A12DHd at marker 1.

Further genotype screening and analysis show that genes associated with markers 4 and 5 separately influence germination speed:
Subsequently, other gene markers were developed for several of the genes identified on the BAC BoB064L23 (Table S1). These included At3g02080 and At3g01060 (BoLCVIG1) grouped with marker 4 and At3g01070 and At3g01090 grouped with marker 5 (BoLCVIG2). The BILs were genotyped again using these new markers and we found that a recombination event identified with markers 4 and 5 had occurred between the two genes BoLCVIG1 and BoLCVIG2, so that they were grouped with marker 4 (Bo3g02090) and 5 (Bo3g01150) respectively. The analysis for speed of germination of the BILs described above showed it was significantly different at markers 4 (includes BoLCVIG1) and 5 (includes BoLCVIG2). Thus genes associated with both markers independently influence SOG1. We then extended the statistical analysis to look for linkage at pairs of markers and found that markers 4 and 5 were significantly linked (P<0.012), suggesting that genes associated with these markers influence speed of germination in the same way. The primers for the additional markers were:

At3g02090: No insertion mutants c
At3g02080: No insertion mutants

(b) A section of B. oleracea chromosome 4 genotyped with 10 markers (A-J) for the collection of BILs, and the A12DHd and AGSL101 parents. ABA content is shown on the right-hand side, indicating two clear groupings: lower and higher ABA content (< or > 300 ng ABA g⁻¹ DW).
Lines with lower ABA content had been assigned GDDH33 (yellow) by the markers and those with higher ABA content had been assigned A12DHd (blue). These markers were identified following genomic NGS sequencing to a depth of 50× coverage of both A12DHd and GDDH33, followed by alignment to the TO1000DH reference genome (Parkin et al., 2014).

Methods S1 Marker analysis to confirm collinearity between B. oleracea and Arabidopsis
Primer pairs were designed to 30 Arabidopsis gene models that were spread at intervals across the SOG1 region using Primer 3 software (http://gene.pbi.nrc.ca/cgi-bin/primer/primer3_www.cgi) and gene data from TAIR (http://www.arabidopsis.org/) to give amplicons of 200 to 700 bp. The PCR mix used was standard, but a touch-down program was used. The cycling parameters were as follows: 94 °C for 5 min; then 10 cycles in which the annealing temperature dropped from 65 °C towards 55 °C by one degree each cycle, with 30 s denaturation at 94 °C and 30 s extension at 72 °C in each cycle; followed by 30 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 45 s; and a final extension at 72 °C for 15 min.
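The touch-down program can be written out step by step as a sketch. Two points are assumptions rather than statements from the text: the ten touch-down cycles are taken to anneal at 65, 64, ... 56 °C before the fixed 55 °C cycles begin, and the annealing step in the touch-down cycles is given 30 s, matching the later cycles. The function name and the tuple layout are illustrative only.

```python
def touchdown_program(start_anneal_c=65, end_anneal_c=55, touchdown_anneal_s=30):
    """Return the PCR program as a list of (step, temperature_C, seconds)."""
    program = [("initial denaturation", 94, 5 * 60)]

    # Touch-down phase: annealing temperature drops one degree per cycle.
    # range(65, 55, -1) yields 65..56, i.e. ten cycles (assumed interpretation).
    for anneal_c in range(start_anneal_c, end_anneal_c, -1):
        program.append(("denaturation", 94, 30))
        program.append(("annealing", anneal_c, touchdown_anneal_s))  # assumed 30 s
        program.append(("extension", 72, 30))

    # Fixed phase: 30 cycles at the final annealing temperature.
    for _ in range(30):
        program.append(("denaturation", 94, 30))
        program.append(("annealing", end_anneal_c, 30))
        program.append(("extension", 72, 45))

    program.append(("final extension", 72, 15 * 60))
    return program


# Total programmed hold time in minutes, ignoring ramping between steps
total_minutes = sum(seconds for _, _, seconds in touchdown_program()) / 60
```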
A number of publicly available SSRs used in the generation of the B. oleracea integrated map (Sebastian et al., 2000) were also used. Two proved informative (Ni4B10 and OL10F10).

Marker analysis used to genotype BILs
Hybridisation of the markers designed above was used to screen an A12DHd BAC library (Howell et al., 2002), and positive clones were re-probed with adjacent markers. BACs with numerous hits were end sequenced and promising BACs were fully sequenced. Further probes were identified from the sequences obtained and these were used to re-probe the BAC library. The process was then repeated several times and a BAC tiling path generated. Locus-specific primers were then designed. These were tested using the PCR conditions shown above. PCR amplicons for sequencing were cloned and at least three clones were sequenced by standard protocols using the Big-Dye Terminator system (Applied Biosystems, Warrington, UK), with products run on an ABI Prism 3130xl Genetics Analyzer (Applied Biosystems). Primers that gave polymorphic results were selected as markers (Fig. S2). These were used to fine map the QTL within the backcrossed lines generated.

RNA extraction
Seeds were selected at random from bulked samples for RNA extraction. Total RNA was extracted in lots of 12-15 seeds and then RNA was pooled from at least 50 seeds in each replicate sample.
RNA samples were then further pooled depending on their use (see below). Extraction was by an RNAqueous kit (Ambion) in conjunction with Plant RNA isolation aid (Ambion). RNA quality was determined using a spectrophotometer (Nanodrop, USA) and electrophoretically with an Agilent Bioanalyser. RNA was treated with RNase-free DNase I (Roche Diagnostics) to remove contaminating genomic DNA.

Illumina sequencing and analysis
For Illumina sequencing RNA was combined across harvests to give two replicates of a single seed development and imbibition sample for both A12DHd and AGSL101. Oligo(dT) selection was performed twice using Dynal magnetic beads (Invitrogen). Both GAIIx and MiSeq sequencing was carried out on two biological and three technical replicates. Illumina library preparations were performed using mRNA-TruSeq sample prep kit version five (Illumina Inc., San Diego, CA, USA) according to the manufacturer's protocol (15018818 revA). The cultivar/tissue specific libraries were randomly assigned to six-nucleotide multiplex barcoded adapters and lanes.
Data were collected as short reads on an Illumina GAIIx instrument. Paired-end sequence reads of 36 bases were base-called and scored for read quality using the Illumina CASAVA pipeline.
MiSeq sequencing was carried out using read lengths of 150 bp. Subsequently TopHat, Bowtie and Cufflinks were used to map the reads both to the BAC tiling path assembled for this work and to the TO1000DH (Parkin et al., 2014) genomic sequence, giving the number of mapped reads per kilobase of exon per million mapped reads (RPKM), a measure of transcript abundance.
Splice form identification was carried out using the MISO, MapSplice and DiffSplice programs, allowing identification of both mis-splicing and intron retention.

PCR of alternative splice forms
Expression of alternative splice forms of BoLCVIG1 and BoLCVIG2 was determined by Illumina sequencing as described above and was verified by PCR. The development and imbibition sample series described above were each pooled into early and late RNA samples. Synthesis of cDNA was performed on 2 µg total RNA using SuperScript™ II Reverse Transcriptase. Reactions were performed in triplicate and contained 5 μl of SYBR Green I Master, 2 μl PCR-grade water, 1 μl each of 10 mM forward and reverse primers and 1 μl cDNA (diluted 1:10) in a final volume of 10 μl. PCR was carried out under the following conditions: one cycle at 95 °C for 10 min followed by 50 cycles at 95 °C for 30 s, 65 °C for 30 s and 72 °C for 30 s. Data were analysed using LightCycler® 480 software (version 1.5; Roche Diagnostics). All qRT-PCR products were cloned into the pGEM-T vector (Promega) and sequence verified.

Data analysis
Sequence reads were aligned to the published B. oleracea genome assembly (Parkin et al., 2014) using the TopHat and Bowtie algorithms. Expression levels were calculated using TopHat, Cufflinks and Cuffdiff to quantify expression differences. Normalisation was carried out using reads per kilobase of exon model per million mapped reads (RPKM). RPKM values are useful for analyzing differences in the abundance of alternative splice variants between samples, as correction for the length of each splice variant is essential for this type of analysis. MISO, DEXSeq and Cuffdiff (implemented in the Cufflinks package) were each used separately to test for differences in alternative splicing between two samples. Counts of four or more were used to define gene expression. Visualization was performed with the Integrative Genomics Viewer (IGV; Robinson et al., 2011).
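The RPKM normalisation described above follows directly from its definition, as the short sketch below shows. The function names and the example numbers are hypothetical. Applying the "four or more" threshold to raw read counts is an assumption, since the text does not say whether it refers to raw or normalised counts.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = read_count / (exon_length_bp / 1e3) / (total_mapped_reads / 1e6)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def is_expressed(read_count, min_count=4):
    """Expression call using a count threshold (assumed to be raw reads)."""
    return read_count >= min_count


# Two hypothetical splice variants with equal read counts but different
# lengths: the shorter variant has the higher length-corrected abundance.
long_variant = rpkm(read_count=500, exon_length_bp=2000, total_mapped_reads=10_000_000)
short_variant = rpkm(read_count=500, exon_length_bp=1000, total_mapped_reads=10_000_000)
```

The example illustrates why length correction matters when comparing splice variants: without it, the two variants would appear equally abundant.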