NorWood: a gene expression resource for evo‐devo studies of conifer wood development

Summary The secondary xylem of conifers is composed mainly of tracheids that differ anatomically and chemically from angiosperm xylem cells. There is currently no high‐spatial‐resolution data available profiling gene expression during wood formation for any coniferous species, which limits insight into tracheid development. RNA‐sequencing data from replicated, high‐spatial‐resolution section series throughout the cambial and woody tissues of Picea abies were used to generate the NorWood.conGenIE.org web resource, which facilitates exploration of the associated gene expression profiles and co‐expression networks. Integration within PlantGenIE.org enabled a comparative regulomics analysis, revealing divergent co‐expression networks between P. abies and the two angiosperm species Arabidopsis thaliana and Populus tremula for the secondary cell wall (SCW) master regulator NAC Class IIB transcription factors. The SCW cellulose synthase genes (CesAs) were located in the neighbourhoods of the NAC factors in A. thaliana and P. tremula, but not in P. abies. The NorWood co‐expression network enabled identification of potential SCW CesA regulators in P. abies. The NorWood web resource represents a powerful community tool for generating evo‐devo insights into the divergence of wood formation between angiosperms and gymnosperms and for advancing understanding of the regulation of wood development in P. abies.

Notes S1 Supplementary figures, genotype confirmation for biological replicates, phylogenetic analysis and gene model annotation refinement for cellulose synthase family members.

Table S1
Section pooling and cell profiling (see separate file). Table S2 Gene expression and sequence data quality (see separate file).     Notes S1

Genotyping
To confirm that the three trees were clonal biological replicates of genotype 'Z4006' a genotype test was performed looking at Single Nucleotide Polymorphisms (SNPs) in genes from the top 100 longest scaffolds in the genome assembly (Nystedt et al., 2013). Read information from the three replicates, as well as three samples from an independent experiment used here as a control, were merged using samtools-1.3.1 mpileup (-d100000). SNPs were called using bcftools-1.3.1 call (-v -c). A Principle Component Analysis (PCA) plot was created from the resulting variant call format (vcf; Li, 2011) file using the R-3.3.0 and the package SNPRelate-1.6.4 (Zheng et al., 2012). From the PCA plot (Fig. A) it was clear that the replicates from the NorWood tree sample series formed a tight cluster in contrast to the three control samples, which were distinctly different to the NorWood samples as well as being variable among themselves. This pattern was expected as the control samples were not obtained from clonal replicates.

Fig. A. Genotyping of clone Z4006.
A PCA-plot of SNPs calls from genes on the 100 longest scaffolds shows that the three clones (red) forms a tight cluster while the three independent control samples (blue) distinctly separates from the three clones.

Merging genes
An initial phylogenetic tree (Fig. B) of all genes in the cellulose synthase gene family (as identified in PlantGenIE). Two clusters of genes (highlighted in blue in Fig. B below) contained examples of fragmented gene annotation, with two gene fragments located adjacent to each other on a single scaffold in the genome assembly. In both cases evidence was found supporting a merge of the gene models. Samtools-1.3.1 (Li, 2011) mpileup (-d1600 -C50 -d1600) was used to merge reads from several samples (T[1-3]-08 to T[1-3]-12) in the region of both genes (MA_10429339:200-6,500), spanning MA_10429339g0010 (466 -2979) and MA_10429339g0020 (3047-5917), from which a consensus sequence was called using bcftools-1.3.1 call (-c) and vcfutils.pl vcf2fq (which is included with bcftools). In addition the GBrowse genome browser at ConGenIE.org showed an existing assembled transcript (Trinity RNA-Seq assembly; comp95354_c1_seq1), as shown in Fig. C, spanning both regions. A multiple-sequence alignment produced using MUSCLE (v3.8.31;Edgar, 2004;McWilliam et al., 2013) showed that the consensus sequence and the read sequence were almost identical (Fig. Ca). For the final phylogenetic tree (Fig. S3), a new gene model was predicted from transcript comp95354_c1_seq1 using EMBOSS Transeq (Rice et al., 2000;McWilliam et al., 2013). The resultant gene model covered both of the original, but fragmented, gene models (Figs Ca, D).
The method was repeated on scaffold MA_183130 in the region MA_183130:3000-13000 and supported a merge of the two gene models MA_183130g0010 (3629-11651 bp) and MA_183130g0020 (11711-12567 bp). The scaffold region also contained an aligned transcript (comp89823_c0_seq1; Fig. Cb). A MUSCLE multiple alignment including the third gene model (MA_ 8061832g0010) located in the cluster together with the other two models indicated in Fig. B showed that MA_8061832g0010 was a fragment with 100% identity to the merged gene model on comp_89823_c0_seq1 (Fig. E).
The newly derived gene models were added to the current version of the genome using WebApollo (Lee et al., 2013;Sundell et al., 2015) at ConGenIE.org.    Fig. 1 (MA_183130g0010, MA_183130g0020, MA_8061832g0010) and a gene prediction of the consensus sequence produced using FgenesH.