Single‐base methylome profiling of the giant kelp Saccharina japonica reveals significant differences in DNA methylation to microalgae and plants

Summary Brown algae have convergently evolved plant‐like body plans and reproductive cycles, which in plants are controlled by differential DNA methylation. This contribution provides the first single‐base methylome profiles of haploid gametophytes and diploid sporophytes of a multicellular alga. Although only c. 1.4% of cytosines in Saccharina japonica were methylated mainly at CHH sites and characterized by 5‐methylcytosine (5mC), there were significant differences between life‐cycle stages. DNA methyltransferase 2 (DNMT2), known to efficiently catalyze tRNA methylation, is assumed to methylate the genome of S. japonica in the structural context of tRNAs as the genome does not encode any other DNA methyltransferases. Circular and long noncoding RNA genes were the most strongly methylated regulatory elements in S. japonica. Differential expression of genes was negatively correlated with DNA methylation with the highest methylation levels measured in both haploid gametophytes. Hypomethylated and highly expressed genes in diploid sporophytes included genes involved in morphogenesis and halogen metabolism. The data herein provide evidence that cytosine methylation, although occurring at a low level, is significantly contributing to the formation of different life‐cycle stages, tissue differentiation and metabolism in brown algae.


Supplementary Methods I:
Part 1: Constructing the pseudo-chromosomes using a genetic linkage map 4994 markers of 31 linkage groups from the genetic linkage map analysis (1) were integrated into the assembled Saccharina japonica scaffolds to construct a physical map at the chromosomal level. All markers were aligned against the assembled kelp scaffolds using BLASTN (2) with an e-value cut-off 1E-15. Markers that mapped to more than one scaffold (150x) were discarded. A total of 88 scaffolds had markers from more than one linkage group, which could have been either the consequence of incorrect assembly or an incorrect linkage group. To correct the incompatibility between the physical map and linkage groups, reads from two mate-paired libraries (3kb and 5kb) were aligned to the assembled scaffolds with BWA (3). If the incompatible positions on scaffolds were not covered by mate-paired reads, the scaffold was splited at the gaps (intergenic). If the incompatible positions were mapping to paired-end reads, we selected the linkage group with a larger number of markers and discarded markers in other linkage groups. Six scaffolds were split and 4687 markers were used to construct the physical map of S. japonica.
The order of scaffolds on chromosomes were sorted by mean value of markers on the same scaffolds. In order to determine the orientation of scaffolds, we compared the physical order of the markers with their linkage-map order. If the markers on a scaffold had the same order between the physical and the linkage map order, the orientation of the scaffold was considered in positive orientation and vice versa. At the end, scaffolds were concatenated into chromosomes with 200 bp gaps.
A total of 1,576 scaffold were anchored to the genetic map, accounting for 64.69 % (352.93 Mbp) of the assembled kelp scaffolds. The remaining scaffolds were used to construct the artificial chromosome 0.

Part 2: Methylation profiles of sex-determining regions and their gene expression
Sex-determining regions were well studied in XY and ZW systems. Similar sexdetermining processes were recently reported for the UV system that brown algae possess (4). To study the influence of DNA methylation on sex determination in S. japonica, we looked at genes and their regulatory elements involved in the UV system in relation to their expression and MLgf (Fig. S20, Table S23). In S. japonica, there are 18 genes known to be involved in sex determination of which, 12 are located in the sex-determining region (SDR) and 6 have moved out to auto-chromosomes (named as PAR) (Fig. S20). Of the 12 genes located in the SDR, 6 are for female and 6 are for male gametophytes. Interestingly, 4 genes that left the SDR were highly expressed in SP and had a significantly negative correlation with their MLgf in all life-cycle stages. These four genes include a MEMO domain (SJ13722), a glycosyltransferase family (PF01697, SJ18945), a RING-type zinc finger (15797) and a Rab-GTPase-TBC domain (PF00566, SJ15874). The male genes inside the SDR show no correlation between the MLgf and their expression. However, they have a much lower MLgf in FG than in MG (Fig. S20).

Part 3: DNA methyltransferases evolution and their putative function in S. japonica
Recent rapid advances in next-generation sequencing technologies have enabled the establishment of methylation maps at a single-base resolution (6). To date, wholegenome methylomes have been reported from a range of most major eukaryotic groups (7) such as the early-diverging vascular plants Selaginella moellendorffii, the moss Physcomitrella patens, the silkworm Bombyx mori and honey bee Apis mellifera (8). Furthermore, several model eukaryotes are devoid of DNA methylation altogether including the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster (except in the early stages of embryogenesis) and the brown alga Ectocarpus siliculosus (7). DNA methyltransferases (DNMTs or DNA MTases) usually methylated DNA in all of these organisms. These enzymes catalyze the transfer of the methyl group from a cofactor molecule S-adenosyl-lmethionine (AdoMet or SAM) to the C5 position of the cytosine residues. To uncover DNMTs in S. japonica, we identified and compared the DNMTs family of S. japonica with DNMTs from other eukaryotes. Potential candidates were identified and downloaded by using a hidden Markov model (HMM) corresponding to the C5cytosine-specific DNA MTase domain in Pfam (9) (PF00145, named DNA_methylase in http://pfam.xfam.org). All proteins with a score above 125 were regarded as members of the MTase family. This threshold was used instead of the default cutoff defined in Pfam (http://pfam.xfam.org) in order to increase the sensitivity. The presence of the crucial MTase domain was verified by using the Interpro database (10). For assessing phylogenetic relationships, only MTase domains were used to construct phylogenetic trees. Over 50 species with approximately 130 DNMTsbwere clustered into three main clades based on the MTase domain. Six different DNMTs were identified in S. japonica and all of them seem to belong to the DNMT2 family (Fig. 4A). To shed further light on the structure of DNMT2 in S. japonica, we identified every domain of the human DNMT3 and DNMT1 genes to see if any of those domains were associated with the DNMT2 domains in S. japonica (Fig. S26). Genes containing PWWP (PF00855) and ADD (IPR025766) were found in S. japonica, together with domains PHD (IPR001965) and the Bromodomain (PF00439). However, none of these domains were associated with any of the DNMT2 domains in S. japonica (11), (13), (Fig. S25-27).
However, DNA methylation can also be influenced by processes that demethylate either passively or actively by demethylases. For instance, a passive loss of 5mC can be achieved through successive cycles of DNA replication in the absence of functional DNMT1/MET1 (16,17). Furthermore, demeter (DME) / demeter-like1-3 (DML) / repressor of silencing 1 (ROS) family of DNA glycosylases could rapidly erase 5mC independent of DNA replication in specific biological settings in plants as they share the RRM-fold domain (PF15628) and DNA glycosylases domain (IPR011257), the first one is reported to catalyze the removal of the 5mC base and subsequently cleave the backbone through lyase activity and the second act to repair oxidative damage in DNA (16, 18-21). The methyl-CpG-binding domain protein 2 (MBD2, PF01429) was also reported to be able to catalyze the reaction of removal the methyl group (16). To uncover proteins involved in active DNA demethylation, we searched for DME, DML, ROS and MBD domains in coding genes of S. japonica. We found an RRM fold domain and a DNA glycosylase domain (Fig. S28, S29).

Quality control
We use FastQC to perform basic statistics on the quality of the raw reads. Then, those read sequences produced by the Illumina pipeline in FASTQ format were preprocessed through Trimmomatic software which can be summarized as below: (1) Remove low-quality reads: scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW: 4:15); (2) Remove leading low quality or N bases (below quality 3) and trailing low quality or N bases (below quality 3) (LEADING:3, TRAILING:3); (3) Remove adapters: there are two modes to remove the adapter sequence: a. alignment with the adapter sequence, the number of matching bases were greater than 10 and mismatch=2; b. when read1 and read2 overlapping base scoring greater than 30, removed non-overlapping portions (ILLUMINACLIP: adapter.fa: 2: 30: 10); (4) Drop reads below the 36 bases long; (5) Discard those reads that cannot form pairs; The remaining reads that passed all the filtering steps were counted as clean reads and all subsequent analyses were based on them. We use FastQC to perform basic statistics on the quality of the final reads.

Reads mapping to the reference genome
Bismark software (22) was used to perform alignments of bisulfite-treated reads to a reference genome using default parameters. The reference genome was transformed to a bisulfite-converted version (C-to-T and G-to-A converted) and then indexed using bowtie2 (23). Sequence reads were also transformed into fully bisulfite-converted versions (C-to-T and G-to-A converted) before they were aligned to similarly converted versions of the genome in a directional manner. Sequence reads that produced a unique best alignment from the two alignment processes (original top and bottom strand) were then compared to the genomic sequence and the methylation state of all cytosine positions in the reads were inferred. Read pairs that shared the same coordinates in genome were regarded as duplicated ones, which were removed before methylation state calling was conducted, thus avoiding a potential methylation level calculation bias. The results of the methylation extractor were transformed to bigWig format for visualizations using IGV browser. The bisulfite non-conversion rate was calculated as the percentage of cytosines sequenced at cytosine reference positions in the lambda genome.

Transcriptome analysis
Sequencing libraries were generated using the rRNA-depleted RNA by NEBNext® UltraTM Directional RNA Library Prep Kit for Illumina® (NEB, USA) following manufacturer's recommendations. Briefly, fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First-strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNaseH). Second-strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. In the reaction buffer, dNTPs with dTTP were replaced by dUTP. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3' ends of DNA fragments, NEBNext Adaptor with hairpin loop structure were ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 150~200 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3µl USER Enzyme (NEB，USA) was used with size-selected, adaptor-ligated cDNA at 37°C for 15 min followed by 5 min at 95°C before PCR. Then PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. At last, products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. After cluster generation, the libraries were sequencing on an illumine Hiseq 2500 platform and 125bp paired-end reads were generated. After quality control of the raw reads, clean data were mapped to the reference genome using Bowtie v2.0.6 and TopHat v2.0.9(25). The mapped reads of each sample were assembled by both Scripture (beta2) (26) and Cufflinks (v2.               Richfactor is calculated as the number of methylated genes belong to a KEGG pathway / the number of all genes belong to this pathway. The bigger Richfactor indicate the more obvious enrichment. Richfactor is calculated as the number of methylated genes belong to a KEGG pathway / the number of all genes belong to this pathway. The bigger Richfactor indicate the more obvious enrichment.