Evolution of l‐DOPA 4,5‐dioxygenase activity allows for recurrent specialisation to betalain pigmentation in Caryophyllales

Summary The evolution of l‐DOPA 4,5‐dioxygenase activity, encoded by the gene DODA, was a key step in the origin of betalain biosynthesis in Caryophyllales. We previously proposed that l‐DOPA 4,5‐dioxygenase activity evolved via a single Caryophyllales‐specific neofunctionalisation event within the DODA gene lineage. However, this neofunctionalisation event has not been confirmed and the DODA gene lineage exhibits numerous gene duplication events, whose evolutionary significance is unclear. To address this, we functionally characterised 23 distinct DODA proteins for l‐DOPA 4,5‐dioxygenase activity, from four betalain‐pigmented and five anthocyanin‐pigmented species, representing key evolutionary transitions across Caryophyllales. By mapping these functional data to an updated DODA phylogeny, we then explored the evolution of l‐DOPA 4,5‐dioxygenase activity. We find that low l‐DOPA 4,5‐dioxygenase activity is distributed across the DODA gene lineage. In this context, repeated gene duplication events within the DODA gene lineage give rise to polyphyletic occurrences of elevated l‐DOPA 4,5‐dioxygenase activity, accompanied by convergent shifts in key functional residues and distinct genomic patterns of micro‐synteny. In the context of an updated organismal phylogeny and newly inferred pigment reconstructions, we argue that repeated convergent acquisition of elevated l‐DOPA 4,5‐dioxygenase activity is consistent with recurrent specialisation to betalain synthesis in Caryophyllales.

Introduction
As sessile organisms, plants have exploited metabolic systems to produce a plethora of diverse specialised metabolites (Weng, 2014). Specialised metabolites are critical for survival in particular ecological niches and are often taxonomically restricted (Weng, 2014). In flowering plants, one remarkable example of a taxonomically restricted specialised metabolite occurs in the angiosperm order Caryophyllales (Brockington et al., 2011; Timoneda et al., 2019). Here, tyrosine-derived betalain pigments have evolved to replace anthocyanins, which are otherwise ubiquitous across flowering plants (Bischoff, 1876; Clement & Mabry, 1996). Betalains are also present in the fungal lineage Basidiomycota (Musso, 1979) and the bacterial species Gluconacetobacter diazotrophicus (Contreras-Llano et al., 2019).
In contrast to the anthocyanin pathway, the betalain biosynthetic pathway is relatively simple, involving as few as four enzymatic steps to proceed from tyrosine to stable betalain pigments: yellow betaxanthins and violet betacyanins (Fig. 1). The core genes encoding betalain synthesis enzymes have been elucidated, primarily through heterologous assays in Saccharomyces cerevisiae and Nicotiana benthamiana, which has emerged as an essential tool for betalain research in planta (Polturak et al., 2016; Timoneda et al., 2018). The key enzymatic step in betalain biosynthesis involves conversion of L-3,4-dihydroxyphenylalanine (L-DOPA) to betalamic acid, the central chromophore of betalain pigments. L-DOPA 4,5-dioxygenase is encoded by the gene DODA, a member of the LigB gene family (Christinet et al., 2004). Within Caryophyllales, a gene duplication in the LigB/DODA gene lineage gave rise to the DODAa and DODAb clades, with L-DOPA 4,5-dioxygenase activity previously inferred to have evolved at the base of the DODAa lineage (Brockington et al., 2015). On the basis of this Caryophyllales-specific DODAa/DODAb duplication, and subsequent losses of DODAa loci in anthocyanic lineages, we previously argued for a single origin of betalain pigmentation, with multiple reversals to anthocyanin pigmentation (Brockington et al., 2015).
However, evolutionary patterns within the DODAa lineage are complex (Brockington et al., 2015). In addition to the DODAa/DODAb duplication, there have been at least nine duplications in the DODAa lineage, resulting in all betalain-pigmented lineages of Caryophyllales containing at least three DODA genes: at least one homologue from the DODAb lineage and at least two paralogues from the DODAa lineage, with Beta vulgaris containing five copies of DODAa. In B. vulgaris, only two DODAa paralogues have been studied (Sasaki et al., 2009; Gandía-Herrero & García-Carmona, 2012; Hatlestad et al., 2012; Chung et al., 2015; Bean et al., 2018), with one paralogue, BvDODA1 (hereafter termed BvDODAa1), found to exhibit high levels of L-DOPA 4,5-dioxygenase activity and the other paralogue, BvDODA2 (hereafter termed BvDODAa2), exhibiting no or only marginal L-DOPA 4,5-dioxygenase activity. Bean et al. (2018) compared BvDODAa1 and BvDODAa2 and identified seven divergent residues that, when altered in BvDODAa2, were sufficient to allow BvDODAa2 to convert L-DOPA to betalamic acid in yeast. Like BvDODAa2, two DODAa paralogues from other species (Parakeelya mirabilis and a Ptilotus hybrid) have also been shown to have limited or no capacity to produce betalamic acid (Chung et al., 2015). Extensive gene duplication within the DODAa lineage and the conserved presence of paralogues exhibiting no or only marginal L-DOPA 4,5-dioxygenase activity suggest further sub- and/or neofunctionalisation events occurring within the DODAa clade, although the evolutionary significance of this is unclear.
In the current study, we explore the evolution of L-DOPA 4,5-dioxygenase activity in Caryophyllales, focusing on paralogy within the DODA lineage. In the context of an updated organismal phylogeny and new pigment data, we select species representing key inferred transitions in pigment gain and loss. We then assess their DODA paralogues for levels of L-DOPA 4,5-dioxygenase activity using an established heterologous assay in N. benthamiana. We use the production of betacyanin in this heterologous assay as a proxy for L-DOPA 4,5-dioxygenase activity, with the relative strength of betacyanin production indicating the relative strength of L-DOPA 4,5-dioxygenase activity between paralogues. By mapping activity to a comprehensive DODA phylogeny, we reveal that multiple acquisitions of high L-DOPA 4,5-dioxygenase activity are linked with repeated gene duplication events within the DODAa lineage. Furthermore, we find that marginal levels of L-DOPA 4,5-dioxygenase activity are distributed across the DODAa lineage, implying that marginal levels of activity were present prior to multiple origins of high activity. Recurrent origins of high activity are accompanied by convergent shifts in residues known to be sufficient to confer L-DOPA 4,5-dioxygenase activity. We reconcile these data with inferred patterns of pigment evolution and argue that recurrent acquisition of high L-DOPA 4,5-dioxygenase activity underlies polyphyletic patterns of betalain pigmentation in Caryophyllales.

DNA/RNA extraction and cDNA synthesis
Tissue sampled for extraction was snap frozen and ground frozen using a Tissue Lyser II homogeniser (Qiagen). DNA was extracted using the Qiagen DNeasy Plant Mini Kit and RNA was removed by the Qiagen DNase-Free RNase Set. RNA extraction was carried out using PureLink Plant RNA Reagent (Invitrogen) and a TURBO DNA-free kit (Ambion). Both DNA and RNA quantity and quality were assessed by NanoDrop (Thermo Fisher Scientific, Waltham, MA, USA) and agarose gel electrophoresis. An Agilent Technologies Bioanalyzer (Santa Clara, CA, USA) was used to assess the quantity and quality of RNA for transcriptome sequencing. First-strand cDNA synthesis was performed using BioScript Reverse Transcriptase (Bioline Reagents, London, UK) and an oligo dT primer. All protocols were carried out according to the manufacturers' specifications unless otherwise specified.

Transcriptome and genome sequencing and assembly
Transcriptomes of fresh young leaves of S. chinensis and fresh young leaves and flowers of K. bowkeriana were sequenced at BGI using BGISEQ (Hong Kong, China). Downstream processing and assembly optimisation were performed following Haak et al. (2018). Genome sequencing of C. litoralis, L. aethiopicum, S. chinensis and S. arvensis was performed on a HiSeq X-Ten with one sample per lane and assembled following Pucker et al. (2019). Full details can be found in Supporting Information Methods S1.

Research
New Phytologist
the 3′ end of CrDODAa. For P. campestris, S. marina and T. imperati, degenerate PCR primers were designed based on known DODAa sequences in order to amplify a partial sequence, then inverse PCR was used to obtain the full-length coding sequences, following the protocol described by Ren et al. (2005). The full-length coding sequences for DODA genes were isolated from cDNA or gDNA by PCR using Phusion High-Fidelity DNA polymerase (Thermo Fisher Scientific), and then cloned into pBlueScript SK (New England Biolabs, Hitchin, UK) and verified by Sanger sequencing (Source BioScience, Nottingham, UK); for M. crystallinum, C. gigantea and S. halimifolium, DODA genes were synthesised by BioMatik (Cambridge, Canada), Twist Bioscience (San Francisco, CA, USA) and Integrated DNA Technologies (Iowa, IA, USA), respectively. Oligonucleotides are listed in Table S1 and the sequences have been deposited in GenBank (Table S2).

Species phylogeny and pigment reconstruction
To enable trait reconstruction across the order Caryophyllales, we generated a comprehensive genus-level species tree using publicly available sequence data compiled by PYPHLAWD (Smith & Walker, 2019), with constraints of the backbone topology from Walker et al. (2018) and Thulin et al. (2016, 2018). We calibrated branch lengths to time using TREEPL (Smith & O'Meara, 2012), which implements the penalised likelihood approach of Sanderson (2002). Full details can be found in Methods S3. Pigment data at genus resolution were used to reconstruct the evolution of betalain pigmentation on the time-calibrated genus-level phylogeny of Caryophyllales. We surveyed the literature for pigment data and determined the pigmentation status of 174 genera, classifying them as anthocyanin-pigmented, betalain-pigmented or unknown (Table S3). We reconstructed ancestral states using maximum likelihood (Pagel, 1994, 1999) and Bayesian inference via stochastic mapping (Huelsenbeck et al., 2003; Bollback, 2006), using the R packages APE v.5.0 and PHYTOOLS v.0.6-70, respectively, in R v.3.6.0 (Revell, 2012; Paradis & Schliep, 2019; R Core Team, 2019) under an equal rate and an asymmetric rate model. For stochastic mapping, we enforced a prior that the root of Caryophyllales was anthocyanin-pigmented. Full details can be found in Methods S4.
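The equal rate and asymmetric rate models named above are both two-state continuous-time Markov models of pigment change along a branch. The following is a minimal sketch of the transition probabilities such a model implies, not the APE or PHYTOOLS implementation used in the study; the function name, the rate values and the branch length are hypothetical and chosen only for illustration.

```python
import math


def two_state_transition_matrix(q01, q10, t):
    """Transition probabilities P(t) for a two-state Markov model.

    State 0 and state 1 stand for anthocyanin and betalain pigmentation
    (illustrative labels). q01 is the rate 0 -> 1, q10 the rate 1 -> 0,
    and t is the branch length in the same time units as the rates.
    """
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = q01 / total * (1.0 - decay)  # probability of ending in 1 given start in 0
    p10 = q10 / total * (1.0 - decay)  # probability of ending in 0 given start in 1
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical rates: the equal rate model forces q01 == q10,
# whereas the asymmetric model lets the two rates differ.
er_matrix = two_state_transition_matrix(0.01, 0.01, 50.0)
ard_matrix = two_state_transition_matrix(0.02, 0.005, 50.0)
```

Under the equal rate setting, gains and losses are equally probable over a given branch; under the asymmetric setting they are not, which is the only structural difference between the two models for a binary trait.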

DODA gene phylogeny and ancestral sequence reconstruction
We compiled a dataset of publicly available and early release genome and transcriptome assemblies (Table S4) and used a baited search approach with iterative refinement  to infer a gene tree of DODA sequences in Caryophyllales. Full details can be found in Methods S5. To create a sequence dataset computationally and numerically tractable for ancestral sequence reconstruction, we used a custom python script to subsample the DODAa gene tree (Fig. S1), using a strategy designed to maintain within-paralogue diversity. We created a final dataset of 198 sequences (indicated in Table S5), ensuring that a representative of all functionally characterised DODAa loci was included. Ancestral sequence reconstructions were conducted for codons and amino acids in IQ-TREE v.1.6.10 (Nguyen et al., 2015;Kalyaanamoorthy et al., 2017). All scripts, alignments and trees are available on GitHub (https://github.com/NatJWalker-Hale/DODA). Full details can be found in Methods S6.

Vector generation and transient expression assay
Construction of the multigene vectors containing the genes of the betalain biosynthetic pathway (DODA, BvCYP76AD1, MjcDOPA-5GT) was carried out using MoClo GoldenGate cloning following the protocol described (Engler et al., 2014; Timoneda et al., 2018) in order to produce level 2 binary vectors (Fig. S2). DODA genes were cloned into level 0 vectors using their coding sequence, except PtDODAa, for which the gDNA sequence was used. Level 1 vectors were verified by sequencing and level 2 vectors were verified by restriction digests. Transient expression using agroinfiltration of N. benthamiana was performed as described previously. The following were used as controls in every experiment: positive, pBC-BvDODAa1; negative, pLUC (Fig. S2). Upon transient transformation in N. benthamiana, DODA genes encoding enzymes that carry out the L-DOPA 4,5-dioxygenase reaction necessary for betalamic acid production will produce betalains. For instance, the previously characterised B. vulgaris DODA, BvDODAa1 (Hatlestad et al., 2012), exhibits high levels of L-DOPA 4,5-dioxygenase activity as inferred by a high level of betacyanin production under heterologous expression (Polturak et al., 2016; Timoneda et al., 2018). Accordingly, we use the production of betacyanin in this heterologous assay as a proxy for L-DOPA 4,5-dioxygenase activity, with the relative strength of betacyanin production indicating the relative strength of L-DOPA 4,5-dioxygenase activity between loci. The assay is designed to give a clear comparative measure of biologically relevant levels of L-DOPA 4,5-dioxygenase activity in planta. We categorise the levels as high L-DOPA 4,5-dioxygenase activity (hereafter also referred to as 'high activity'), low or marginal L-DOPA 4,5-dioxygenase activity (hereafter also referred to simply as 'marginal activity'), or no L-DOPA 4,5-dioxygenase activity (hereafter also referred to as 'no activity').

Betalain quantification using HPLC
A single sample was taken from each infiltration spot 4 d post-infiltration, snap frozen in liquid nitrogen and stored at −80°C until needed. Samples were homogenised frozen using a single 5 mm glass bead in a Tissue Lyser II homogeniser (Qiagen). Betalains were extracted overnight at 4°C in 80% aqueous methanol containing 50 mM ascorbic acid with a volume of 1 ml extraction buffer per 50 mg fresh weight of leaf tissue. After extraction, the samples were clarified twice by centrifugation at 21 130 g for 10 min. HPLC analysis was performed using a Thermo Fisher Scientific Accela HPLC autosampler and pump system incorporating a photodiode array detector. Betalains were separated using a Luna Omega column (100 Å, 5 µm, 4.6 × 150 mm) from Phenomenex (Torrance, CA, USA) under the following conditions: 3 min, 0% B; 3-19 min, 0-75% B; 7 min, 0% B, where mobile phase A was 0.1% formic acid in 1% acetonitrile and solvent B was 100% acetonitrile, and at a flow rate of 500 µl min⁻¹. We quantified the betacyanin compound, betanin, because it has been shown to be the predominant pigment arising from the transient expression assay. Betanin was detected by UV/VIS absorbance at a wavelength of 540 nm. Identification and quantification of betanin was carried out using a commercially available B. vulgaris extract (Tokyo Chemical Industry UK Ltd, Oxford, UK) and a pure betanin standard (provided by F. Gandía-Herrero, Universidad de Murcia, Spain). Negative controls (uninfiltrated tissue or pLUC) were set as background and removed from all other samples.
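The quantification described above (background removal followed by expression relative to the BvDODAa1 positive control) can be sketched as follows. This is an illustrative calculation only: the function name and the peak areas are hypothetical, and clamping negative background-corrected values to zero is an assumption rather than a step stated in the methods.

```python
from statistics import mean, stdev


def relative_quantity(sample_area, positive_area, background_area):
    """Betanin signal of a sample relative to the positive control.

    The negative-control signal is subtracted from both the sample and the
    positive control before taking the ratio. Negative corrected sample
    values are clamped to zero (an assumption made for this sketch).
    """
    corrected_positive = positive_area - background_area
    if corrected_positive <= 0:
        raise ValueError("positive control does not exceed background")
    corrected_sample = max(sample_area - background_area, 0.0)
    return corrected_sample / corrected_positive


# Hypothetical betanin peak areas (arbitrary units) for three replicates,
# each with its own positive control and negative control.
replicates = [
    {"sample": 130.0, "positive": 1010.0, "background": 10.0},
    {"sample": 60.0, "positive": 510.0, "background": 10.0},
    {"sample": 290.0, "positive": 2010.0, "background": 10.0},
]

rq_values = [
    relative_quantity(r["sample"], r["positive"], r["background"])
    for r in replicates
]
rq_mean = mean(rq_values)
rq_sd = stdev(rq_values)
```

Normalising within each replicate before averaging keeps leaf-to-leaf differences in transient expression strength from dominating the comparison between DODA variants.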

Synteny analysis of gene cluster
Synteny from sequenced genomes was evaluated to explore the conservation of clustering of BvDODAa1 and BvCYP76AD1 as observed in B. vulgaris (Brockington et al., 2015). BvDODAa1 (Hatlestad et al., 2012) and BvCYP76AD1 (DeLoache et al., 2015;Polturak et al., 2016;Sunnadeniya et al., 2016) were identified from the B. vulgaris genome and the related microsynteny was visualised by MCSCANX (Wang et al., 2012). Pairs of homologous genes from the genomes of Amaranthus hypochondriacus, Chenopodium quinoa, B. vulgaris and M. crystallinum were identified using LAST with default parameters (Kiełbasa et al., 2011). Only restricted syntenic regions containing collinear genes along with their neighbouring genes were evaluated.

Results
Ancestral state reconstruction of pigmentation in Caryophyllales suggests at least four origins of betalain pigmentation
We inferred a time-calibrated, genus-level maximum likelihood species tree for 640 genera of Caryophyllales, constraining our inference to match the most recent phylogenomic hypotheses (Walker et al., 2018). Our inferred topology and divergence times agree well with the current understanding of Caryophyllales (Walker et al., 2018), with some minor or unsupported incongruences (Figs 2, S3). Our updated pigmentation dataset contains data for 174 genera or 27% of genera represented in the tree topology. Maximum likelihood reconstruction under a symmetric (ER) and an asymmetric (ARD) model produced the same inferences, with four transitions from anthocyanins to betalains predicted, in Stegnospermataceae, Amaranthaceae, the raphide clade (sensu Stevens, 2017) and the Portulacineae, and one reversal from betalains to anthocyanins in Kewaceae (Fig. S4). The ER model generated slightly more equivocal reconstructions along the backbone of Caryophyllales than the ARD model, but statistical support for each model is nearly equivalent (ΔAIC ARD-ER = 0.23). Posterior probabilities of node states from Bayesian reconstruction under both models similarly suggested four transitions from anthocyanins to betalains along the backbone of the tree and one reversal (Figs 2, S4).
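To make the reported model comparison concrete, the sketch below works back from the AIC difference to the implied gain in log-likelihood. It assumes that the reported value is AIC(ARD) minus AIC(ER), and that for a binary trait the ER model has one free rate and the ARD model two; the function names are hypothetical and no log-likelihood values from the study are used.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def loglik_gain_from_delta_aic(delta_aic, extra_params):
    """Log-likelihood gain of the richer model implied by an AIC difference.

    delta_aic is AIC(richer) - AIC(simpler), so
    delta_aic = 2 * extra_params - 2 * gain.
    """
    return (2.0 * extra_params - delta_aic) / 2.0


# Reported difference of 0.23 with one extra free rate in the ARD model
# (both the sign convention and the parameter counts are assumptions here).
implied_gain = loglik_gain_from_delta_aic(0.23, 1)
```

Under these assumptions the second rate parameter improves the log-likelihood by a little under one unit, which is not enough to offset the penalty for the extra parameter and is consistent with the two models being nearly equivalent.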

Betalain-pigmented species are inferred to contain a minimum of three DODA genes with at least two DODAa and one DODAb
We used a baited search approach to populate an expanded DODA gene tree containing 318 Caryophyllales species from 34 families (increased from the 95 species and 26 families previously analysed; Brockington et al., 2015). The phylogeny includes denser sampling from the anthocyanin-pigmented lineages Caryophyllaceae and Molluginaceae, and additionally samples the anthocyanin-pigmented lineages Macarthuriaceae, Kewaceae and Limeaceae, and the betalain-pigmented lineage Stegnospermataceae. The phylogeny is mostly congruent with earlier analyses, revealing a gene duplication after the divergence of Physenaceae, resulting in two well-supported paralogues corresponding to DODAa and DODAb clades in Brockington et al. (2015) (Figs S5, S6). Further duplications have occurred, particularly in the DODAa lineage, so that betalain-pigmented species are inferred to contain at least three genes predicted to encode a full-length DODA protein, with at least two DODAa and one DODAb (Figs S5, S6).

DODAa homologues are present in a wide range of anthocyanic lineages across the Caryophyllales
We performed a search for DODA loci across all anthocyanic lineages (Macarthuriaceae, Limeaceae, Kewaceae, Caryophyllaceae and Molluginaceae). Here, we recovered single DODAa genes from seven species representing four early diverging lineages of Caryophyllaceae: P. tetraphyllum (Polycarpaeae), C. ramosissimum and P. campestris (Paronychieae), T. imperati and C. litoralis (Corrigioleae), and S. marina and S. arvensis (Sperguleae; Figs S5, S6). The DODAa genes from two of these species, C. litoralis and P. campestris, were found to have mutations causing premature stop codons (Fig. S7). We then searched for DODA genes in anthocyanic M. australis (Macarthuriaceae), L. aethiopicum (Limeaceae), K. caespitosa (Kewaceae) and P. exiguum (Molluginaceae) using transcriptome and genome sequencing. We detected a DODAb gene in all four species (Figs S5, S6). Single DODAa genes were recovered from M. australis and L. aethiopicum, and two DODAa homologues were recovered from K. caespitosa. A DODAa gene could not be detected in P. exiguum despite c. 206× short read coverage based on its estimated genome size.

DODAb homologues exhibit no or negligible amounts of L-DOPA 4,5-dioxygenase activity
We selected four betalain-pigmented species that represent each of the inferred origins of betalain pigmentation, three of which are represented by complete annotated genomes: S. halimifolium (Stegnospermataceae), B. vulgaris (Amaranthaceae; Dohm et al., 2014), M. crystallinum (Aizoaceae; W. C. Yim, unpublished) and C. gigantea (Cactaceae; Copetti et al., 2017). S. halimifolium, M. crystallinum and C. gigantea each contain one DODAb and two DODAa genes, and B. vulgaris contains one DODAb and five DODAa genes (Figs S5, S6). The DODA genes for all four species were separately cloned into multigene vectors containing all necessary genes for betalain biosynthesis and these vectors were transiently transformed into N. benthamiana to test their heterologous expression (Fig. S1; Timoneda et al., 2018). We used the production of betacyanin in this heterologous assay as a proxy for L-DOPA 4,5-dioxygenase activity, with the relative strength of betacyanin production indicating the relative strength of L-DOPA 4,5-dioxygenase activity between loci. Upon heterologous expression, none of the DODAb genes from S. halimifolium, B. vulgaris, M. crystallinum and C. gigantea produced visible betacyanin pigmentation (Fig. 3a; Timoneda et al., 2018). Traces of betanin were detected for all loci by HPLC, but amounts were extremely low compared to the positive control, BvDODAa1 (< 0.1%; Fig. 3b).

A single DODAa homologue in each betalain-pigmented species exhibits high levels of L-DOPA 4,5-dioxygenase activity
Using the same procedure as that used to test the DODAb enzymes, we found that only a single DODAa from each study species showed a strong production of betalain pigmentation, including BvDODAa1 (Figs 3, S8-S11). All paralogues exhibiting high production of betanin in the heterologous assay (which we term as having high levels of L-DOPA 4,5-dioxygenase activity) are hereafter named "a1". The amount of betanin produced by the different orthologues of DODAa1 varies relative to BvDODAa1, indicating that there may be differences in the effectiveness of these DODAa enzymes to convert L-DOPA to betalamic acid (Fig. 3b). For example, ShDODAa1 produced a comparable amount of pigment to BvDODAa1, CgDODAa1 produced c. 60% as much as BvDODAa1, and McDODAa1 produced c. 80% more than BvDODAa1.

Numerous DODAa homologues in both betalain-pigmented and anthocyanin-pigmented lineages exhibit marginal levels of L-DOPA 4,5-dioxygenase activity
We found that in each betalain-pigmented species there is at least one DODAa paralogue that exhibits marginal activity (consistent with Chung et al., 2015 and Bean et al., 2018). For example, in M. crystallinum, pigmentation was also observed for McDODAa2, albeit at a much lower level than DODAa1 from M. crystallinum or any other species (Figs 3a,b, S8-S11). Depending on the strength of transient expression in particular leaves, faint pigment was also sometimes observed for BvDODAa2 and (Figs 3b, S9). DODAa enzymes from anthocyanic species M. australis and K. caespitosa were also found to produce a small amount of betanin, as detected by HPLC (Figs 4, S12). MaDODAa produces c. 2.5% the amount of BvDODAa1, and KcDODAa1 and KcDODAa2 produce c. 12% and 2% of BvDODAa1, respectively. Within Caryophyllaceae, DODAa from C. ramosissimum and T. imperati produced a small amount of betanin (6% and 2.5% of BvDODAa1, respectively; Figs 4b,c, S12).

Betalain biosynthetic genes, DODAa1 and CYP76AD1, are colocalised in Amaranthaceae but not in the representative Aizoaceae genome
We previously showed that BvDODAa1 and BvCYP76AD1 are part of a putative gene cluster, being located close to one another (< 50 kb) and also in close linkage with the MYB that regulates both of these genes (Keller, 1936; Goldman & Austin, 2000; Hatlestad et al., 2014; Brockington et al., 2015). In C. quinoa and A. hypochondriacus, other Amaranthaceae species for which there is a genome sequence available (Yasui et al., 2016; Lightfoot et al., 2017), these genes are also colocalised (Fig. 5). However, in M. crystallinum in the family Aizoaceae, the locus encoding the high-activity DODAa, McDODAa1, is located on a different chromosome from the McCYP76AD1 orthologue, and the genes are not colocalised despite considerable conservation of synteny between putatively homologous chromosomes (Fig. 5). This analysis is limited to two of the four betalain-pigmented study species because they are currently the only species for which genome assemblies are of sufficient quality to allow syntenic analysis.

Homologues exhibiting high levels of L-DOPA 4,5-dioxygenase activity are not monophyletic
To understand the evolution of L-DOPA 4,5-dioxygenase activity, we mapped our functional data to the DODA gene tree (Fig. 6), and also mapped functional data from studies for which DODA activity has been tested in planta, either in heterologous transient assays or stable transgenics, or using recombinant expression in Escherichia coli or yeast (Figs 3, 4; Table S6). Mapping the functional data to the tree reveals that DODAa homologues exhibiting L-DOPA 4,5-dioxygenase activity have a homoplastic distribution across the DODA phylogeny (Fig. 6). In total, there are three polyphyletic gene lineages containing high levels of L-DOPA 4,5-dioxygenase activity (DODAa1): a lineage specific to Stegnosperma singly represented by ShDODAa1, a clade arising by duplication within Amaranthaceae containing BvDODAa1 and CqDODA-1, and a clade representing the remaining betalain-pigmented lineages, and containing CgDODA1, MjDODAa1, McDODAa1, PmDOD and PgDODA (Table S6). Each clade containing high-activity DODAa paralogues is sister to clades containing marginal activity DODAa paralogues, and in the case of Amaranthaceae, the DODAa1 clade is nested within clades exhibiting no or only marginal levels of L-DOPA 4,5-dioxygenase activity (Fig. 6).

Fig. 3 Betalain-pigmented species have a single DODA that has high activity when heterologously expressed in Nicotiana benthamiana and other DODA that have no or marginal activity. (a) A representative leaf is shown from the agroinfiltration of L-DOPA 4,5-dioxygenase (DODA) genes from Stegnosperma halimifolium, Beta vulgaris, Carnegiea gigantea and Mesembryanthemum crystallinum. The DODA coding sequences were cloned into multigene constructs containing the other structural genes necessary for betacyanin production (BvCYP76AD1, MjcDOPA-5GT). The infiltration spots are labelled according to the DODA variant. For all species, the pBC-BvDODAa1 multigene vector is included as a positive control (P). (b) The betanin content of the infiltration spots was measured using HPLC and data were represented relative to the BvDODAa1 spot present in each biological replicate. The data were combined after calculating relative amounts (RQ) for each species from individual species-specific experiments (Supporting Information Figs S8-S11). Bars show means ± SD; n = 5 for S. halimifolium, B. vulgaris, C. gigantea and M. crystallinum, except for BvDODAa2 (n = 4) and BvDODAa3 (n = 4). (c) A representative HPLC trace for the DODAa1 gene from each species. The left peak is betanin and the right peak is its isomer, isobetanin. The traces are offset for presentation. 'pLUC' is a negative control plasmid carrying the firefly luciferase gene and 'Betanin' is a commercially available extract from beet hypocotyl which was used to validate the retention time of betanin in these samples.

Ancestral sequence reconstruction indicates that high L-DOPA 4,5-dioxygenase activity is a derived state within the DODAa lineage
A recent study characterised seven residues that are important for L-DOPA 4,5-dioxygenase activity (Bean et al., 2018). We carried out ancestral sequence reconstruction using coding sequences on a reduced DODAa gene tree (Figs S13, S14) and observed that the three putative origins of highly active DODAa are marked by three convergent shifts in these seven residues (Figs 6, 7), which occur post-gene duplication. For the residues inferred for the clade containing the M. crystallinum and C. gigantea highly active DODAa (DDFNDDI; Fig. 7), four out of seven of the predicted residues are identical to those inferred for the Amaranthaceae highly active DODAa clade (DDYNDET). For Stegnosperma, the motif is more divergent, with three residues identical to the Amaranthaceae betalain-producing DODAa clade and two residues identical to the clade containing the M. crystallinum and C. gigantea highly active DODAa. The highly active DODAa are derived from ancestral nodes with inferred motifs that share more similarity with the marginal or no activity DODAa lineages (XGFNN[N/D]T), and this motif is highly conserved across the backbone and represented at almost all ancestral nodes (Figs 6, 7). Amino acid reconstruction gave similar results (Figs S15, S16).

KcDODAa2). The DODA coding sequences were cloned into multigene constructs with the other structural genes necessary for betacyanin production (BvCYP76AD1, MjcDOPA-5GT). The pBC-BvDODAa1 multigene vector was included as a positive control (P). (b) Betanin content of the infiltration spots was measured using HPLC. Amounts of betanin were calculated relative (RQ) to the average amount of betanin present in BvDODAa1 infiltration spots for each species and the data combined into one graph (see Supporting Information Fig. S12 for species-specific data). Bars show means ± SD; n ≥ 3. (c) A representative HPLC trace for each DODAa gene from each species is shown. The left peak is betanin and the right peak is its isomer, isobetanin. The traces are offset for presentation. 'pLUC' is a negative control plasmid carrying the firefly luciferase gene and 'Betanin' is a commercially available extract from beet hypocotyl, which was used to validate the retention time of betanin in these samples.
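The comparison of seven-residue motifs reported in this section amounts to counting positions at which two motifs carry the same residue. A minimal sketch of that count is given below, using the two motifs quoted in the text; the function and variable names are hypothetical, and the Stegnosperma motif is not included because it is not spelled out here.

```python
def identical_positions(motif_a, motif_b):
    """Return the zero-based positions at which two motifs share a residue."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must have the same length")
    return [i for i, (a, b) in enumerate(zip(motif_a, motif_b)) if a == b]


# Motifs quoted in the text for the two highly active DODAa clades.
mc_cg_clade_motif = "DDFNDDI"    # clade containing M. crystallinum and C. gigantea
amaranthaceae_motif = "DDYNDET"  # Amaranthaceae highly active clade

shared = identical_positions(mc_cg_clade_motif, amaranthaceae_motif)
n_shared = len(shared)
```

For these two motifs the count is four of seven, matching the figure stated in the text; the differences fall at the third, sixth and seventh positions.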

Discussion
Polyphyletic patterns of elevated L-DOPA 4,5-dioxygenase activity support the recurrent specialisation to betalain pigmentation
Since our earlier ancestral state pigment reconstructions (Brockington et al., 2011, 2015), phylogenomic data have advanced new hypotheses for relationships within Caryophyllales (Walker et al., 2018) and the previously uncharacterised Limeaceae and Simmondsiaceae have been shown to be anthocyanic (Thulin et al., 2016). Here, we explicitly account for the large number of taxa for which pigmentation status is unknown and constrain our phylogenetic hypothesis to match the latest phylogenomic study of Walker et al. (2018), which places betalain-pigmented Stegnospermataceae as sister to anthocyanic Macarthuriaceae. In the context of these new data (Fig. 2), our reconstructions do not infer a single evolution of betalains, as previously suggested (Brockington et al., 2015), but rather imply up to four separate origins of betalain pigmentation in Caryophyllales.
To explore the hypothesis of multiple origins of betalain pigmentation suggested by our reconstructions, we then selected four species that represent each of the putative origins: S. halimifolium, B. vulgaris, M. crystallinum and C. gigantea. Using heterologous transient assays, we inferred relative levels of L-DOPA 4,5-dioxygenase activity based on the proxy of betanin production (Fig. 3). Our data show that activity is barely detectable in DODAb loci from betalain-pigmented species, supporting our original hypothesis that the high levels of L-DOPA 4,5-dioxygenase activity evolved exclusively within the DODAa lineage (Brockington et al., 2015). However, although betalain-pigmented species have multiple paralogues of DODAa genes, only one of these DODAa in each species encodes a protein which exhibits high levels of L-DOPA 4,5-dioxygenase activity. All betalain-pigmented species also exhibit DODAa paralogues with no or only marginal L-DOPA 4,5-dioxygenase activity and we also detected marginal activity in several anthocyanic taxa (Fig. 4). Thus, marginal activity is widespread among Caryophyllales DODAa enzymes, suggesting broader underlying catalytic promiscuity in the DODAa lineage. Catalytic promiscuity is well documented in the broader protocatechuate dioxygenase gene family of which the LigB/DODA lineage is a member (Burroughs et al., 2019). Such promiscuity is an important feature of metabolic evolution, potentially conferring evolvability (Weng & Noel, 2012; Leong & Last, 2017), and may have significant implications for the recurrent evolution of betalain pigmentation, as discussed below.
Previous analysis of the genome of B. vulgaris revealed a putative 'gene cluster' (sensu Osbourn, 2010) in which DODAa and CYP76AD1 are colocalised in chromosome 2 (Brockington et al., 2015). The B. vulgaris locus with high levels of L-DOPA 4,5-dioxygenase activity falls within this gene cluster, while the paralogues with no or only marginal L-DOPA 4,5-dioxygenase activity occur outside of the gene cluster, supporting the concept of a betalain gene cluster in B. vulgaris. Our analysis, which describes the relative synteny of homologous loci between genomes, also shows that the betalain gene cluster appears to be conserved in the genomes of C. quinoa and A. hypochondriacus (Fig. 5). These divergent species all belong to the family Amaranthaceae and represent one putative origin of betalain pigmentation (Fig. 2, origin no. 2). However, CYP76AD1 and DODAa1 are not clustered in the genome of M. crystallinum, which represents a different putative origin (Fig. 2, origin no. 3). Therefore, the absence of
Fig. 6 The DODA gene tree shows a homoplasious distribution of functionally characterised DODA genes that produce a high level of betalain pigments. The maximum likelihood phylogeny of Caryophyllales L-DOPA 4,5-dioxygenase (DODA) genes was inferred from coding sequences derived from genomes and transcriptomes. Branch lengths are expected number of substitutions per site. Scale bar gives 0.2 expected substitutions per site. Branch labels are support values for major paralogous clades from rapid bootstrapping and the SH-like Approximate Likelihood Ratio Test, respectively, given as RBS/SH-aLRT. Putative major duplication nodes are highlighted with stars. Branches are coloured according to the putative pigmentation state of the taxa (blue, anthocyanin; pink, betalain). Labelled tips show functionally characterised DODAs and shaded squares correspond to DODA activity (white, no activity; grey, marginal activity; black, high activity). Asterisks indicate putative pseudogenes. Annotated at tips is a colour-coded alignment of the seven residues conferring high activity reported in Bean et al. (2018).
Mapping loci encoding functionally characterised DODA proteins to a comprehensively taxon-sampled DODA gene tree, we found that within the DODAa lineage, loci encoding proteins with high levels of L-DOPA 4,5-dioxygenase activity are not monophyletic (Fig. 6). Specifically, each clade containing high-activity DODAa paralogues is sister to clades containing marginal activity DODAa, suggesting polyphyletic origins of high activity, associated with gene duplication within the DODAa lineage. Previous research identified residues in seven sites which were necessary and sufficient to confer higher levels of L-DOPA 4,5-dioxygenase activity (Bean et al., 2018). Phylogenetic reconstruction of these seven residues across the DODAa clade showed that polyphyletic clades containing proteins with high activity have distinctive motifs for these seven residues (Figs 6, 7). Furthermore, motifs associated with high activity evolved at least three times from a background of motifs more similar to those proteins with no or marginal L-DOPA 4,5-dioxygenase activity. The diversity we recognise at these motifs in high-activity DODAa sequences (e.g. between BvDODAa1 and ShDODAa1) indicates that high activity may arise from divergent sequence motifs and represent molecular convergence at key functional residues.
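What it means for high-activity loci to be 'not monophyletic' on a gene tree can be made concrete with a minimal sketch. The example assumes a rooted tree written as nested tuples; the topology and the tip names (`high_1`, `marginal_1`, and so on) are hypothetical placeholders rather than loci from Fig. 6, and the check is a plain set comparison, not the procedure used to build or interpret the DODA gene tree.

```python
# Toy illustration only: tip names and topology are hypothetical placeholders.

def tips(node):
    """Return the set of tip names below a node (tip = str, internal node = tuple)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result


def is_monophyletic(tree, group):
    """True if some node in the rooted tree has exactly `group` as its descendant tips."""
    if tips(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Each high-activity paralogue is sister to a marginal-activity paralogue.
toy_tree = (("high_1", "marginal_1"), ("high_2", "marginal_2"))

high_activity = {"high_1", "high_2"}
high_is_clade = is_monophyletic(toy_tree, high_activity)
pair_is_clade = is_monophyletic(toy_tree, {"high_1", "marginal_1"})
print(high_is_clade, pair_is_clade)
```

Here the two high-activity tips do not form a clade, whereas each high-activity tip and its marginal-activity sister do. This is the pattern described above, in which each high-activity clade is sister to a marginal-activity clade. The result depends on how the tree is rooted.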
Intriguingly, the origins of high L-DOPA 4,5-dioxygenase activity following gene duplication, and associated residue shifts, are congruent with at least three of the four origins of betalain pigmentation inferred from our pigment reconstructions (Fig. 8).
On the basis of these integrated observations we argue that betalain biosynthesis evolved multiple times in concert with recurrent gene duplication and neofunctionalisation within the DODAa clade, rather than as a single event at the base of the DODAa clade. Specifically, data in support of this model include: a background of marginal levels of L-DOPA 4,5-dioxygenase activity implying inherent evolvability of the ancestral enzyme (see following paragraph); polyphyletic origins of high L-DOPA 4,5-dioxygenase activity coincident with multiple inferred origins of betalain pigmentation; and derived and convergent shifts in key residues underpinning high L-DOPA 4,5-dioxygenase activity. Together, this model explains and conceptually links the homoplastic distribution of betalain lineages and the high levels of gene duplication observed in the DODAa clade.
Polyphyletic origins of betalain pigmentation occur only within Caryophyllales (Fig. 2), while polyphyletic origins of high L-DOPA 4,5-dioxygenase activity are constrained to the Caryophyllales-specific DODAa clade (Fig. 6). Both patterns link to the concept of evolutionary precursors in which underlying evolutionary state(s) potentiate recurrent evolution of subsequent complex traits (sensu Marazzi et al., 2012). In this scenario, we propose that an initial precursor step was the evolution of tyrosine hydroxylase activity by duplication in the CYP76AD lineage, leading to an abundance of L-DOPA (DeLoache et al., Polturak et al., 2016; Sunnadeniya et al., 2016), the necessary substrate for betalamic acid biosynthesis. Given an abundance of L-DOPA, a subsequent precursor step was duplication in the DODA lineage that gave rise to a clade of DODAa enzymes, whose ancestral function is currently unknown, but presumably with some promiscuous ability to act on L-DOPA to produce trace betalamic acid. Subsequently, further repeated duplication within the DODAa lineage led to recurrent neofunctionalisation towards high levels of L-DOPA 4,5-dioxygenase activity and the production of betalamic acid.
Such a model, in which similar enzymatic function has arisen repeatedly from homologous but nonorthologous enzymes, is not unprecedented. Many studies indicate that this form of convergent evolution is widespread in specialised plant metabolism (Pichersky & Lewinsohn, 2011). For example, the enzymes that methylate purine intermediates in caffeine biosynthesis in Coffea vs Thea evolved from different branches of the SABATH carboxyl methyltransferase family (Yoneyama et al., 2006); and disparate origins of pyrrolizidine alkaloids have arisen through recurrent evolution of homospermidine synthase from the ubiquitous enzyme deoxyhypusine synthase (Reimann et al., 2004). However, the detection of this same phenomenon in the evolution of betalains is perhaps surprising, because the retention of enzymes for anthocyanin biosynthesis in betalain-pigmented species (Shimada et al., 2004, 2005) has encouraged the assumption that reversals to anthocyanin are more likely than multiple shifts to betalain pigmentation (Brockington et al., 2011, 2015).
Reconciling polyphyletic origins of high L-DOPA 4,5-dioxygenase activity with DODAa gene loss
It has always been striking that each major betalain-pigmented clade is subtended at, or towards, its base by an anthocyanic lineage (Brockington et al., 2011). Given this essential pattern, our trait reconstructions suggest multiple origins of betalain pigmentation. In turn, this implies that many lineages are anthocyanic through retention of an ancestral state, rather than through reversal from betalain pigmentation (Brockington et al., 2011; Fig. 2). Yet, we earlier reported that DODAa are lost or down-regulated in anthocyanic Caryophyllaceae and Molluginaceae, and previously argued that loss of DODAa in these anthocyanic lineages is consistent with reversals from betalain pigmentation to anthocyanin pigmentation (Brockington et al., 2015).
However, the emerging picture is more complex, and with new data presented here, we find: evidence of DODAa loci in all but one of the anthocyanic lineages in Caryophyllales; evidence of DODAa gene loss in Caryophyllaceae and Molluginaceae, but no evidence of DODAa gene loss in the anthocyanin-pigmented lineages, K. caespitosa, M. australis and L. aethiopicum; and evidence of DODAa loci with marginal L-DOPA 4,5-dioxygenase activity within anthocyanic M. australis, K. caespitosa, C. ramosissimum and T. imperati. Clearly, evolutionarily disparate anthocyanic lineages show different patterns of molecular evolution with respect to DODAa. Given this complex evolutionary milieu, and in the face of compelling evidence for polyphyletic origins of elevated L-DOPA 4,5-dioxygenase activity, below we propose two alternative hypotheses to explain loss of DODAa loci in anthocyanic lineages.
1 The patterns we detect may suggest lability in early stages of betalain evolution. Anthocyanins and betalains have never been found to co-occur, but it is possible that the two classes of pigments did co-occur in early evolutionary stages. In this scenario, repeated evolution of increased L-DOPA 4,5-dioxygenase activity allowed for betalain pigmentation, initially co-occurring with anthocyanins. However, evolution of an integrated betalain pathway requires more than biosynthetic enzymes (e.g. the recruitment of the MYB transcriptional regulators; Hatlestad et al., 2014). Therefore, establishment and enhancement of the betalain pathway was only achieved in certain lineages, those which specialised to betalain pigmentation. By contrast, other lineages arising close to the origins of elevated L-DOPA 4,5-dioxygenase activity specialised to anthocyanins rather than betalains, ultimately losing the DODAa paralogues with elevated L-DOPA 4,5-dioxygenase activity. In these cases, anthocyanic lineages may have retained anthocyanins from ancestors in which both pigments coexisted, and inferences of reversals to anthocyanins based on DODAa loss are misleading. This scenario is appealing because we do not see any anthocyanin lineages that are nested deeply within the betalain-pigmented clades that stem from each inferred origin (Fig. 2), and so shifts to anthocyanin pigmentation seem less likely with evolutionary distance from inferred origins of betalain pigmentation.
2 Previously, Lopez-Nieves et al. (2018) speculated that the evolution of tyrosine-derived betalains occurred in a metabolic environment enriched for tyrosine. The arogenate dehydrogenase enzyme (ADHa) responsible for increased tyrosine availability is also lost and/or down-regulated in the anthocyanic Caryophyllaceae and Molluginaceae lineages. Therefore, the shift to anthocyanins in these taxa may indicate deeper shifts away from a tyrosine-rich metabolism towards a metabolism in which phenylalanine plays a canonical role. The primary function of the DODAa paralogues with marginal L-DOPA 4,5-dioxygenase activity is unknown, but these paralogues may catalyse production of tyrosine-derived metabolites other than betalains. In this scenario, the duplication that gave rise to the DODAb and DODAa lineages led to neofunctionalisation in the DODAa lineage towards an unknown but tyrosine-derived enzymatic activity, with marginal L-DOPA 4,5-dioxygenase activity. Loss of DODAa in Caryophyllaceae and Molluginaceae could instead reflect loss of the unknown tyrosine-derived enzymatic activity, in the context of shifts towards more phenylalanine-biased metabolism, independent of origins of high L-DOPA 4,5-dioxygenase activity.

Conclusion
The evolutionary origin of betalain pigments in Caryophyllales and the processes that led to their homoplastic distribution have been the subject of much debate. Our new data and analyses offer compelling evidence for recurrent specialisation to betalain biosynthesis. Specifically: a background of marginal levels of L-DOPA 4,5-dioxygenase activity implying inherent evolvability; polyphyletic origins of high L-DOPA 4,5-dioxygenase activity coincident with multiple inferred origins of betalain pigmentation; derived and convergent shifts in key residues underpinning high L-DOPA 4,5-dioxygenase activity; and a lack of conservation of the betalain metabolic gene cluster between putative origins of betalain pigmentation. However, our hypothesis requires future experimentation. First, it will be important to identify the primary function of those DODAa proteins that only exhibit marginal L-DOPA 4,5-dioxygenase activity. Elucidation of this unknown function will further inform the inferences made in this study and direct future hypotheses. Further validation could also emerge by considering other aspects of the betalain biosynthesis pathway. For example, as exemplified by our syntenic analyses, it may be possible to discern the signal of multiple origins at different hierarchical levels of the betalain pathway, including: patterns of gene clustering by genomic colocalisation; patterns of co-option of transcriptional regulators and other genes; and the dissection of the molecular convergence signal at key residues. It is fortunate here that three of the putative origins of betalain biosynthesis are represented by well-resourced experimental systems: Portulaca grandiflora, Mirabilis jalapa and B. vulgaris, which together promise rapid progress in this era of betalain renaissance.

Supporting Information
Additional Supporting Information may be found online in the Supporting Information section at the end of the article.
Methods S1 Genome assembly.
Methods S2 Identification of full complements of DODA genes from betalain-pigmented species for functional analysis.
Methods S5 Gene tree of Caryophyllales DODA genes.

Table S1 Oligonucleotide primers used in this study.

New Phytologist is an electronic (online-only) journal owned by the New Phytologist Trust, a not-for-profit organization dedicated to the promotion of plant science, facilitating projects from symposia to free access for our Tansley reviews and Tansley insights. Regular papers, Letters, Research reviews, Rapid reports and both Modelling/Theory and Methods papers are encouraged. We are committed to rapid processing, from online submission through to publication 'as ready' via Early View; our average time to decision is <26 days. There are no page or colour charges and a PDF version will be provided for each article.