Comparative analysis of four medicinal floras: Phylogenetic methods to identify cross-cultural patterns

Summary 
Four medicinal floras were compared using phylogenetic methods, to test whether there are shared patterns in medical plant use at the level of the whole medicinal floras, or for specific therapeutic applications. 
Checklists of the native plants and medicinal plants of Oman were compiled, and analyzed alongside existing checklists for Nepal, the Cape of South Africa and New Zealand. We reconstructed a plant phylogeny at generic level for Oman, and a new, more inclusive phylogeny to represent the genera found in all four local floras. Methods from community phylogenetics were used to identify clustering and overdispersion of the plants used. The impacts of using local or more inclusive phylogenies and different null model selections were explored. 
We found that Omani medicinal plant use emphasizes the same deep lineages of flowering plants as the other three medicinal floras, most strongly when comparing Omani and Nepalese medicinal plants. Drivers of this similarity might be floristic composition, opportunity for exchange of knowledge and shared beliefs in the causation of illness. Phylogenetic patterns among therapeutic applications are cross‐predictive within and between cultures, and must be interpreted with care since inappropriate use of null models can result in spurious similarity. High levels of cross‐predictivity suggest that targeting plants used for specific therapeutic applications to identify specific bioactives may have limited value. 
We outline the questions that might be addressed using a global phylogeny and medicinal plant checklists, suggest the best methods for future studies and propose how findings might be interpreted.


Societal Impact Statement
Plants are living repositories of pharmacologically active chemicals and help to meet society's health care needs directly, or by providing natural products for drug development. We describe phylogenetic approaches to compare medicinal floras from different cultures in distinct regions of the world, and consider how these findings can improve knowledge of how plants have been selected for medical purposes. Greater insight into how people have selected plants for medicinal use will benefit healthcare and drug discovery strategies, and ultimately contribute to the future health and well-being of society.

Summary
• Four medicinal floras were compared using phylogenetic methods, to test whether there are shared patterns in medical plant use at the level of the whole medicinal floras, or for specific therapeutic applications.
• Checklists of the native plants and medicinal plants of Oman were compiled, and analyzed alongside existing checklists for Nepal, the Cape of South Africa and New Zealand. We reconstructed a plant phylogeny at generic level for Oman, and a new, more inclusive phylogeny to represent the genera found in all four local floras. Methods from community phylogenetics were used to identify clustering and overdispersion of the plants used. The impacts of using local or more inclusive phylogenies and different null model selections were explored.
Phylogenetic approaches offer insights distinct from the widely used taxon-based cross-cultural investigations. In one application, phylogenetic approaches are able to identify and quantify shared preferences for plants overall (Saslis-Lagoudakis et al., 2014;Teixidor-Toneu et al., 2018;Thompson et al., unpublished). In cross-cultural studies such as these, metrics describe the related- Although phylogenetic methods are a powerful tool to explore global patterns in medicinal plant use, the scope of this approach is still poorly understood. For instance, there are no accounts yet that explore the outcomes of adding further local floras and medicinal floras to an existing phylogenetic analysis of three floras (Saslis-Lagoudakis et al., 2012). This study was among the first to use community phylogenetic tools to explore patterns of medicinally used plants. Saslis-Lagoudakis et al. (2012) revealed clustering of medicinal floras within the floras they were drawn from. They also revealed shared phylogenetic patterns across the floras: medicinal floras were more closely related than would be expected if there were not shared preferences.
The finding was taken to indicate independent discovery of plant efficacy, an interpretation supported by significant over-representation of proven bioactive species in shared lineages. In the present study, we compare the medicinal flora of Oman to those of three medicinal floras contrasted in the previous study by Saslis-Lagoudakis et al. (2012), the medicinal floras of New Zealand, South Africa and Nepal. These four represent different ecoregions. Oman, located in the Southeast of the Arabian Peninsula at the meeting point between Africa and Asia, has over 1,200 vascular plant species over half of which are annuals, flowering irregularly from year to year according to the timing and amount of rain (Ghazanfar, 2003). Mountainous areas are especially diverse floristically and include endemic species (around 5% of the species in Dhofar, southern mountains; Ghazanfar, 2003). New Zealand comprises an archipelago of three main islands in the southern Pacific Ocean, and has a flora of approximately 1900 species, 45% of which are endemic (Wilton & Breitwieser, 2010). The Cape of South Africa, located at the south and south-eastern tip of Africa has over 9,000 species of which 70% are endemic (Goldblatt & Manning, 2002). The flora of Nepal, a country spanning from lowland plains (Terai) to the highest Himalayan peaks, is estimated to have between 6,000 and 6,600 species of flowering plants (Press, Shrestha, Sutton, & Carneiro, 2000), of which around 4% are endemic (Shrestha & Joshi, 1996).
The medical tradition of Arabic regions is a pluralistic system that includes Prophetic medicine, concerned with spirit aetiologies, and Galenic humoral medicinal, concerned with environmental factors (Greenwood, 1981). The Galenic humoral system or "Unani tibb" underlies traditional medicine in northern and central Oman (Ghazanfar & Al-Sabahi, 1993) and is also part of the pluralistic medical system in southern Oman (Miller & Morris, 1988). Knowledge about herbal medicines in Oman is not traditionally written down but is passed orally from one generation to the next (Ghazanfar & Al-Sabahi, 1993). Until • We outline the questions that might be addressed using a global phylogeny and medicinal plant checklists, suggest the best methods for future studies and propose how findings might be interpreted.

K E Y W O R D S
cross-cultural, ethnobotany, medicinal flora, methods, phylogeny, therapeutic application recently, traditional medicine was the only available healthcare strategy in Oman (Ghazanfar, 1994;Ghazanfar & Al-Sabahi, 1993), but treatments using traditional medicines have become less popular in the past two decades because of the establishment of hospitals. However, minor ailments such as headaches, colds, fever, and stomach upsets, are still treated using medicinal plants (Ghazanfar & Al-Sabahi, 1993). Nepal is a mosaic of cultures with over 75 ethnolinguistic groups (mainly of Tibetan-Burman or Indio-European origin), resulting in diverse healthcare strategies (Gaenszle, Turin, Tuladhar-Douglas, & Chhetri, 2015). As in Oman, in Nepal scholarly, written medical systems are used alongside oral folk medicine. Unani medicine is one of these scholarly medicines, together with the more popular Ayurveda and Tibetan medicine (Gewali, 2008). Moreover, in Nepal shamanistic medicine is practiced when a person's illness is believed to be caused by a spirit possession (Gewali, 2008). Formal, written medical systems were not traditionally used in New Zealand or the Cape South Africa. The native people of New Zealand, the Māori, are of Polynesian origin and had developed an independent ethnopharmacopoeia due to their isolation between settlement (around 1,300 S.D; Wilmshurst, Anderson, Higham, & Worthy, 2008) and European colonization (18th century). Since contact with Europeans it is likely that their medical system was influenced by newcomers, but the precolonial medicinal flora is well documented. The Cape of South Africa is populated by various ethnic groups, but according to Kale (1995), their traditional medicine is essentially similar and based on supernatural belief. Different traditional healers play a key role: "isangomas" are mostly female spiritual healers that also address social causes of illness and "inyangas" are mostly male and use herbs and medicines to treat people (South African History Online, 2018).
The primary objective of this study is to determine whether the medicinal plants of Oman, (a) overall and (b) for specific therapeutic applications, are more closely related to those of New Zealand, the Cape of South Africa and Nepal than expected by chance. The floras of New Zealand, the Cape of South Africa and Nepal were selected for the original study to represent cultures with negligible pre-colonial contact (Saslis-Lagoudakis et al., 2012). By introducing Oman, patterns of putatively shared knowledge across disparate floras but culturally connected people are characterized for the first time. A further objective is to consider the questions that might be addressed by a global phylogenetic survey, and to outline methods appropriate to address these questions.

| Data collection
A checklist list of the plant species of Oman was compiled from the four volumes of the Flora of Oman (Ghazanfar, 2003(Ghazanfar, , 2015(Ghazanfar, , 2018Ghazanfar & Patzelt, 2007;Kearse et al., 2012). As in other crosscultural studies (i.e., Moerman et al., 1999), our checklist only includes angiosperms because gymnosperm data were not readily available. A checklist of the native medicinal species of Oman was compiled from the medicinal flora of northern Oman (Ghazanfar & Al-Sabahi, 1993) and the medicinal flora of southern Oman (Miller & Morris, 1988). The applications of medicinal plants were classified into 12 therapeutic applications (gastro-intestinal, general, gynaecology/fertility, dentistry/mouth, musculoskeletal, neurology, ophthalmology, otorhinolaryngology, other, respiratory/pulmonary, skin, and urinary) following Saslis-Lagoudakis et al.

| DNA sequences and phylogenetic reconstruction
A phylogeny for the Omani flora was reconstructed at the genus level, following Saslis-Lagoudakis et al., (2012). A list of the 452 genera present in Oman was prepared and a single exemplar rbcL sequence downloaded from Genbank for each genus using the Geneious software (Kearse et al., 2012). The plastid DNA marker rbcL was used because of the availability of data for this marker, its successful amplification among plant lineages, and its ability to resolve phylogenetic relationships in large-scale studies (Chase et al., 1993;Forest et al., 2007;Savolainen et al., 2000) . Where possible, we selected a species present in the Omani flora, but in cases where a DNA sequence for Omani species was not available, a Saudi Arabian or other species was selected; 361 exemplar sequences were downloaded from Genbank, therefore we compiled genetic data for ∼80% of the genera present in the flora. When the Genbank names did not correspond to those in the flora of Oman, a recognized synonym from the  (Hall, 1999) and adjustments were made manually. A phylogenetic tree of relationships of Oman flora was reconstructed under maximum likelihood criterion using RAxML (Stamatakis, Hoover, & Rougemont, 2008).
Data for the reconstruction of a combined phylogeny including the floras of the Cape of South Africa, Nepal and New Zealand were those used by Saslis-Lagoudakis et al. (2012). The sequence alignments were combined with the Omani sequences to construct a phylogenetic tree representing genus-level plant relationships among all four floras. For this analysis, when a genus was present in more than one flora it was included only once in the analysis. The alignment of the combined matrix was performed in MAFFT v.7 (Katoh, 2013) with manual adjustments perfomed in BioEdit v. 7.0 (Hall, 1999). Sequence data were analyzed under the maximum-likelihood (ML) criterion implemented in RAxML (Stamatakis et al., 2008). Rate smoothing was implemented in the ape package (Paradis, Claude, & Strimmer, 2004), using the chronoMPL function. This model-free method applies the mean path length for each node to all descendants for dating (Britton, Oxelman, Vinnersten, & Bremer, 2002). Where this method introduces very short negative branches, they were converted to zero-length branches.

| Data analysis
PHYLOCOM version 4.2 (Webb, Ackerly, & Kembel, 2008) was used for phylogenetic interpretation. To run an analysis, a phylogeny written in Newick format and a sample file are needed. For ecological applications, the sample files typically describe presence or absence of taxa in different communities. In our work, the sample files can describe different categories, such as presence or absence of a genus in a local flora, presence or absence of the medicinal use of a genus, or presence or absence of use of genera for specified therapeutic applications.
Three functions in PHYLOCOM, comstruct, comdist, and comdistnt, were employed to calculate two phylogenetic metrics, mean phylogenetic distance (MPD) and mean nearest taxon distance (MNTD). Comstruct is used to test whether a category specified by a sample file has phylogenetic structure. Comdist and Comdistnt are used to test for phylogenetic relatedness between categories specified by a sample file, either at deep (Comdist) or shallow (Comdistnt) levels. In general, MNTD is influenced by patterns near the tips, whilst MPD is informative of phylogenetic structure deeper in the phylogeny. Net Relatedness Index (NRI) and Nearest Taxon Index (NTI) are the standardized effect size (z-scores) of MPD and MNTD respectively according to the equations described in PHYLOCOM user's manual 4.2: In these equations, MPD random and MNTD random represent values calculated for null communities, where null communities are random samples. Sampling to generate the null communities can be implemented in different ways, according to different null models. There are two null models that we use here (Table 1). For example, under null 0 which samples any plant in the phylogeny to make a null community, and where the phylogeny only comprises genera in the local flora, comstruct analysis can be used to test whether medicinal genera (a category in the sample file) are a phylogenetically structured subset of plants in the local flora. If the phylogeny includes genera that are not in the local flora, the same question can be asked if the sample file specifies plant genera in the local community as a category. This is because, under null model 1 the null community is sampled from genera that appear in any category in the sample file ( Figure 1).
In these analyses, NRI and NTI values are outputs along with their rankLow and rankHigh values. The rankLow/High values describe the number of actual comparisons for which the observed distance in the sample is shorter/longer than the null community. From rankLow/High values, two-tailed p-values are calculated. Positive NRI/NTI values represent phylogenetic clustering, negative values indicate phylogenetic over-dispersion. NTI or NRI values > 1.96 or <−1.96 are considered significant at an alpha threshold of p < .05 (in a two-tailed p test, corresponding to p < .025 and p > .975 respectively).
Here, we carry out six tests; their purpose and the metric and function, phylogeny and null model are described in full in Table 1.
Tests 1 and 2 have equivalent aims. In both cases the analysis sets out to determine whether the medicinal flora (in this case the medicinal genera of Oman) is clustered relative to the local flora (in this case the generic-level flora of Oman). In the first test the local flora is delimited by the phylogeny. In other words, the sequences used to reconstruct the phylogeny represent only the genera from the local flora. Therefore, a null model that draws from the whole phylogeny to make a null community is used (equivalent to null model 0 in Phylocom; Webb et al., 2008). In the second test, the phylogeny included genera that are not in the local flora. Therefore, the sample file must be used to delimit the local flora and the medicinal flora, and the null community is drawn from the local flora by specifying a null model that will draw the null community from the sample file. This is Phylocom's null model 1, specifically "for each sample, species are drawn without replacement from the list of all species actually occurring in at least one sample" (page 20, Webb, Ackerly, & Kembel, 2011). Figure 1 explains the relationship between null models and sample files for local and combined phylogenies. Comparison of tests 1 and 2 can be made to determine the influence of a phylogeny built to represent a local flora, and one that is more inclusive.
Test 6 considers shallow, tip-level relatedness between therapeutic applications; for every combination of place and therapeutic application, it is tested whether the closest relatives of medicinal genera are used significantly more often than in a null sampled from all medicinal genera.
The comparisons between medicinal floras at the generic level for the 13 therapeutic applications (Tests 4, 5 and 6) are presented in Figure 2 (a-c respectively). Test 4 shows that under null 0, use for a therapeutic application in one flora predicts often use for the same therapeutic application in other floras. However, similarly high levels of cross-predictivity between therapeutic applications at deep phylogenetic levels are also revealed (Figure 2a). Using null model 1 rather than null model zero, so that the null is sampled from the medicinal plants rather than the whole phylogeny (the flora), the

F I G U R E 1
The relationship between null models and sample files for local and combined phylogenies. Test 1: Using null model 0 to test the structure of Omani medicinal genera in Omani flora, the Oman flora was all included in the local phylogeny. Test 1A: Local phylogeny represent the Oman flora; each orange colour bar indicates the position of medicinal used genus. Test 1B: A part of the sample file used for test 1. The first column is the sample, the second is the abundance (in this study it was all set to 1), the third is the species code which should be identical with the tip label in the phylogeny. Test 2: Using null model 1 to test the structure of Omani medicinal genera in the combined phylogeny. Test 2A: combined phylogeny including Oman flora; each orange colour bar indicates the position of medicinal used genus. Green bars indicate genera in the Omani flora. Test 2B: A part of the sample file used for test 2. The first column is the sample, the second is the abundance (in this study it was all set to 1), the third is the species code which should be identical with the tip label in the phylogeny. Tree figures prepared using iTOL (https://itol.embl.de/itol.cgi) Test

| D ISCUSS I ON
Phylodiversity metrics, first used in conservation biology then in macroecology and community ecology, have become a critical component of modern ecology (Tucker et al., 2016). Increasingly complete phylogenetic hypotheses and better understanding of the power and reach of the approaches have enabled a diversity of applications. The development of the field has led to a proliferation in phylodiversity metrics, with at least 70 metrics available (Tucker et al., 2016). Borrowing the appropriate metric from ecology to address questions in a different field demands good understanding of the metrics available. Since one of our goals here is to make phy-   Table 1). Thus, we apply these divergence methods to assess alpha phylodiversity or sample structure (Tests 1 and 2, Table 1), but also beta phylodiversity or the dissimilarity between samples (Tests 3 to 6, Table 1). We choose these metrics because of the simplicity of using the same metric for alpha and beta-phylodiversity, and also because MPD is an "anchor" metric, one with well-known properties that lies at the centre of a constellation of less well-known, similar metrics (Tucker et al., 2016).
The study we present here draws out methodological issues. In this case any plant present in the phylogeny can be included in the null, and the large number of green cells points to lineages that, relative to plants as a whole, are used for multiple therapeutic applications. Block B has many green cells (clustering) for comparisons between therapeutic applications within Nepal, also within Oman, and for many Nepal-Oman comparisons. Green cells are few between other pairs of cultures, and there is overdispersal for some therapeutic applications. Since the null is sampled from medicinal plants, the effect of the overall clustering of medicinal plants is removed and there is much less evidence of the same lineages used for different therapeutic applications than is apparent in block A. Block C has only has green cells when within-flora comparisons are made. Many comparisons between floras show overdispersal (red cells). The closest relative of a genus with any medicinal use is most likely found in the flora in which that genus is found, regardless of therapeutic application. The effect of floristic composition is strong when investigating tip-level relationships effort should be made to generate a better phylogeny at least by including non-local taxa as placeholders. There are three methodological issues that we do not explicitly explore here: the completeness of sampling, the hierarchical level of sampling, and the rate-smoothing approach taken. Considering completeness of sampling, in the case of Oman and Cape of South Africa our phylogenies include approximately 80% of the genera found the local flora, and our best represented flora is New Zealand with more than 88% of genera in the phylogeny. Although Tests 1 and 2 explore taxon sampling effects due to the use of local versus more inclusive phylogenies, we did not consider the effects of missing genera.
Others have explored this, for example Jantzen et al. (2019)  This is because NRI values describe a deep relationship in the phylogeny, and generic-level data could provide evidence for clustering of interest at tribal or sub familial levels. If the study is directed toward understanding tip-level, shallow relationships these might be best investigated in a species-level study of a genus of family.
The phenomenon of generic complexes, where different scientific species with the same vernacular name are used interchangeably, is well known in ethnobotany (Linares & Bye, 1987). Although generic complexes may include unrelated species, the prevalence of use of clusters of close relatives and dispersed deep linages might contribute to a pattern of overdispersed clusters when species-level analyses are performed (Souza et al., 2018). Ultimately, calculating metrics for phylogenies at family, genus and species levels might reveal test whether patterns recovered are caused by the concentration of useful species in particular taxonomic units.
The third methodological issue not explored here relates to rate-smoothing or time calibration of the phylogeny used. Our analyses were performed on a rate-smoothed but not time-calibrated phylogeny. The community phylogenetics literature includes metrics calculated using phylograms and chronograms (Li et al., 2019).
This choice can lead to significantly different results in some cases Jantzen et al., 2019;Li et al., 2019). We found rate smoothing to reduce the strong effect of taxa on long branches, and we considered this desirable, especially when some samples are small.
A final consideration is whether there should be an adjustment to the threshold for significance when many tests are carried out, as is the case here. The Bonferroni correction is sometimes used to adjust the alpha threshold for each comparison, so that the study-wide error rate remains at 0.05 (Cabin & Mitchell, 2000). The correction provides a conservative estimate of significance, minimizing type I errors but inflating type II errors. Saslis-Lagoudakis et al. (2012) highlighted results remaining significant after Bonferroni correction in their results tables. This approach has the advantage of allowing the reader to decide what is worse, false positives or negatives, when considering individual values. However, where the interest is in overall pattern rather than individual comparisons as is the case with the heat map presented here (Figure 2), we argue against Bonferroni correction.
Whether different people share preferences for medicinal plants is a question that has been addressed many times, using different methods applied at different spatial scales and at different taxonomic levels. Since independent discovery of plant properties is considered to point toward efficacy (Bletter, 2007;Moerman, 2007;Trotter & Logan, 1986), discovering shared preferences may be informative (Saslis-Lagoudakis et al., 2012). Here, our approach is phylogenetic, and we consider distantly related cultures exposed to allow the taxa belonging to lineages used more than expected to be enumerated. How these hot node methods might be applied in bioprospecting is a matter of ongoing research (Souza et al., 2018).
Another approach depends on the spatial mapping of phylogenetic diversity (PD), a diversity measure predictive of feature diversity that can have conservation applications (Faith, 1992;Forest et al., 2007 (Leonti et al., 2003;Moerman et al., 1999), this hypothesis has not been tested phylogenetically. In the present study, we find the Omani medicinal flora is most similar to the Nepalese one. One driver of this relationship could be floristic environment since we also find that the Omani flora is also more similar taxonomically to the Nepalese flora than to any of the other floras considered. That the flora-a proxy for plant availability-is the main determinant of plant use in our study, was also revealed by another phylogenetic study (Saslis-Lagoudakis, Klitgaard, et al., 2011).
Secondly, other drivers of relatedness of medicinal floras can be explored. These could include shared beliefs about illness causation, use of a shared scholarly medical system, or transmission of traditional knowledge of plants. Cultural contact between Nepal and Oman would be expected to be greater than contact between Oman and the other cultures in our study, through the movement of people and ideas. Oman was an important medieval trading post with many products entering its ports, including Indian plant drugs also important in Nepal (Amar & Lev, 2017). In terms of beliefs, the Galenic humoral system found in northern and central Oman (Ghazanfar & Al-Sabahi, 1993) also contributes to Nepalese practice through his-

| CON CLUS ION
Phylogenetic methods are a powerful tool to better understand medicinal plant use. Here, we focus on cross-cultural patterns in the use of medicinal plants. We show that a more inclusive phylogeny of plants provides an optimal framework, as long as appropriate null models are selected. We highlight the best practice for cross-cultural study, particularly in the interpretation of shared therapeutic applications. We contribute to the emerging picture of the same plant lineages being used for multiple therapeutic applications. The first study to use a phylogeny of plants to explore cross-cultural patterns at the level of whole medicinal floras, in the way we do here, identified lineages that were putatively independently discovered (Saslis-Lagoudakis et al., 2012). Here, we incorporate the medicinal flora of a cultures likely to have had opportunity for transmission of traditional knowledge with another, if not directly then through multiple intermediaries across wide distances. Ultimatel, we may be able to identify drivers of overall similarity in medicinal floras using the methods outlined here, alongside those of evolutionary anthropology (Teixidor-Toneu et al., 2018). Including cultures with more or less similar floras, health needs, theories of disease causation, and opportunity for knowledge exchange, we can more rigorously test the hypothesis that lineages were independently discovered by identifying factors contributing to knowledge transfer (Teixidor-Toneu et al., 2018). Interdisciplinary study combining plant phylogenies, evolutionary anthropology, and ethnobotanical data have great promise, but only if methods are robust (Hawkins & Teixidor-Toneu, 2017); here, we take steps to promote these methods.