Collection of native Theobroma cacao L. accessions from the Ecuadorian Amazon highlights a hotspot of cocoa diversity
Olivier Fouet and Rey Gaston Loor Solorzano contributed equally to this work
Funding information: Agropolis Fondation; Agence Nationale de la Recherche, Grant/Award Number: ANR-16-IDEX-0006; Valrhona; Montpellier Université d'Excellence
Abstract
Societal Impact Statement
“Fine flavor” cocoa, known for its superior flavor and aroma, commands a higher price for farmers than “bulk” cocoa produced for market. These fine flavor cocoa varieties make an important contribution to the agricultural sector in Ecuador. However, cocoa diversity is threatened by deforestation. The effective preservation, characterization, and use of cocoa tree diversity are therefore essential to the future development of this market. We carried out participatory collection surveys with local communities in the Ecuadorian Amazon Forest to evaluate the genetic diversity of native cocoa trees and protect trees as a resource for local communities. Accessing this wealth of diversity will help farmers safeguard cocoa against climate change and develop new varieties for market.
Summary
- The aromatic Nacional variety of Theobroma cacao, emblematic of Ecuador, is highly sought after by the chocolate industry. The modern Nacional is a hybrid population resulting from genetic admixture that has lost the specificity of the ancestral variety. In the context of progressive forest disappearance, the objective of our study was to collect, safeguard, and evaluate native aromatic cocoa trees from Ecuadorian Amazon areas previously identified as areas of origin of the Nacional variety, as well as those that extend toward the northern Amazon.
- Four collection expeditions were organized in the Ecuadorian Amazon provinces of Zamora-Chinchipe, Morona-Santiago, and Pastaza in close collaboration with local communities. A total of 283 native accessions were collected and safeguarded at experimental stations and in local communities. The genetic diversity of the cocoa trees was analyzed by comparison to known genetic groups with a set of 48 simple sequence repeat (SSR) markers.
- This new collection clearly enriches the currently known diversity and improves knowledge of the global genetic structure of cocoa trees. Our results clarify the geographic origin of the Nacional variety in the vicinity of an archeological site that housed a Mayo-Chinchipe population that consumed cacao 5,000 years ago. In addition, our analyses revealed clues to the origin of Criollo, another ancient variety with a fine flavor.
- These new genetic resources will be used in breeding programs to develop new aromatic cocoa varieties and, more broadly, to select new varieties adapted to environmental changes.
1 INTRODUCTION
Theobroma cacao L. is a diploid (2n = 2x = 20) perennial species of the Malvaceae family (Whitlock et al., 2001). The classification of T. cacao accessions according to their genetic diversity has evolved significantly over the last century. First, two distinct species, Forastero and Criollo, were proposed by Pittier (1930). Then, Cuatrecasas (1964) suggested two subspecies, ssp. cacao and ssp. sphaerocarpum. Accessions of T. cacao were also classified according to their morphogeographical traits by Cheesman (1944). More recently, a genetic study using simple sequence repeat (SSR) markers allowed a more precise classification into 10 genetic groups (Motamayor et al., 2008).
The cocoa tree is an understory tropical tree native to the upper Amazon basin. This region contains the widest diversity of Theobroma species (Thomas et al., 2012). Their distribution extends throughout the Amazon region to Mesoamerica, in part due to past human-mediated dispersal. The history of cocoa expansion remains to be elucidated to better understand the role of human selection in shaping this diversity.
Amazon rainforests have lost more than 20% of their area in the last 60 years (Nobre et al., 2016), and 27% of tree species are threatened according to the IUCN classification. Future projections show that climate change and deforestation could lead to a decline of up to 58% in Amazonian tree species richness. Species could lose an average of 65% of their original environmentally adapted area (Gomes et al., 2019). This continued loss of habitat could lead to an irreversible loss of T. cacao genetic diversity, which is essential for farmers and breeders. Indeed, wild T. cacao constitutes a reservoir of variability for the varietal improvement of cultivated species. The ability of cacao farming to adapt to changing environments or to more sustainable and resilient farming systems will greatly depend on the preservation of such agrobiodiversity (Satori et al., 2021). Conservation of these native cocoa trees requires access to appropriate land to maintain living collections and long-term accessibility for a sustainable breeding process. There is an urgent need to rescue, conserve, and characterize these cocoa genetic resources in close collaboration with local communities to make the processes more demand-driven (Kline et al., 2020). A better overall characterization of wild T. cacao diversity would help identify the best candidates for the selection of traits such as disease resistance, aroma quality or cadmium accumulation in the bean, for which there is natural variation within the species.
Historically, the economy and development of Ecuador have been closely linked to the production and export of cocoa, particularly based on the aromatic variety Nacional. Ecuador has recently become the top cocoa producer in South America, with 322,000 tons estimated for 2018–2019, or 6.7% of world production (ICCO, 2020). The Nacional variety, also called “arriba” when traditionally grown on the Pacific coast, is classified as a fine cocoa for its floral aromas. Over the last century, the gradual introduction of foreign genetic material resistant to various fungal diseases has led to genetic mixing (Loor et al., 2009) and, consequently, alterations to the Nacional flavor. Extensive monoclonal plantings, particularly the spread of the nonflavored clone CCN51, used for its high productivity (Boza et al., 2014), have led to a partial downgrading of Ecuador's fine cocoa production. Following its meteoric expansion, this clone now accounts for 70% of Ecuadorian production (Jaimez et al., 2022). The current Nacional plantations, called “modern Nacional,” correspond to a genetic mix between the ancestral Nacional and the introduced Trinitario (Loor et al., 2009). However, nonintrogressed genotypes, corresponding to representatives of the ancestral Nacional, were identified within this hybrid population. The genetic analysis of a large collection of accessions from the Upper Amazon covering a wide geographic area (Allen & Lass, 1983; Loor et al., 2012; Pound, 1938, 1945) identified the southern part of the Ecuadorian Amazon as the putative region of origin of the ancestral Nacional.
In recent years, several genetic diversity studies have been conducted on Amazonian cocoa trees using SSR or SNP markers, including those in Peru (Arevalo-Gardini et al., 2019; Zhang et al., 2006), Brazil (Sereno et al., 2006), Bolivia (Zhang et al., 2012), Guyana (Lachenaud et al., 2016; Lachenaud & Zhang, 2008), and Colombia (Osorio-Guarín et al., 2017). Several diversity hotspots were identified in the Upper Amazon between northern Bolivia and southern Colombia, particularly in the Ecuadorian Amazon (Thomas et al., 2012).
To enhance and rescue the genetic resources linked to aromatic cocoa in the context of the progressive disappearance of the forest (Figure 1), several expeditions were organized in collaboration with local Amerindian communities to collect native T. cacao trees. The diversity and genetic structure of the collected and safeguarded material compared to the global diversity of T. cacao species based on microsatellite markers are reported in this study.

2 MATERIALS AND METHODS
2.1 Method of participatory collection expeditions
Four collection expeditions were conducted in 2010, 2013, 2017, and 2019 in the Ecuadorian provinces of Zamora-Chinchipe, Morona-Santiago, and Pastaza. The sampling design of the collection expeditions performed in 2010 and 2013 has already been described by Loor et al. (2015). The last two collection expeditions in 2017 and 2019 (Dataset S1) were conducted following a participatory approach involving the local Amerindian populations in four successive steps (Figures 2 and S1).

2.2 Plant material
During the last two collection expeditions in 2017 and 2019, 150 new T. cacao accessions were collected, bringing the total number of accessions surveyed in the four collection expeditions to 283. These accessions were collected in geographical areas with diverse topography and climate and were grouped into eight distinct zones. Zones 1 to 8, numbered from south to north, were representative of the different ecological conditions (Figure 3; Table 1a,b).

(a) Collected populations

| Area | Sample size | Country | Collection year | Geographical origin | Mean altitude in meters (min–max) |
|---|---|---|---|---|---|
| Zone 1 | 49 | Ecuador | 2013 | Palanda and Chinchipe cantons | 1,031.5 (724–1,188) |
| Zone 2 | 11 | Ecuador | 2010 | Upper Nangaritza, Numpatakaime and Shamatak rivers | 918.4 (802–962) |
| Zone 3 | 48 | Ecuador | 2010 | Yacuambi, Zamora and lower Nangaritza rivers | 932.3 (815–1,085) |
| Zone 4 | 60 | Ecuador | 2010 and 2017 | El Pangui canton | 934.2 (757–1,171) |
| Zone 5 | 22 | Ecuador | 2017 | Santiago river and Yaupi locality | 295.0 (245–357) |
| Zone 6 | 20 | Ecuador | 2017 | Upper Morona River, San Jose de Morona locality | 210.2 (201–223) |
| Zone 7 | 34 | Ecuador | 2019 | Panguietza river, Taïsha, Tutin Entsa localities | 319.3 (210–452) |
| Zone 8 | 39 | Ecuador | 2019 | Capahuari and Pastaza rivers | 254.8 (224–273) |

(b) Control populations

| Group name | Sample size | Country of origin | Geographical origin |
|---|---|---|---|
| Amelonado | 4 | Brazil | NA |
| Nanay | 13 | Peru | Nanay river |
| Iquitos | 12 | Peru | Iquitos |
| Marañon | 10 | Peru | Marañon river |
| Guiana | 11 | French Guiana | Camopi and Tanpok rivers |
| Purús | 9 | Brazil | NA |
| Contamana | 7 | Peru | Loreto, Ucayali river |
| Morona | 7 | Peru | Morona |
| Nacional | 11 | Ecuador | Coast provinces |
| Curaray | 18 | Ecuador | Morona-Santiago province |
| Caquetá | 11 | Colombia/Ecuador | Caquetá river/Pastaza and Morona-Santiago province |
| Criollo | 22 | Central America | NA |
- Abbreviation: NA, not available.
To compare the genetic diversity of the collected material with the known genetic diversity of T. cacao, representative accessions from the 10 genetic groups previously identified within the species (Motamayor et al., 2008) were included in the genetic analysis; these accessions were referred to as the control population. In addition, accessions originating from northern Peru (referred to as the “Morona” population) and from southern Colombia and the northern part of the Ecuadorian Amazon (referred to as the “Caquetá” population) were also included (Allen, 1988). A total of 418 samples were analyzed in this work.
2.3 DNA purification and PCR amplification
Leaf samples were stored under vacuum immediately after sampling to avoid degradation during travel. Genomic DNA was extracted from cocoa leaves according to Risterucci et al. (2000) and purified according to Allegre et al. (2011). A total of 48 SSR markers were selected from genomic DNA (Pugh et al., 2004) or expressed sequence tag sequencing data (Fouet et al., 2011) to study the dissimilarities between individuals. These SSRs are distributed across the 10 cocoa genetic linkage groups (Table S1). PCR amplifications were performed as described by Allegre et al. (2011). The raw data were analyzed with GeneMapper 4.0 software (Applied Biosystems). To compare our results to those of other studies, control individuals that matched those used in similar studies were included in the analysis. In this study, 119 control individuals were selected from the study conducted by Loor et al. (2012) in the Ecuadorian Amazon, and 64 individuals were selected from the global analysis of the genetic diversity of T. cacao published by Motamayor et al. (2008).
2.4 Genetic diversity
The average number of alleles per locus (Na), expected heterozygosity (He) (Nei, 1978), observed heterozygosity (Ho), and number of private alleles (PA) specific to each group were calculated with the GenAlEx 6.503 program (Peakall & Smouse, 2012). Allelic richness (Ar) and private allele richness (PAr) were estimated using a rarefaction approach with the HP-rare program (Kalinowski, 2004, 2005) to correct for unequal sample sizes; a value of g equal to twice the size of the smallest sample for a diploid organism was used (el Mousadik & Petit, 1996). Unbiased genetic distance (D) or Nei's identity index (Nei, 1972) was calculated with GenAlEx to assess the similarity between all collected and reference groups based on allelic frequencies.
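As a minimal illustration of two of these statistics, the sketch below computes observed heterozygosity and an expected heterozygosity for a single locus. It assumes the unbiased estimator of Nei (1978), as the citation suggests; the function name `heterozygosity` and the toy genotypes are hypothetical, and the study itself relied on GenAlEx rather than on code of this kind.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (Ho, unbiased He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies over the 2n sampled gene copies.
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    # Nei (1978) unbiased estimator: 2n / (2n - 1) * (1 - sum of squared frequencies).
    he = (2 * n) / (2 * n - 1) * (1 - sum(p * p for p in freqs))
    return ho, he

# Hypothetical SSR allele sizes (bp) for four trees at one locus.
toy = [(150, 150), (150, 154), (154, 154), (150, 154)]
ho, he = heterozygosity(toy)
```

In a multilocus analysis such as this one, the per-locus values would be averaged over the 48 markers.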
Wright's F-fixation indices were estimated (Weir & Cockerham, 1984). The fixation index (FIS) or inbreeding coefficient was calculated with GENETIX (Belkhir et al., 2004) to estimate the differentiation between individuals within each population. The FST index was estimated between pairs of populations using GENEPOP web version 4.2 software (Raymond & Rousset, 1995; Rousset, 2008).
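To make the sign convention of the fixation index concrete, the following sketch uses the simple ratio form F_IS = 1 − Ho/He. This is an assumption made for illustration: the Weir and Cockerham (1984) estimator used in the study weights loci and sample sizes differently, so the values below are not expected to reproduce the published F_IS figures exactly. The function name `fis_simple` is hypothetical.

```python
def fis_simple(ho, he):
    """Simple inbreeding coefficient F_IS = 1 - Ho/He (not the Weir & Cockerham estimator)."""
    if he <= 0:
        raise ValueError("He must be > 0 (monomorphic data give an undefined F_IS)")
    return 1.0 - ho / he

# Ho and He values copied from Table 2 of this article.
fis_amelonado = fis_simple(0.038, 0.168)  # positive: excess of homozygotes
fis_iquitos = fis_simple(0.493, 0.347)    # negative: excess of heterozygotes
```

Values near zero are what Hardy–Weinberg equilibrium predicts; positive values indicate fewer heterozygotes than expected, and negative values indicate more.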
2.5 Genetic structure of the population
Two methods were used to infer the genetic structure of the population investigated in this study. First, a distance model based on the dissimilarity matrix was calculated using the neighbor-joining method (Perrier et al., 2003), as implemented in DARWIN 6.0.14 software (https://darwin.cirad.fr/). The genetic structure inferred from the 135 accessions representative of the T. cacao diversity was first evaluated for its agreement with previously identified genetic groups (Motamayor et al., 2008). Then, a phylogenetic tree with both the collected and reference accessions was constructed using the neighbor-joining method and 500 bootstrap repetitions to assess the uncertainty of the tree structure. Second, we used a Bayesian model implemented in STRUCTURE 2.3.4 software (Pritchard et al., 2000). Values of K from 2 to 20 were tested with a burn-in period of 100,000 iterations, 500,000 Markov chain Monte Carlo repetitions, and at least 10 repetitions per K. The optimal value of K, that is, the assumed top structure level, was determined according to the method described by Evanno et al. (2005) using the StructureSelector website (Li & Liu, 2018). This software also incorporates the CLUMPAK program (Kopelman et al., 2015), which combines many features of existing tools for postprocessing STRUCTURE results by incorporating calls to CLUMPP (Jakobsson & Rosenberg, 2007) and DISTRUCT (Rosenberg, 2004).
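The ΔK criterion of Evanno et al. (2005) can be sketched as follows. This is an illustrative version that takes the second-order difference of the mean log-likelihoods across replicate runs and divides it by the standard deviation at K; it is not the StructureSelector implementation, and the function name `delta_k` and all log-likelihood values are made up.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Evanno-style Delta K from replicate ln P(D) values.

    lnp_by_k: dict mapping K -> list of ln P(D) values from replicate runs.
    Returns a dict K -> Delta K for every K that has both neighbours K-1 and K+1.
    """
    m = {k: mean(v) for k, v in lnp_by_k.items()}
    s = {k: stdev(v) for k, v in lnp_by_k.items()}
    out = {}
    for k in sorted(lnp_by_k):
        if k - 1 in m and k + 1 in m and s[k] > 0:
            # Absolute second-order rate of change of the mean log-likelihood.
            second = abs(m[k + 1] - 2 * m[k] + m[k - 1])
            out[k] = second / s[k]
    return out

# Made-up log-likelihoods (three runs per K), purely for illustration.
toy = {
    2: [-5000.0, -5002.0, -4998.0],
    3: [-4500.0, -4501.0, -4499.0],
    4: [-4100.0, -4101.0, -4099.0],
    5: [-4050.0, -4060.0, -4040.0],
}
dk = delta_k(toy)
best_k = max(dk, key=dk.get)
```

Because ΔK needs both neighbours, it is undefined for the smallest and largest K tested (here 2 and 5; in the study, 2 and 20).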
3 RESULTS
3.1 Genetic diversity of the population
The intragenetic and intergenetic diversity of the collected and control populations were assessed by different genetic parameters (Table 2). A total of 369 alleles were detected with 48 SSR markers. The number of observed alleles per locus (Na) varied between 2 and 19, with an average Na of 7.69 over the whole population. The Na was found to vary from 4.56 (Zone 7) to 2.92 (Zone 2) across each collection zone, with the latter having the smallest sample size. The Na of the whole collected population (6.15) was similar to that of the control set (6.69). After correcting for unequal group size, the high allelic richness (Ar[g]) of the overall collected population was confirmed (5.82) and was found to be only 12.5% lower than that of the control set (6.65). Each of the eight areas collected contains between one and six private alleles (PA), which may seem high considering the geographical proximity of the different areas surveyed. These values were comparable to those of the control groups, where PA varied between 0 and 6, except for the Purús (PA = 14) and Caquetá (PA = 15) groups. Considering all collected material, PA accounted for 13.3% of the total 369 alleles. This result could be partly related to unequal sample sizes; private allele richness (PAr) confirmed the allelic contribution of the collections but with a greater difference between the collection set (PAr = 0.85) and the control set (PAr = 1.68).
| Group name | Na | PA | Ar(g) | PAr | Ho | He | FIS (SE) |
|---|---|---|---|---|---|---|---|
| Amelonado | 1.48 | 0 | 1.48 | 0.01 | 0.038 | 0.168 | 0.753 (0.059) |
| Nanay | 2.23 | 4 | 1.75 | 0.06 | 0.234 | 0.226 | −0.053 (0.037) |
| Marañon | 1.88 | 1 | 1.69 | 0.04 | 0.265 | 0.236 | −0.122 (0.035) |
| Iquitos | 2.25 | 1 | 2.23 | 0.03 | 0.493 | 0.347 | −0.360 (0.033) |
| Guiana | 1.33 | 2 | 1.60 | 0.06 | 0.079 | 0.111 | 0.217 (0.052) |
| Contamana | 2.08 | 6 | 1.95 | 0.20 | 0.350 | 0.314 | −0.109 (0.065) |
| Morona | 2.13 | 4 | 1.97 | 0.10 | 0.328 | 0.326 | −0.022 (0.053) |
| Nacional | 1.79 | 1 | 1.53 | 0.01 | 0.140 | 0.177 | 0.179 (0.060) |
| Curaray | 2.35 | 1 | 2.16 | 0.02 | 0.173 | 0.224 | 0.192 (0.043) |
| Caquetá | 3.19 | 15 | 2.45 | 0.27 | 0.286 | 0.410 | 0.267 (0.042) |
| Purús | 2.17 | 14 | 1.93 | 0.29 | 0.318 | 0.307 | −0.038 (0.048) |
| Criollo | 1.25 | 5 | 1.16 | 0.18 | 0.019 | 0.054 | 0.588 (0.075) |
| Zone 1 | 3.83 | 2 | 2.02 | 0.04 | 0.253 | 0.293 | 0.102 (0.028) |
| Zone 2 | 2.92 | 3 | 2.35 | 0.07 | 0.308 | 0.388 | 0.198 (0.053) |
| Zone 3 | 3.79 | 1 | 2.23 | 0.02 | 0.366 | 0.401 | 0.093 (0.026) |
| Zone 4 | 4.40 | 1 | 2.45 | 0.02 | 0.397 | 0.451 | 0.133 (0.027) |
| Zone 5 | 3.67 | 6 | 2.66 | 0.09 | 0.368 | 0.507 | 0.261 (0.035) |
| Zone 6 | 4.31 | 3 | 3.00 | 0.03 | 0.472 | 0.571 | 0.166 (0.032) |
| Zone 7 | 4.56 | 4 | 2.98 | 0.04 | 0.388 | 0.579 | 0.309 (0.029) |
| Zone 8 | 4.21 | 6 | 2.39 | 0.04 | 0.320 | 0.406 | 0.190 (0.034) |
| Controls ×135 | 6.69 | 75 | 6.65 | 1.68 | 0.215 | 0.648 | 0.683 (0.015) |
| Collections ×283 | 6.15 | 49 | 5.82 | 0.85 | 0.355 | 0.535 | 0.324 (0.018) |
| Total ×418 | 7.69 | 369 | NA | NA | 0.311 | 0.603 | 0.490 (0.013) |
- Note: Mean number of alleles per locus (Na), private alleles number (PA), allelic richness (Ar[g]), private allelic richness (PAr), observed heterozygosity (Ho), expected heterozygosity (He), and fixation index (FIS) with standard error (SE).
- Abbreviation: NA, not available.
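The Ar(g) column relies on rarefaction to make groups of different sizes comparable. A minimal sketch of the underlying calculation for one locus is given below, assuming the standard rarefaction formula (expected number of distinct alleles in a random subsample of g gene copies). HP-Rare was used in the study; the function name `allelic_richness` and the allele counts are hypothetical.

```python
from math import comb

def allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: number of copies observed for each allele at one locus.
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("g cannot exceed the number of sampled gene copies")
    # For each allele, 1 - P(allele absent from the subsample).
    return sum(1 - comb(n - c, g) / comb(n, g) for c in allele_counts)

counts = [6, 3, 1]  # hypothetical allele counts at one locus (10 gene copies)
ar_full = allelic_richness(counts, 10)  # no subsampling: all 3 alleles are seen
ar_small = allelic_richness(counts, 4)  # rarefied to 4 gene copies
```

Rare alleles contribute little to the rarefied value, which is why Ar(g) can be well below Na for the larger groups.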
The FIS of the eight collected zones (0.324) was lower than that of the control groups (0.683). The majority of the control groups were in Hardy–Weinberg equilibrium with FIS values near zero. However, some negative values showing an excess of heterozygosity were observed, notably for the Iquitos group (−0.360). In contrast, positive values showing an excess of homozygosity were observed for the Criollo (0.588) and Amelonado (0.753) groups. The FIS values ranged from 0.093 (Zone 3) to 0.309 (Zone 7) across the different collection zones and were thus closer to Hardy–Weinberg equilibrium than some of the control groups.
Differentiation between groups was calculated between pairs of populations using standardized variance (FST) and showed very significant differentiation (FST > 0.25) between control groups (Dataset S2). The lowest differentiation between control groups was observed between the Iquitos–Amelonado (FST = 0.3678), Iquitos–Nanay (FST = 0.3577), and Iquitos–Marañon (FST = 0.3825) groups, which aligned with the proximity of these groups. Moderate differentiation (0.15 < FST < 0.25) was mainly found between the eight different collection zones, and the lowest values were observed between the zones that were closest geographically (Zones 6 and 7 or Zones 3 and 4). Some of the collection zones showed moderate to low differentiation from the control groups, including Zones 1–3 with the Nacional control group (0.1330, 0.1314, and 0.1517, respectively) and Zones 7 and 8 with the Curaray control group (0.2352 and 0.1984, respectively). Zones 6 and 7, located upstream of the Morona River, showed similarities with the Morona control group (0.1918 and 0.2284, respectively). More surprisingly, Zones 6 and 7 also showed similarities with the Iquitos control group (0.1906 and 0.1771, respectively).
3.2 Genetic distances between reference groups and collection zones
The genetic distance coefficient (D) was calculated in pairs between the control groups and the eight collection zones (Dataset S2). The lowest genetic distance between the control groups was detected between the Marañon and Guiana groups (D = 0.274), followed by the Nacional and Morona groups (D = 0.295). The highest genetic distance value was found between the Criollo and Amelonado groups (D = 2.295). The genetic distances between the collected zones fluctuated from D = 0.05 (between Zones 3 and 4) to D = 0.482 (between Zones 1 and 5). Small genetic distances were observed between the Nacional control group and the collected regions located in the south (Zones 1–3), especially with Zone 1 (D = 0.068). The Curaray group is closer to the native trees collected in the regions located in the north (Zones 5, 7, and 8).
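For readers unfamiliar with the statistic, the sketch below computes Nei's (1972) standard genetic distance from allele frequencies. It omits the small-sample correction of the unbiased version, so it is an approximation of what GenAlEx reports; the function name `nei_distance` and the frequencies are hypothetical.

```python
from math import log, sqrt

def nei_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance D = -ln(I).

    pop_x, pop_y: lists of dicts (one per locus, same order) mapping allele -> frequency.
    Undefined (math domain error) when the two populations share no allele at any locus.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(a, 0.0) for a, p in fx.items())
    identity = jxy / sqrt(jx * jy)  # Nei's identity index I
    return -log(identity)

# Hypothetical single-locus allele frequencies.
fixed = [{"A": 1.0}]
mixed = [{"A": 0.5, "B": 0.5}]
d_same = nei_distance(mixed, mixed)
d_diff = nei_distance(fixed, mixed)
```

Identical allele frequencies give D = 0, and D grows without bound as shared alleles become rarer, which is consistent with values above 2 for highly divergent pairs such as Criollo and Amelonado.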
3.3 Structure of the population
The collected population was analyzed by two methods to identify its structure.
An initial neighbor-joining analysis was performed on the 135 control clones. The resulting phylogenetic tree confirmed the structure of the 10 genetic groups known thus far and described by Motamayor et al. (2008). Furthermore, two supplementary groups were observed: one comprising the accessions from Morona and the other comprising the accessions from Caquetá. In this phylogenetic tree, the Criollo genetic group appeared closest to the Caquetá genetic group.
The same analysis was then conducted with the total dataset of 418 individuals. The overall phylogenetic tree (Figure 4) showed that the majority (>70%) of the collected trees were grouped into three large clusters close to the Nacional, Morona, and Curaray control groups. The first cluster comprised accessions collected in the southern zones (Zones 1–3) and accessions from the Nacional and Morona control groups. The second cluster comprised accessions collected in the northern zones (Zones 5–8) and accessions from the Curaray control group. The third cluster comprised accessions collected in Zones 3 and 4. Most of the remaining collected accessions (<30%) were irregularly located closer to other control groups, except for complex admixed accessions located at the base of the tree that involved more than three genetic groups defined by control populations.

A Bayesian analysis using STRUCTURE software was then used to more precisely determine the ancestry and genetic partition of the collected and control accessions. The top hierarchical level of genetic partitioning was first revealed at K = 4 (Figures S2 and S3). This level of hierarchy yielded four major subdivisions of the control groups: (i) Curaray, (ii) Nacional and Morona, (iii) Criollo, and (iv) Amelonado-Nanay-Marañon-Iquitos-Guiana-Purús and Contamana. At this value of K, the Caquetá control group appeared as an admixed population of the four subdivisions. Further subdivisions could be observed at higher K values, differentiating the following control groups: accessions with Contamana ancestry separated at K = 7; accessions with Morona ancestry separated at K = 8; and accessions with Amelonado ancestry separated at K = 11. The Purús and Caquetá control groups dissociated at K = 14. Finally, accessions with Nanay and Iquitos ancestries did not dissociate from those with Marañon and Guiana ancestries, even at higher values of K.
More generally, for the different values of K taken into consideration, the genetic structure of the newly collected accessions varied distinctly from south to north depending on the geographical zone of the collection site. In Zones 1 and 2, the accessions are very close to the Nacional control group, and further north, particularly in Zone 8, the collected trees are closer to the Curaray genetic group (Figure 5). Surprisingly, starting from K = 5, accessions collected mostly in Zone 4 and nearby El Pangui formed a distinct genetic group.

4 DISCUSSION
A total of 283 native trees were collected and characterized with 48 SSR markers to assess their diversity and compare those values to the known genetic diversity of T. cacao species. SSRs were chosen for their flexibility in testing small numbers of individuals and for their discriminatory power compared to an equal number of SNP markers (van Inghelandt et al., 2010).
4.1 High allelic diversity
Our results show that cocoa trees with new alleles and increased diversity were collected during these four surveys, highlighting this hotspot of diversity present in Ecuador. The high number of alleles per locus (Na), up to 19 (average Na = 7.69), allowed us to assign specific alleles to a genetic group or collection zone. A majority of cocoa trees collected near the banks of the Morona River upstream of the village of San José de Morona (Zone 6) and further north (Zones 7 and 8) were characterized as being very close to the Curaray genetic group. These Curaray-type accessions were collected at low altitudes (between 200 and 350 m) under very different climatic conditions (cloud cover, high rainfall, and temperature). In contrast, the Nacional-type accessions observed in Zone 1 were collected between 800 and 1,200 m of altitude in mountainous regions (cordilleras). The Curaray and Nacional accessions are genetically very different (D = 0.695) and represent two of the four groups constituting the upper hierarchical structure observed in this study. Their presence reflects the high diversity observed over a short geographic distance between the most distant collection areas (approximately 300 km in a straight line). Adaptation to the strong geo-climatic variations of their environments could partially explain their genetic distance. The number of private alleles across all the collected native trees was high (PA = 49) relative to the control set, even though regional groups (Curaray, Nacional, Morona, and Caquetá) were well represented among the controls. Compared to the total alleles present in the control population (320), the private alleles present in the collected populations represent a gain of 15.3% of new alleles.
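The percentages quoted for the allele counts follow directly from the figures reported in the text and in Table 2. The short check below uses only those reported numbers; the variable names are illustrative.

```python
total_alleles = 369      # alleles detected over the 48 SSR markers
private_collected = 49   # alleles found only in the collected accessions
n_markers = 48

# Alleles present in the control set are all those not private to the collections.
control_alleles = total_alleles - private_collected

gain_vs_controls = 100 * private_collected / control_alleles  # gain relative to controls
share_of_total = 100 * private_collected / total_alleles      # share of all alleles
mean_alleles_per_locus = total_alleles / n_markers            # average Na over all samples
```

Rounded, these give the 320 control alleles, the 15.3% gain, the 13.3% share of the total, and the average Na of 7.69 reported in the Results.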
Different authors have examined the centers of diversity of T. cacao and hypothesized that its zone of origin is located near the Equator (Cheesman, 1944; Motamayor et al., 2008; Pound, 1938, 1945). Taken together, our results confirmed that Ecuador contains a high diversity hotspot for T. cacao species, as also previously suggested by Thomas et al. (2012).
4.2 Population structure
Population structure analyses revealed four main subdivisions by both NJ and Bayesian clustering methods; this structure differentiated the Curaray, Nacional-Morona, and Criollo groups and a cluster containing the other groups formerly called “Forasteros,” including Amelonado, Purús, Guiana, and groups originating from Peru (Nanay, Marañon, Iquitos, and Contamana). This four-group hierarchy was already highlighted in the NJ method analysis performed on the LCT-EEN accessions (Loor et al., 2012) collected in the Amazon between 1980 and 1985 (Allen, 1988). A similar low level of hierarchy (K = 4) was also detected for a large collection of T. cacao in Colombia (Osorio-Guarín et al., 2017). Furthermore, at a K value of ≥10, it was not possible to separate all 10 of the genetic groups previously described by Motamayor et al. (2008), suggesting that this reference analysis is not strong enough to be applied to our data even if the control samples had an ancestry value of >0.90. A possible explanation could be that the number of markers used in our study was not sufficient to discriminate the genetic groups highlighted by Motamayor et al. (2008). However, other genetic groups dissociated before these 10 reference groups in our study. A large number of cocoa trees collected mostly around the village of El Pangui (north of the Zamora-Chinchipe region), mainly located in Zone 4, formed a new group that dissociated at K = 5. This new group was formed with an equivalent proportion of Curaray and Nacional ancestry representing the majority of Zone 4 accessions at K = 4 (Figure 4). The low number of private alleles (PA = 1) found in Zone 4 and the relatively high observed heterozygosity (Ho = 0.397) suggest that the Pangui population is an admixed population rather than a distinct genetic group. Furthermore, at K = 8, accessions from the Morona control group dissociated from accessions of the Nacional control group.
In addition, the analysis of the genetic substructure within the neighbor-joining tree separated individuals with Nacional ancestry from individuals with Morona ancestry, suggesting that the Morona control group could be considered a different subpopulation.
Approximately 30% of the cocoa trees collected were not part of the three main clusters (Nacional–Morona, Curaray, and the accessions collected around El Pangui village) and were sometimes very close to the other control groups. The majority of these accessions were collected around villages well served by road connections; they are most likely foreign cocoa trees recently imported, like TIW063 (Amelonado), or hybrids between native and foreign cocoa trees.
4.3 Origin of the Nacional variety
A previous study using paternity analysis performed on a large population of T. cacao accessions collected in Amazonia from several countries (Loor et al., 2012) suggested that the genotypes LCT-EEN85, LCT-EEN86, and LCT-EEN91, which were collected in Ecuador close to the Yacuambi and Nangaritza rivers (Allen, 1988), were the closest and most likely representatives of the ancestors of the Nacional variety among the accessions studied. In the vicinity of these rivers (Zone 3) and around the village of Shaïme (Zone 2), we found 19 trees similar to the Nacional variety, including four that were genetically very close to LCT-EEN86 and four trees (SHAM001, SHAM002, SHAM003, and NANK006) that were very close to LCT-EEN91 and the Nacional control group. However, the vast majority of T. cacao accessions close to Nacional were collected further south around the village of Palanda (Zone 1). Over a distance of approximately 50 km around the village of Palanda, 44 of the 49 trees analyzed showed a very strong similarity to each other and to the Nacional control accessions.
An old ceremonial archeological site dating back 5,500 years was discovered in Santa Ana-La Florida near the village of Palanda; this site showed a high degree of societal development at that time (Valdez et al., 2005). A recent study showed the presence of starch grains, theobromine (which is characteristic of T. cacao), and ancient cocoa DNA on ceramic remains from this archeological site (Zarrillo et al., 2018). Trees collected at or around the site were very similar to the Nacional group controls. All of these data suggest that the Mayo-Chinchipe people could have played a major role in the domestication of the Nacional variety.
Zones 1 and 2, where trees with Nacional ancestry have also been found, are not located in the same watershed but are connected by ancient paths still used by Amerindian communities. Very old interzone connections across this region, for which there is evidence (Valdez, 2008), may have facilitated the spread of the Nacional variety from its domestication area (Zone 1) to more northerly regions (Zones 2 and 3).
The hypothesis that the ancestral Nacional trees, once grown on the Pacific coast (Loor et al., 2009), could have originated from the region of Palanda (Zone 1) is supported by different geographical and historical data. Indeed, the Amazon region (Zones 1–3) is close to the Guayas region, where the first Nacional crops were observed on the Pacific coast. Marine shells discovered at the archeological site of Santa Ana–La Florida (Valdez et al., 2005) attest to the connections between Zone 1 and the Pacific coastal regions.
These surveys have allowed the collection and preservation of genetic material close to the Nacional control group, which is known for its aromatic qualities and is important for Ecuadorian cocoa production. The aromatic potential of the collected accessions is currently being analyzed (Colonges, Loor Solorzano, et al., 2022; Colonges, Seguine, et al., 2022). The enrichment of the allelic diversity of the Nacional varieties should provide improvement programs with material adapted to the local growing conditions found in the Ecuadorian Amazonian regions surveyed. Participatory breeding programs have been initiated with local communities to promote sustainable family cocoa farming and improve their livelihoods.
4.4 Hypothesis of Criollo domestication
A recent study showed the basal position of the Curaray genetic group in the phylogeny of T. cacao (Osorio-Guarín et al., 2017). This result was reinforced by an analysis based on a population differentiation model derived from the sequencing of 200 individuals representing highly diverse cocoa trees (Cornejo et al., 2017). This latter analysis revealed the Curaray group as the most closely related to the Criollo group, suggesting that Criollo was domesticated from a subset of ancient Curaray individuals.
With the analysis of representatives of a new population, the Caquetá population, our results showed that Criollo is closer to this new group (D = 0.835), which could also be visualized in the phylogenetic tree (Figure 4). Among the control groups, the Caquetá group had the highest values of allele richness and private alleles. These observations suggest that the Criollo domestication could have originated from this region or a region located further north, close to the frontier between Colombia and Ecuador, thus representing a hotspot of T. cacao diversity.
5 CONCLUSION
T. cacao diversity is strongly threatened by progressive deforestation and by the rapid emergence of diseases in the context of trade globalization. The four surveys (2010 to 2019) carried out in Ecuadorian Amazonia with the help of Amerindian communities and agricultural colleges have yielded enhanced genetic resources that can be used to create new aromatic cocoa varieties adapted to Amazonia. The vast majority of the collected accessions increased the previously known diversity of the Nacional and Curaray genetic groups, and new alleles were discovered. Given their diversity, these accessions could allow the identification of aromatic niches, providing new income for farmers while contributing to the maintenance of T. cacao diversity.
This research has clarified the domestication area of the Nacional varieties and gives new clues about the domestication of Criollo, another aromatic variety. In the coming years, safeguarding the diversity of native T. cacao should play a major role in varietal improvement for the adaptation of the species, especially regarding climate change.
ACKNOWLEDGMENTS
These surveys were carried out thanks to a collaboration between INIAP and CIRAD. We thank Agropolis Fondation, MUSE (Montpellier Université d'Excellence), and Valrhona for their financial support of this project. This work, a part of the MUSE Amazcacao project, was publicly funded through ANR (Agence Nationale de la Recherche, the French National Research Agency) under the “Investissement d'avenir” program with reference ANR-16-IDEX-0006. We are most grateful to the GPTR (Great regional technical platform) of Montpellier core facility for its technical support. We are very grateful to the local populations, mainly the Shuar, Achuar, and Amazonian Kichwa Amerindian communities, as well as to the agricultural colleges of El Pangui, Real Audiencia (San Jose), Raime Roldos (Santiago), Los Angeles (Taïsha), and Tuna (Kapawi) for their invaluable help during these surveys.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
AUTHOR CONTRIBUTIONS
R.G.L.S., P.C., and C.L. conceived and coordinated the project; R.G.L.S., C.S., and F.F. completed the formalities with government authorities and Amerindian communities required to enter Amerindian territories; R.G.L.S., C.S., D.C., F.F., and C.L. organized seminars in agricultural colleges; O.F., R.G.L.S., C.S., D.C., F.F., F.A., B.Y., and I.S. carried out the 2017 and 2019 surveys, propagated the collected material, and established germplasm collections; O.F., R.G.L.S., and C.L. designed the genetic studies; O.F., R.R., K.C., and H.V. carried out the genotyping; O.F., B.R., and X.A. analyzed the data; O.F., R.G.L.S., B.R., X.A., and C.L. wrote the manuscript.
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon request.





