SeedGerm: a cost-effective phenotyping platform for automated seed imaging and machine-learning based phenotypic analysis of crop seed germination

In this article, we describe the hardware and software design in detail. We also demonstrate that SeedGerm could match specialists’ scoring of radicle emergence. Germination curves were produced based on seed-level germination timing and rates rather than a fitted curve. In particular, by scoring germination across a diverse panel of Brassica napus varieties, SeedGerm implicates a gene important in abscisic acid (ABA) signalling in seeds. We compared SeedGerm with existing methods and concluded that it could have wide utility in large-scale seed phenotyping and testing, for both research and routine seed technology applications.


Introduction
Seeds are essential for human beings, not only as important food sources, but also for efficient crop production. High-vigour seeds with better seed germination and seedling emergence rates can ensure reliable emergence under varied agricultural conditions and hence are key to yield potential and uniformity (TeKrony & Egli, 1991). A common scoring method for seed germination is to assess radicle protrusion, which quantifies the speed and frequency of germination (Finch-Savage & Bassel, 2016). Traditionally, the task was accomplished by seed technologists through visual inspections on colour and morphological changes during physiological processes of seed germination (Lin, 1999); however, this approach is labour-intensive and subjective (Joosen et al., 2010;Demilly et al., 2015).
Routine germination scoring still commonly relies on human observation, which has practically constrained the frequency, scale, and accuracy of such experiments (Reyazul et al., 2015;Jahnke et al., 2016;Zhang et al., 2018). This bottleneck has led to many attempts to automate both seed imaging and associated phenotypic analysis, resulting in several research-based solutions such as GERMINATOR and the package, phenoSeeder, and the MultiSense tool (Ducournau et al., 2005;Joosen et al., 2010;Demilly et al., 2015;Jahnke et al., 2016;Ligterink & Hilhorst, 2016;Keil et al., 2017). More recently, advanced computer-vision (CV) and machine-learning (ML) techniques are being applied to germination assays, including the Rice Seed Germination Evaluation System (RSGES) for assessing the germination status of Thai rice species using an artificial neural network (ANN) classifier (Lurstwut & Pornpanomchai, 2017); machine-vision based analysis on visible and X-ray images for evaluating soybean seed quality based on physical purity, viability and vigour (Mahajan et al., 2018); deep learning (DL) algorithms such as U-Net and ResNet for segmenting and classifying rice seed germination status (Nguyen et al., 2018); linear discriminant analysis and multispectral imaging combined for classifying cowpea seeds into categories of ageing, germination, and normality (Elmasry et al., 2019); and a high-throughput micro-CT-RGB (HCR) phenotyping system for dissecting the rice genetic architecture from seedling (Wu et al., 2019).
The above solutions include customised hardware devices (e.g. bespoke germination trays, image sensors and seed handling system) and tailored analytic software built on MATLAB Toolbox, ImageJ/Fiji, Microsoft Excel macros, image analysis libraries (e.g. VideometerLab3 and OpenCV), and ML/DL libraries (e.g. PyTorch). Although not fully automated, they have been successfully applied to impute germination traits from the acquired seed images, including the quantification of morphological traits (e.g. size and shape), cumulative germination rates (e.g. time to 50% germination, T50, and the proportion of seeds germinated at the conclusion of an experiment, Gmax), and quality traits such as viability and vigour (Ducournau et al., 2005;Jahnke et al., 2016;Mahajan et al., 2018). Nevertheless, the throughput, automation level, and the range of traits of the above solutions are still limited, such that seed imaging and associated germination-related trait analyses still require human intervention.

Accepted Article
The emergence of plant phenomics in recent years has brought new perspectives to seed science research (Dell'Aquila, 2009;Watson et al., 2018). By combining cost-effective digital imaging and environment sensors, organ-level plant growth and development can be recorded with detailed imagery, at a very high frequency (Tardieu et al., 2017;Pieruschka & Schurr, 2019;Reynolds et al., 2019b). In particular, many CV and ML combined analytic methods have been developed to enable the automation of organ-level phenotypic analysis, including leaves, roots, and reproductive organs (Pound et al., 2017;Sadeghi-Tehran et al., 2017;Xiong et al., 2017;Zhou et al., 2017a;Yasrab et al., 2019). By combining colour, texture, morphologies, and growth patterns, seed germination can be quantified in a dynamic and objective manner, based on which large-scale and reproducible evidence can be produced to enable new biological discoveries for seed physiology (Teixeira et al., 2007;Demilly et al., 2015;Reyazul et al., 2015;Lurstwut & Pornpanomchai, 2017;Elmasry et al., 2019). Furthermore, the automation of seed germination scoring presents a good opportunity to initiate the standardisation of seed science research. Not only can seed quality and vigour be digitally assessed, but in addition, biological experiments under varied conditions can be cross-referenced quantitatively to increase the confidence of our research outcomes.
Here, we introduce SeedGerm, a platform designed for automating seed imaging and high-throughput germination analysis for a variety of crop seeds. SeedGerm incorporates cost-effective hardware components for seed imaging and experimental conditions (e.g. ambient temperature and humidity) acquisition, as well as ML-based analytic software for measuring both germination- and establishment-related traits during the germination process. Utilising SeedGerm, we are able to quantify the performance of seed lots based on individual seeds rather than a fitted germination curve. The analytic software embedded in SeedGerm is able to process multiple image series at the same time and export analysis results in both comma-separated values (CSV) files and processed images (e.g. germination masks, in PNG format), at both seed and panel levels (normally one genotype per germination panel). We also demonstrate that SeedGerm matches seed specialists' observations for the scoring of radicle emergence timing for crop species such as tomato, pepper, Brassica, barley and maize seeds, which can also be used as a research tool to identify the genetic basis of germination differences between varieties.

Seed batch production and storage
Seed lots were produced in commercial production and stored at 12°C and 35% relative humidity (RH) until use. For seed production from the 88 B. napus Diversity Fixed Foundation Set (DFFS) lines used in this study, plants were vernalised (8-h photoperiod, 5°C) for 6 weeks at the four-leaf stage and grown in a polytunnel. Seeds were used within three months after harvesting. Seed batches from independent mother plants constituted biological replicates. High-quality seed batches of tomato and Brassica were utilised to generate lower quality batches. To this end, a sub-batch was taken from these, which was heat-treated for three days at 70°C.

Seed germination conditions
A typical experimental setup uses standard A3-sized filter paper (dark blue seed testing paper used in germination chambers, supplied by Munktell Ahlstrom, Grade 194, Bärenstein, Germany) as the substrate to accommodate six sets of 64 individual seeds (384 seeds in total, in six germination panels) for tomato and Brassica seeds. For barley, we carried out experiments with three extended germination panels, with 40 seeds per panel and 120 seeds in total. Due to the size of maize seeds, the entire germination box was used to host 35 seeds per experiment. For pepper seeds, 81 seeds were used in a given panel, resulting in a total of 486 seeds. To facilitate sound germination classification, a minimum of A4-sized filter paper is recommended to allow sufficient space between seeds, but further divisions could also be made to separate different genotypes.
Automated seed imaging was typically set at a one-hour interval and normally ran for 5 to 10 days, depending on the crop species. For example, B. napus seeds were germinated on saturated filter paper in SeedGerm boxes in constant white light at 10°C (in a cold-room or a growth chamber). A standard seed test took 7-14 days, with two key traits (i.e. germination frequency and seed vigour) frequently checked by experienced seed technologists. To screen the 88 B. napus DFFS lines, seeds were gridded in panels of 50 seeds, with six panels per germination box and five replicates per line. A fully randomised experimental design was followed. In a routine experiment, each SeedGerm box contains two layers of white filter paper (Grade 3644, Hahnemuehle, Germany), with a single sheet of blue seed germination paper on top. A fixed volume of water (i.e. sterile de-ionised water, 350 ml) was added to the filter paper stack prior to the start of the experiment. To ensure even absorption across the filter paper, the wetted paper was allowed to stand for 2 hours after the addition of water (i.e. a further 30 ml), before gridding the seeds and starting the experiment.

This article is protected by copyright. All rights reserved.

Hardware design
To carry out high-quality seed imaging to record physiological processes of germination in a continuous manner, we have designed two types of hardware apparatus: (1) a relatively low-cost translucent plastic germination box mounted with a fixed camera for routine germination experiments, and (2) a more expensive bespoke mini-gantry imaging system built on the top of a transparent polyethylene box for long-term experiments. Both designs are shown in Figure 1, where the image sensors used in the fixed design are high-definition (HD) Pi camera modules (i.e. 5 megapixel, MP, with a maximum 2592x1944 pixels per image, Fig. 1a) and the mini-gantry design is equipped with an 8 MP HD USB camera, with undistorted wide-angle lens and a maximum 4160 × 3120 pixels per image (Fig. 1b). As the focus of the moving USB camera is adjustable, the latter design has been used for a variety of experiments to explore the physiological processes between germination and seedling (e.g. a 15-day experiment for wheat seeds, Fig. 1b).
Also, some digital sensors have been installed in the SeedGerm device, recording ambient humidity and temperature on an hourly basis. As advised by previously published work (Schumann et al., 1995;Afzal et al., 2017), transparent polypropylene used to build SeedGerm devices has been tested repeatedly and did not have effects on germination and seedling growth. A brief outline of the hardware design and cost of the SeedGerm device can be seen in Supporting Information Note S1.

Both SeedGerm hardware designs are controlled by low-cost single-board computers (i.e. Raspberry Pi 2 or Pi 3 computers). In a given experiment, users can set up seed imaging via a graphic user interface (GUI) based software application (i.e. the imaging module) running on Pi computers embedded in the SeedGerm hardware, through which imaging parameters such as resolution and interval can be programmed. The GUI control software is cross-platform and was developed using Python's native GUI package, Tkinter (Shipman, 2013), and has been described previously (Zhou et al., 2017b). It also allows users to define metadata for each experiment, including species, genotypes, experiment duration, and the naming convention for the acquired images. A number of experiments can be monitored simultaneously (Figs. 1a&1c). The data collation and management are controlled by Linux-based crontab scheduling, at near real-time.
Users can visually inspect experiments (e.g. tomato in Fig. 1c and wheat in Fig. 1b) from their own computers or smart devices using a virtual private network (VPN) or remote desktop software.

Open-source software system
Besides the seed imaging module, the SeedGerm software system also contains a light-weight data management module and a ML-based analysis module (Fig. 2a). An image acquired by the imaging module is firstly saved on the SeedGerm hardware's local storage. Then, the image is checked according to its size and clarity; if the size and clarity are greater than a predefined threshold value, it will be transferred to a gateway computer via wired (Ethernet) or wireless network connections. This synchronisation task is carried out on an hourly basis, between many SeedGerm devices and the gateway computer, where images from different devices are collated in folders named after their associated experiments defined through the seed imaging module.
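The size-and-clarity check described above could be sketched as follows. This is a minimal illustration only: the threshold values, the function names, and the use of Laplacian variance as the clarity measure are all assumptions, since the text does not state how clarity is computed.

```python
from statistics import pvariance

# Hypothetical thresholds; the actual values used by SeedGerm are not stated.
MIN_FILE_BYTES = 100_000
MIN_SHARPNESS = 50.0


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2D list of grey values.

    Blurred or featureless images have weak edges, so the variance is low.
    """
    responses = []
    for r in range(1, len(gray) - 1):
        for c in range(1, len(gray[0]) - 1):
            lap = (gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1]
                   + gray[r][c + 1] - 4 * gray[r][c])
            responses.append(lap)
    return pvariance(responses) if responses else 0.0


def should_transfer(file_bytes, gray):
    """Return True only if the image passes both the size and clarity checks."""
    if file_bytes <= MIN_FILE_BYTES:
        return False
    return laplacian_variance(gray) > MIN_SHARPNESS


# Toy inputs: a checkerboard (strong edges) and a flat grey frame (no detail).
sharp = [[255 * ((r + c) % 2) for c in range(8)] for r in range(8)]
flat = [[128] * 8 for _ in range(8)]
```

With these toy inputs, the checkerboard passes when its file size is large enough, whereas the flat frame is rejected regardless of size.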
Between the gateway machine and onsite storage (e.g. a dedicated workstation or high-performance computing infrastructure, HPC), data synchronisation tasks are normally accomplished overnight, when onsite network traffic is less busy. The data management module is administered by either crontab scheduling on Linux (Debian 9.0 onwards) or Bash scripting on Windows (Windows 7 onwards), which have been described in our previous work (Reynolds et al., 2019a).
For different end-users, automated phenotypic analysis can be conducted either centrally on HPC or in a distributed manner on a workstation. Each experiment generates a time-lapse image series, which is uploaded to the onsite storage progressively during the experiment. Then, users can use the ML-based phenotypic analysis module (i.e. analysis software) to analyse these images either through a command-line interface on HPC clusters or through tailored GUI-based software on a normal workstation computer. Both approaches output similar analysis results, including the quantification of germination- and establishment-related traits in CSV files, as well as a sequence of processed images (e.g. dynamic seed masks and panel segmentation images) in PNG format.

GUI-based analysis software
For the ML-based phenotypic analysis module, the workflows for both GUI and command-line approaches are fundamentally identical. We therefore use the more accessible GUI software to introduce the analysis procedure, which has been designed to execute on either Windows (i.e. the .exe executable, Windows 10 tested) or Mac OS (i.e. the .app file, version 10 onwards). The analysis software packages can be downloaded from our GitHub repository. The initial GUI contains an empty window with a menu bar and users can add experiments via the "Add experiment" window (Fig. 2b), through which users can enter a given experiment's name, select an image series for processing, and choose a crop species such as Brassica, maize, pepper, tomato, or cereals. New plant species can be trained and added to the software through the Modules directory, an approach that is independent of the core analysis algorithm. Users need to briefly define the germination experiment associated with the selected image series, including the number of panels in a given SeedGerm device and the Rows and Columns of seeds in each panel. In particular, users can define the Start and End image IDs to initiate and terminate the phenotypic analysis, because the background in early images can be over-saturated due to excess water soaked by the filter paper, whereas late images can contain too many overgrown seedlings and roots (e.g. images between the fourth and 167th image will be analysed in Fig. 2b). Default values of the Start and End images are the first and last image of the selected series.
In order to deal with varied image quality and features caused by lighting, crop species, and different establishment phases, a number of ML-based algorithms have been implemented in the software. Users can select the ML technique from the "BG remover" dropdown to remove the background pixels, which includes U-Net (Ronneberger et al., 2015), Gaussian mixture model (GMM) (Stauffer & Grimson, 2003), and stochastic gradient descent (SGD) (Bottou & Bousquet, 2008), which are explained in the following sections. After an experiment is added, users are required to set YUV colour-space ranges (Y stands for the brightness, U and V for colour components (Szeliski, 2010)) to delineate the background (i.e. filter paper) in the first image of the selected series (Fig. 2c). By adjusting the sliding bars in the "Set YUV ranges" window, backgrounds are mostly retained, representing different types of filter paper used in diverse experiments. After defining YUV values, users can click the "Process images" item to start the phenotypic analysis (Fig. 2c). Similar to our previous work (Zhou et al., 2017a), the analysis software has also employed parallel computing to process multiple experiments simultaneously, with up to 12 image series analysed at a time on an average computer (Intel Core i5, 8GB RAM) and over 120 series on HPC (Fig. 2d). This implementation has enabled multithreaded analysis running on HPC clusters for greater throughput.
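As an illustration of how YUV ranges can delineate the filter-paper background, the sketch below converts RGB pixels to YUV and keeps those that fall inside user-chosen ranges. The conversion uses the common BT.601 weights with U and V offset by 128; the example ranges, pixel values, and function names are assumptions and are not taken from the SeedGerm code.

```python
def rgb_to_yuv(r, g, b):
    """Convert 8-bit RGB to YUV (BT.601 weights, U and V offset by 128)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b + 128
    v = 0.615 * r - 0.51499 * g - 0.10001 * b + 128
    return y, u, v


def is_background(pixel, ranges):
    """True if every YUV channel of the pixel lies inside its (low, high) range."""
    yuv = rgb_to_yuv(*pixel)
    return all(low <= value <= high for value, (low, high) in zip(yuv, ranges))


def background_mask(image, ranges):
    """Boolean mask (2D list) marking the pixels classified as background."""
    return [[is_background(px, ranges) for px in row] for row in image]


# Hypothetical slider settings for dark blue germination paper: (Y, U, V).
blue_paper_ranges = ((0, 120), (140, 255), (0, 128))

# Toy 2x2 image: three dark blue paper pixels and one tan seed pixel.
paper, seed = (30, 40, 120), (200, 170, 110)
image = [[paper, seed], [paper, paper]]
mask = background_mask(image, blue_paper_ranges)
```

Inverting such a mask leaves the seed pixels, which is the role the background remover plays before germination scoring.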
Finally, when the analysis is completed, germination traits (e.g. T25, T50, T75, Gmax, and germination timing curves for each panel), morphological traits (e.g. area, width and length, extent, convex area, and circularity for each seed), and a range of processed images (showing the germination procedure and labelling individual seeds) are produced (Fig. 2e). Users can click "View results" on the shortcut menu to display the analysis outputs, as well as download a range of processed images (Supporting Information Video S1) and the analysis results in CSV files, containing phenotypic analysis at the image (overall results), the panel (i.e. a given genotype), and the seed levels (see Supporting Information Note S2).
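To show how panel-level traits such as T25, T50, T75 and Gmax can be derived from seed-level germination times rather than from a fitted curve, a minimal sketch is given below. It assumes that each percentage is taken relative to all seeds in the panel and that a trait is undefined when that fraction is never reached; the function name and the example times are hypothetical.

```python
import math


def germination_traits(times, fractions=(0.25, 0.5, 0.75)):
    """Compute Gmax and Tx values for one panel.

    times: one entry per seed, hours until germination, or None if the seed
    had not germinated by the end of the experiment.
    """
    n = len(times)
    if n == 0:
        raise ValueError("a panel needs at least one seed")
    germinated = sorted(t for t in times if t is not None)
    traits = {"Gmax": len(germinated) / n}
    for f in fractions:
        needed = math.ceil(f * n)  # seeds required to reach this fraction
        key = f"T{round(f * 100)}"
        # The time at which the required number of seeds had germinated.
        reached = 0 < needed <= len(germinated)
        traits[key] = germinated[needed - 1] if reached else None
    return traits


# Hypothetical panel of eight seeds, two of which never germinated.
panel = [30, 42, None, 36, 48, 40, 55, None]
traits = germination_traits(panel)
```

For this hypothetical panel, six of eight seeds germinated, so Gmax is 0.75, and the second, fourth and sixth germination events give T25, T50 and T75 respectively.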

Core analysis algorithm
The core analysis algorithm for SeedGerm includes three key parts: (1) ML-based background remover, (2) feature extraction and germination detection, and (3) traits measurement (Fig. 3). To establish a more general algorithm to analyse different types of seeds robustly, we have used a mixture of deep learning (DL, i.e. U-Net) and supervised ML (i.e. GMM and SGD) to divide background (filter paper) and foreground (seeds) pixels. For example, after users set the YUV values to retain background pixels, the selected BG remover is trained based on features of the background (e.g. RGB, contrast, intensity values) in the image (Fig. 3a). Then, the YUV values are applied to representative images across the image series (i.e. images at the beginning, middle, and end of the series) to segment background pixels, which allows the ML model to learn background features at different establishment stages during a given experiment, without overfitting the classifier for a specific crop species or a particular experimental setting. Finally, the trained classifier (i.e. the background remover) is applied to each image in the series, producing background masks excluding any seed for each germination panel (Fig. 3b).
After the background masks are produced, they are inverted so that only seed-related objects are retained. SGD has been chosen as the main learning algorithm for germination scoring and was used for our routine germination experiments because it performs well when the seed-background contrast is high and seeds can be clearly delineated from the surrounding background pixels.
Unlike SGD, the GMM is used for image series with low seed-background contrast when seeds are slightly out-of-focus. It is slower, but more robust when the background is complex (e.g. roots from different seeds are crossing). For images with acceptable quality but under changeable lighting conditions, we tend to use U-Net, a recent convolutional neural network (CNN) for semantic segmentation, which improves on the fully convolutional network (Jonathan et al., 2015) by adding skip connections and extra upsampling layers to provide both local and global information. The implementation of U-Net is exploratory, with the aim of using deep learning techniques to improve analysis for unseen datasets (e.g. treated seeds and new plant species).
For feature extraction and germination detection (Figs. 3c,d), we applied descriptive statistical moments, i.e. Hu Moments (Hu, 1962), to describe a given seed's area and its centroid position, which are invariant to the scale and rotation changes of seeds due to imbibition in early germination stages. Features such as minor axis length (seed width), major axis length (seed length), length and width ratio, perimeter, delta of Hu moments (i.e. difference of a seed's Hu moments between two consecutive images), delta of seed area, and delta of seed length and width, have been computed to monitor a seed's morphological changes. The aforementioned features from the pre-germination stage are extracted from the first 20% of images in the series and are then combined to form a training matrix (Fig. 4c) to train a classification model with the assumption that the label for all the seeds is non-germinated. The detection model used is called novelty detection (Schölkopf et al., 2001), a one-class support vector machine (SVM) that is established on the training matrix generated from the first 20% of images and is then applied to determine the germination status of each seed in the image series. Based on the training data, a decision function is generated by the model to enclose pre-germination feature vectors, i.e. white circles enclosed by the red-coloured contour in the embedding p-dimensional space (Fig. 3d); then, as the germination experiment progresses, feature vectors are recomputed. When a seed begins to germinate, its feature vector should gradually leave the boundary of the initial observation region (i.e. abnormal with a given confidence in the germination assessment, black circles outside the red-coloured contour). The seed's probability of germination will increase as well. The novelty detection model scores germination for all detected seeds, resulting in cumulative germination rates for each seed lot in a given germination panel (Fig. 3e). Since the novelty detection model is reinitialised and retrained for each experiment using the first 20% pre-germination images of the selected image series as training data, the detection model is dynamic and hence the risk of overfitting is low. The implementation of the above algorithms can be seen in Supporting Information Note S3.
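The idea of training on pre-germination frames and flagging departures from that region can be sketched with a deliberately simplified stand-in. The code below is not a one-class SVM: it replaces the learned decision boundary with a per-feature z-score threshold, purely to illustrate the train-on-the-first-20% workflow. The feature choice (area and length/width ratio), the threshold, and all names are assumptions.

```python
from statistics import mean, pstdev


def fit_baseline(vectors):
    """Per-feature mean and spread of the pre-germination feature vectors."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # Floor the spread so constant features do not cause division by zero.
    spreads = [max(pstdev(col), 1e-6) for col in columns]
    return means, spreads


def novelty_score(vector, baseline):
    """Largest absolute z-score of the vector relative to the baseline."""
    means, spreads = baseline
    return max(abs(x - m) / s for x, m, s in zip(vector, means, spreads))


def first_germinated_frame(series, train_fraction=0.2, threshold=4.0):
    """Index of the first frame scored as novel, or None if none is found.

    The first train_fraction of frames is assumed to be non-germinated.
    """
    n_train = max(2, int(len(series) * train_fraction))
    baseline = fit_baseline(series[:n_train])
    for idx in range(n_train, len(series)):
        if novelty_score(series[idx], baseline) > threshold:
            return idx
    return None


# Hypothetical per-frame features for one seed: (area, length/width ratio).
stable = [(100 + (i % 3), 1.2 + 0.01 * (i % 2)) for i in range(12)]
emerging = [(110 + 15 * k, 1.3 + 0.1 * k) for k in range(8)]
series = stable + emerging
germination_frame = first_germinated_frame(series)
```

In this toy series the seed stays within the baseline region for the first twelve frames and is first flagged at frame 12, when the area jumps as the radicle emerges. A one-class SVM, as described in the text, would learn a boundary around the training vectors instead of using a fixed z-score cut-off.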

Morphological traits analyses
The last key component of the analysis software is the measurement of morphological features, prior to true leaf production from apical meristems. This approach utilises the measure module in Scikit-Image (van der Walt et al., 2014), and the measured traits are enumerated briefly below:
1. "Seed Area" is the total number of pixels in the region of a segmented seed, which quantifies the size of a seed together with its associated radicles, if it is germinated. This trait can be used to define the size change of seeds during germination (e.g. imbibition).
2. "Seed Perimeter" measures the length of the contour line that encloses a given seed and, if germinated, its associated radicles. This trait can be used to verify the change of the seed size and radicle emergence during germination.
3. "Seed Major/Minor Ratio" measures the width and length (W/L) ratio of the ellipse that encloses a seed and, if germinated, its associated root regions. This trait can be used to define the shape change of a given seed during germination.
4. "Seed Convex Hull Area" measures the area of the smallest polygon that can enclose a seed and, if germinated, its associated root regions. This trait can be used to define the change of the seed and root coverages in a germination panel.
5. "Seed Extent" is the ratio of the total number of pixels contained by a seed to the total pixels contained in its bounding box. This trait is useful to assess the seed establishment rate because the area of the bounding box should increase faster than the seed area.
6. "Seed Circularity" calculates the roundness of a seed and, if germinated, its associated root regions. If the seed is a perfect circle, its circularity reading is 1; a line segment would have a circularity of 0. The circularity is defined as $\text{Circularity} = 4\pi \times \text{Area} / \text{Perimeter}^2$, where Area is Seed Area (in pixels) and Perimeter is Seed Perimeter (in pixels). This trait is also used to differentiate different crop seeds and their germination rates.
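As a worked example of two of the traits listed above, the sketch below computes circularity with the standard isoperimetric formula, 4π × Area / Perimeter², which is consistent with the stated limits (1 for a perfect circle, approaching 0 for a line segment), and extent as the seed area divided by its bounding-box area. The function names and the numbers are illustrative assumptions; they are not taken from the SeedGerm code.

```python
import math


def circularity(area, perimeter):
    """Roundness of a region: 1 for a circle, near 0 for a thin line."""
    return 4 * math.pi * area / perimeter ** 2


def extent(area, bbox_height, bbox_width):
    """Fraction of the bounding box that is covered by the region."""
    return area / (bbox_height * bbox_width)


# An ideal circle of radius 10 pixels.
circle = circularity(math.pi * 10 ** 2, 2 * math.pi * 10)

# A 1 x 100 pixel strip, loosely resembling a long radicle.
strip = circularity(area=100, perimeter=202)

# A region of 60 pixels inside a 10 x 10 bounding box.
coverage = extent(60, 10, 10)
```

The circle evaluates to 1 (up to floating-point error) and the strip to roughly 0.03, which shows why circularity drops as a radicle elongates; the extent falls in a similar way because the bounding box grows faster than the seed area.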

Germination and morphological traits
A set of germination experiments have been conducted to test and improve the SeedGerm platform.
The analysis results of an experiment with 384 tomato seeds (six genotypes), which have been placed on six panels in a customised germination box, with one genotype per panel (64 seeds), can be seen in Figure 4. The imaging interval is 60 minutes and 186 images have been acquired in total, within eight days (Fig. 4a). Analysis outputs include two types of traits: (1) germination traits quantified using the 1st to 186th images (Fig. 4b), including cumulative germination curves, T50 germination rates to assess the uniformity of germination, and Gmax to quantify the proportion of seeds germinated at the end of the experiment; and, (2) morphological traits quantified using the 1st to 160th images (Fig. 4c), including seed area, width and length (W/L) ratio, and circularity. By combining both traits, we can identify morphological changes of six genotypes at the pre-germination stage (before the 106th image). As soon as the germination process started, the cumulative germination curves and associated morphological features became divergent between genotypes. It is observable that there is a strong correlation between the germination curves and the seed area curves, fitting in the developmental procedure when radicles coming out of seeds can dramatically increase the W/L ratio, and the more roots the lower W/L ratio and circularity. The above quantification exhibits the usefulness of combining both germination and morphological traits to verify and improve the detection accuracy.
Additionally, we also used the analysis outputs to evaluate germination uniformity or variability, an important trait requiring complex formulas to compute previously (Ranal & De Santana, 2006).
For example, box-and-whisker plots are provided amongst the result files to demonstrate statistical dispersion of T50 germination rates (Fig. 4b), showing the difference between 25th and 75th percentiles of each genotype, as well as the median time to 50% germination of all genotypes. For instance, genotype 6 seeds (G6) possess lower germination variability and better germination uniformity, which is verified by narrower percentile ranges and similar median values across tested seed batches. We have removed a number of late images (after T75) when presenting the morphological traits (Fig. 4c), which is due to substantial measurement variations caused by overlapped roots at late stages. The analysis results of the experiment can be seen in Supporting Information Datasets S1 & S2.

Germination analysis for different crop seeds
To demonstrate the robustness and generalisation of the SeedGerm system, we have applied SeedGerm to score germination for a range of crop seeds. Germination analyses for four selected crop species (tomato, pepper, maize and barley) are shown in Figure 5. Seed images at three different experiment stages can be seen in the first three columns of images in Figure 5. After conducting time-series seed imaging, we used SeedGerm software to measure germination and morphological traits. Each germination panel (enclosed by dotted rectangles coloured in red in Fig. 5) contains one genotype. Seeds in the panel were monitored continuously, with dissimilar durations due to varied research objectives, for example, 165 hours (7 days) for tomato (Groot & Karssen, 1992), 180 hours (8 days) for pepper (Smith & Cobb, 1991), 138 hours (6-7 days) for maize (Flórez et al., 2007), and 138 hours (6-7 days) for barley (Al-Karaki, 2001). These experiments were also checked by specialists daily, so that manual and SeedGerm scores can be compared and verified.
The tomato seed germination experiments were conducted in six panels (i.e. six genotypes), with 64 seeds per panel and 384 seeds monitored in total (Fig. 5a). Six cumulative germination curves have been produced based on hourly measurements for a seven-day period. We could clearly identify small differences amongst these genotypes between T50 and T75, when germination rates diverged. Similarly, germination variances could also be quantified for pepper and barley experiments (Figs. 5b&c). The three barley genotypes monitored exhibited a wide variety of cumulative germination, similar to what has been reported previously (Matthews & Khajeh-Hosseini, 2007). Due to the size of maize seeds, we conducted one experiment per germination box (35 seeds per box, Fig. 5d). Still, SeedGerm software can perform sound measurement even when the number of germination experiments is changed. The above panel- and seed-level germination measures were exported and saved in several CSV files (see Supporting Information Datasets S3-6).
New morphological traits included in the SeedGerm analysis are seed convex area, seed extent and seed circularity, which have been used to quantify dynamics of germination of different crop seeds as they were difficult to assess using traditional approaches (TeKrony & Egli, 1991;Dell'Aquila, 2009). For example, using the seed convex area trait, we found that maize had the quickest establishment rate after T50, while other crop seeds were very similar (Fig. 5e). Due to substantial variations caused by too many overlapped radicles at the late germination stage, end image IDs for the above analysis are different. Similarly, panel- and seed-level morphological measures are saved in CSVs (see Supporting Information Datasets S7-10).

Validation of the SeedGerm platform
To validate analysis produced by SeedGerm, we have used a range of validation methods to comprehensively compare human and SeedGerm scores. A multitude of metrics were produced (Table 1), including Pearson's correlation metric (r) to measure the strength of the linear relationship between SeedGerm and manual scoring for cumulative germination rates. For all tested crop species, SeedGerm's cumulative predictions yield a Pearson's correlation greater than 0.98 (column two in Table 1), indicating a strong linear correlation and goodness of fit. Pearson's correlation (r) was also used to evaluate the linear relationship between SeedGerm's true positive germination timings and their respective timings scored by seed scientists (column three in Table 1). In addition to the correlation metrics, we have also calculated the mean absolute error (MAE, column four in Table 1). Based on these metrics (Table 1), we can conclude that SeedGerm performed well across all tested crop species. The above methods evaluate both SeedGerm's final germination scoring and the germination timing of each seed, covering germination rate, timing and the uniformity respectively.
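For readers who wish to reproduce this kind of comparison on their own scores, the two main metrics can be computed as in the sketch below. The manual and automated cumulative counts are hypothetical values chosen for illustration, not data from Table 1.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_absolute_error(x, y):
    """Average absolute difference between paired observations."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical cumulative germinated-seed counts at six time points.
manual = [0, 2, 10, 30, 50, 60]
automated = [0, 3, 11, 29, 52, 61]

r = pearson_r(manual, automated)
mae = mean_absolute_error(manual, automated)
```

Here the automated counts differ from the manual ones by one seed on average (MAE of 1.0) while remaining almost perfectly correlated. The two metrics are complementary: r captures agreement in trend, whereas MAE captures the size of the disagreement in the original units.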
To visualise the correlation between SeedGerm scoring and seed specialists' counting, we have used 19 time series (over 4,000 images in total) to perform the correlation, with three series of maize (129 seeds in total), six series of tomato (384 seeds), six series of Brassica (384 seeds), one series of pepper (81 seeds), and three series of barley (120 seeds). Manual scoring was performed using the image series, where cumulative germinated seed counts for each image and the image ID for when each seed germinates were recorded. There is a strong correlation between SeedGerm's scoring and that of the manual observers, which can be seen in Figure 6. A predicted equals actual line (coloured red) is included (Fig. 6a) to show how SeedGerm's cumulative scores deviate from the manual scores. Additionally, line plots contrasting cumulative seed-by-seed scoring between SeedGerm and specialists' counting are shown in Figure 6b. SeedGerm's scoring is largely identical in comparison with manual counting, except for it tending to overestimate the number of germinated seeds in crowded experiments such as the later establishment stages for Brassica and tomato experiments.

SeedGerm as a research tool
To test the ability of SeedGerm to be used as a research tool in routine biological experiments, we used the B. napus Diversity Fixed Foundation Set (Harper et al., 2012) to detect genetic differences in seed germination. After setting replicate seed batches of each variety, biological replicates of 50 seeds were sown in SeedGerm boxes in a randomised design. SeedGerm scored the germination parameters of 88 varieties with a range of germination behaviours: some showed strong dormancy, while most seed lots germinated to high levels but with varying kinetics. SeedGerm scored T10, T50, T90 and Gmax after 8 days (Fig. 7a,b). To test the accuracy of the SeedGerm outputs, 60 seed lots were also scored by a manual observer based on the images. The agreement was strong (Supporting Information Dataset S11), except for T90 in the varieties requiring the longest time to germinate, where SeedGerm has a weak tendency to score seeds as germinated earlier than the manual observer.
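As an illustration of how such parameters can be derived from seed-level germination timings, the following sketch makes two assumptions that are not confirmed by the text: Gmax is taken as the percentage of sown seeds that germinated by the end of the experiment, and Tx is taken as the earliest recorded time at which x% of the germinated seeds had germinated. The function name and the example timings are hypothetical.

```python
def germination_summary(germination_times, n_sown, percents=(10, 50, 90)):
    """Summarise a seed lot from per-seed germination times.

    germination_times: times (e.g. hours after imbibition) for the seeds
                       that germinated; non-germinated seeds are omitted.
    n_sown:            total number of seeds sown in the replicate.
    """
    times = sorted(germination_times)
    n_germinated = len(times)
    if n_sown <= 0 or n_germinated > n_sown:
        raise ValueError("n_sown must be positive and >= germinated seeds")

    # Assumption: Gmax is the final germination percentage of all sown seeds.
    summary = {"Gmax": 100.0 * n_germinated / n_sown}

    for p in percents:
        label = f"T{p}"
        if n_germinated == 0:
            summary[label] = None  # undefined when nothing germinated
            continue
        # Assumption: Tx is relative to the germinated seeds.
        # Integer ceiling of p% of the germinated seeds, at least one seed.
        k = max(1, (p * n_germinated + 99) // 100)
        summary[label] = times[k - 1]
    return summary


# Hypothetical replicate: 10 seeds sown, 8 germinated at these times (hours).
example = germination_summary([20, 22, 24, 24, 26, 30, 36, 48], n_sown=10)
print(example)
```

If Tx were instead defined relative to the number of seeds sown, lots with strong dormancy would have undefined T50 or T90 values, which is one reason the choice of definition needs to be stated alongside the results.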
The SeedGerm outputs were then used for associative transcriptomics (AT) analysis, as described previously (Harper et al., 2012). The AT analysis found no significant associations between T10, T50 or T90 and polymorphisms in B. napus. However, we found a strong association between Gmax and genotype on chromosome A5, with both SNPs and gene expression markers (Harper et al., 2012) associated with the trait in this region (Fig. 7c-e). This locus is distinct from those identified in previous studies (Hatzig et al., 2015, 2018), but is significant even after correcting for multiple testing. The region spans approximately 340 kb and contains at least 69 known transcribed genes, one of which is a B. napus orthologue of the known germination regulator, the protein phosphatase 2C known as HIGH ABA INDUCED 3 (HAI3) (Yoshida et al., 2006; Bhaskara et al., 2012), which has a role in seed sensitivity to abscisic acid. Although more work is needed to precisely identify the underlying gene of interest, it is evident that the SeedGerm platform can automate phenotypic analysis of seed germination with sufficient accuracy to perform standard genetic analysis of seed dormancy and germination rate.

Automated seed phenotyping
Plant phenomics is a fast-developing research area focusing on obtaining meaningful phenotypic information to enable scientists to address diverse biological questions, from cellular organisms to populations in the field (Tardieu et al., 2017; Zhou et al., 2018; Furbank et al., 2019; Yang et al., 2020). To study seed germination and seedling vigour, many academic and industrial attempts have been made, including research-based tools such as Germinator, phenoSeeder, MultiSense and RSGES, as well as commercial solutions such as the PhenoSeeder platform (developed by Forschungszentrum Jülich, Germany), SeedAIXPERT and Germination Scanalyzer (www.lemnatec.com/products/seed-screening), and Seeds Automatic Germination Analyzer (SAGA, France, no longer trading). These methods are capable of carrying out seed imaging, advanced 3D seed morphological analysis (i.e. phenoSeeder), and analyses of germination-related traits; however, their applications are limited by cost, availability, automation level, analysis throughput, and technical scalability.
In this study, we present the SeedGerm system, a platform that combines automated seed imaging and vision-based phenotypic analysis with cost-effective hardware to enable high-throughput analysis of seed germination experiments for a variety of crop species. Based on more than three years of experiments and system improvements, we believe that our system is easy to access and capable of scalable seed germination scoring for the following reasons: its low-cost and easy-to-build hardware design, its flexibility to incorporate different experiments, its open-source and modular software design, the scalability of its trait analyses, and the availability of user-friendly GUI software, source code and design documents.

The SeedGerm hardware design
In comparison with high-end seed phenotyping devices such as phenoSeeder (Jahnke et al., 2016) and its commercial version, our hardware design follows a low-cost and easy-to-build strategy. The materials used to build the device are easy to obtain (see Supporting Information Note S1). The germination box was made of either translucent plastic for the fixed design or transparent polyethylene for the more expensive gantry design, which can ensure reliable germination and seedling growth. To provide biologically relevant data from imbibition to seedling, an overhead image sensor (e.g. a Pi camera module or an HD USB camera) was installed to acquire high-quality seed image series during the entire germination procedure. To increase the throughput of [...]
Assessing and comparing germination performance for different seed batches with varied treatments is often laborious and error-prone. Previous work (Ligterink & Hilhorst, 2016; Mahajan et al., 2018) relies on CV and ML techniques to calibrate the acquired images to ensure the soundness of the experiment. However, because experimental conditions (e.g. temperature and humidity) are also key to seed germination, we installed affordable environmental sensors (e.g. a combined ambient temperature and humidity sensor) and a fluorescent lighting device in the SeedGerm hardware to facilitate continuous experiment monitoring.
We followed open hardware suggestions (Gibney, 2016; Czedik-Eysenberg et al., 2018) to improve the flexibility of SeedGerm. The hardware design is freely available to the community and can be modified for other research requirements. For example, by replacing the image sensor with multi- or hyper-spectral cameras, seed germination can be studied beyond the visible bands. Also, adding a side image sensor (Humplík et al., 2015) or 3D imaging (Roussel et al., 2016) to the SeedGerm system can support the analysis of seedling growth with vertical information. Such hardware improvements can be carried out without any restriction, which is likely to provide flexible options for seed research rather than relying mainly on costly commercial solutions such as Germination Scanalyzer or SAGA. Although SeedGerm is low-cost, its design is capable of high-quality (e.g. each pixel corresponds to 0.15-0.2 mm) and automated seed imaging, providing sufficient visual evidence and sensor data for biological experiments. Also, the low cost is likely to increase the scalability of SeedGerm, as more devices can be built relatively cheaply to accommodate more experiments, which was previously hard to achieve.
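To make the stated pixel scale concrete, the short sketch below converts pixel measurements into physical units. The function names and the example seed size are hypothetical; only the 0.15-0.2 mm per pixel range comes from the text. Note that areas scale with the square of the pixel size.

```python
def pixels_to_mm(length_px, mm_per_pixel):
    """Convert a length measured in pixels to millimetres."""
    return length_px * mm_per_pixel


def pixel_area_to_mm2(area_px, mm_per_pixel):
    """Convert an area measured in pixels to square millimetres."""
    return area_px * mm_per_pixel ** 2


# Hypothetical seed mask covering 400 pixels, at both ends of the stated range.
for scale in (0.15, 0.2):
    area_mm2 = pixel_area_to_mm2(400, scale)
    print(f"{scale} mm/pixel -> {area_mm2:.2f} mm^2")
```

Because the same seed can differ by a factor of almost two in apparent area across this range, comparing pixel-based traits between devices would require the scale of each device to be recorded.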

The SeedGerm software design
The need for standardising plant phenotyping has grown in recent years, resulting in the ISA-Tab format (Sansone et al., 2012), the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard (Ćwiek-Kupczyńska et al., 2016), and ontology approaches to enable comparative phenomics research (Oellrich et al., 2015). Much previous work (Demilly et al., 2015; Nguyen et al., 2018; Wu et al., 2019) in seed phenotyping has employed bespoke data collection processes and data formats, limiting the ability of external researchers and laboratories to use and support these methods. Hence, when designing SeedGerm's software system, we chose to standardise the collection of image and sensor datasets following the ontological suggestions. Additionally, to calibrate images acquired by different SeedGerm devices, users were required to enter metadata to define their experiments, including experiment ID, genotype, biological replicates, treatment, and experiment duration; imaging intervals, image resolution, white balance, exposure mode, and shutter speed were then controlled automatically by the imaging module to largely standardise the data collection.
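One way to picture the split between user-entered metadata and automatically controlled imaging settings is the sketch below. It is not SeedGerm's actual data format: the class names, field names, units and example values are all hypothetical, and only the list of items (experiment ID, genotype, biological replicates, treatment, duration, imaging interval, resolution, white balance, exposure mode, shutter speed) follows the text.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentMetadata:
    """Items entered by the user to define an experiment (hypothetical names)."""
    experiment_id: str
    genotype: str
    biological_replicate: int
    treatment: str
    duration_hours: int


@dataclass(frozen=True)
class ImagingSettings:
    """Items controlled by the imaging module (hypothetical names and units)."""
    interval_minutes: int
    resolution: tuple  # (width, height) in pixels
    white_balance: str
    exposure_mode: str
    shutter_speed_us: int


def build_record(metadata, settings):
    """Combine both parts into one JSON-serialisable record."""
    if metadata.duration_hours <= 0 or settings.interval_minutes <= 0:
        raise ValueError("duration and interval must be positive")
    record = {"metadata": asdict(metadata), "imaging": asdict(settings)}
    # One image at time zero plus one per elapsed interval.
    record["expected_images"] = (
        metadata.duration_hours * 60 // settings.interval_minutes + 1
    )
    return record


# Hypothetical 8-day experiment imaged once per hour.
record = build_record(
    ExperimentMetadata("EXP-001", "hypothetical-genotype", 1, "control", 192),
    ImagingSettings(60, (2592, 1944), "auto", "auto", 10000),
)
print(json.dumps(record, indent=2))
```

Keeping the record in a plain, serialisable structure is a design choice made here for illustration; it is what would allow such metadata to be mapped onto formats like ISA-Tab or MIAPPE later.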
To increase the scalability of the phenotypic analysis, we chose to implement our algorithms in Python rather than MATLAB, which was used in previously reported work (Jahnke et al., 2016; Elmasry et al., 2019). The reasons are that Python is easy to understand, cross-platform, and self-contained, which enabled us to extend and upgrade our software relatively easily. For example, new crop species and traits can be added to the core analysis algorithm through new modules, in which guideline seed morphological features can be predefined. We also followed a modular software design, so that modules developed for one species can be shared by other functions in analysis and parallel computing.
Recently, deep learning has become a powerful technique used by some seed germination analysis software (Mahajan et al., 2018; Nguyen et al., 2018; Halcro et al., 2020), in which it was applied to extract features, segment seeds, and classify germination status. Although DL is now relatively easy to implement in Python, we chose a combined CV and ML approach for three reasons: 1) DL requires a very large amount of training data to perform better than supervised ML and CV-based methods, so it might not be suitable where features need to be re-engineered frequently, as in varied seed germination experiments; 2) a dedicated DL model normally needs to be built for each species, so it is time-consuming and inefficient to employ DL techniques for analysing a large number of crop species; 3) DL is prone to overfitting to particular experimental settings and becomes problematic when conditions change. To allow our solution to be adopted by a broader research community with varied experimental settings, we chose supervised GMM, SGD and novelty detection learning techniques based on generalised feature selection. More importantly, because the ML models are designed to reinitialise and retrain with background features at different establishment stages for each experimental setting, the learning models embedded in SeedGerm are dynamic and can be updated for each experiment, enabling us to avoid overfitting the learning models to a specific crop species or a particular experiment.
By employing CV algorithms, SeedGerm can also measure cumulative germination rates and seed morphologies such as size, width and length, extent and circularity to assess seed quality and seedling vigour, from germination to seedling. For example, we have measured imbibition using the change in seed size, radicle protrusion based on the seed major/minor axis ratio, and germination speed through seed extent. If new biological questions are proposed, new traits and features could be designed jointly by biologists and computer scientists, instead of relying on DL techniques blindly.
Because the SeedGerm software is openly available and can be extended through new modules, we believe it is both scalable and easy to access.
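The morphological traits mentioned above can be sketched with common image-analysis definitions: extent as the mask area divided by the area of its bounding box, and circularity as 4πA/P². These definitions are assumptions and are not confirmed as SeedGerm's implementation. The function names are hypothetical, and the width/length ratio below uses an axis-aligned bounding box as a simplification of a fitted major/minor axis ratio.

```python
import math


def mask_traits(mask):
    """Compute simple traits from a binary seed mask (2D list of 0/1 values)."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not coords:
        raise ValueError("mask contains no foreground pixels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    area = len(coords)
    return {
        "area_px": area,
        # Fraction of the bounding box that the seed fills (0-1).
        "extent": area / (height * width),
        # Shorter over longer bounding-box side (0-1); simplified W/L ratio.
        "wl_ratio": min(height, width) / max(height, width),
    }


def circularity(area, perimeter):
    """4*pi*A / P^2: equals 1.0 for a perfect circle, lower for irregular shapes."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2


# Hypothetical masks: a compact seed, then the same seed with a protruding radicle.
before = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
after = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
]
print(mask_traits(before))
print(mask_traits(after))
```

In this toy example the protruding radicle enlarges the bounding box much more than the area, so both extent and the width/length ratio drop, which is the kind of change the text describes using as a signal of radicle protrusion.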

Applications of the SeedGerm system
Our work has demonstrated that SeedGerm is capable of scoring germination and measuring morphological changes automatically, for five major crop species and between different genotypes.
The results show that SeedGerm could be employed to score germination frequency and seedling vigour, based on which the performance of seed batches can be assessed. These traits are regularly checked by experienced seed engineers and scientists in order to provide certificates of seed germination and establishment performance in seed testing and seed insurance (Khurana & Singh, 2001; Dell'Aquila, 2009). Hence, SeedGerm has the potential to replace manual assessment of germination frequency and radicle emergence.
Furthermore, as many traits measured by SeedGerm are highly correlated with seed performance and the effectiveness of post-harvest seed enhancement processes, SeedGerm is likely to contribute towards seed certification, guidance on sowing density, or even seed insurance in the future.
Besides routine seed testing of germination frequency, the applications of SeedGerm could also be expanded to seed vigour (i.e. how fast and uniform radicle emergence is) by monitoring morphological traits, which are important for estimating canopy closure, weed suppression, and crop yields through seed research (Attree et al., 1992; Nelson et al., 2012; Paparella et al., 2015).
Beyond existing trait analyses, continuous phenotypic analysis can extend our insights into the entire physiological process of germination, helping us to understand the phenotypic effects of individual seeds and seed batches under different treatments. Furthermore, we set up a range of experiments to score germination rates and timing across a diverse panel of B. napus varieties to demonstrate the biological relevance of SeedGerm as a research tool for measuring the effect of genetics. We showed that SeedGerm outputs can be used successfully for GWAS, identifying an association on B. napus chromosome A5 that explains the difference between high- and low-germinating varieties in the panel (Fig. 7b-e). Although the GWAS identified associations over a 100 kb region, this region contains one gene, BnaA05g27660D, a homologue of Arabidopsis AHG3, which is known to regulate ABA signalling during germination in Arabidopsis (Yoshida et al., 2006) and would be a strong candidate for further study. The low-germinating allele is only present in older spring varieties including Bronowski and Duplo, suggesting that it has been consistently selected against by modern oilseed rape breeders. Hence, we believe that SeedGerm has great potential for seed germination scoring and seed testing, in both research and routine seed technology applications.

Issues of SeedGerm
It is also important to point out some edge cases where the system has struggled. Due to camera position and lighting problems, some image series were of poor quality. Although we have added software calibration to allow users to improve the classification accuracy on low-quality datasets (e.g. colour features), the analysis could still suffer. For such datasets, only through [...] SeedGerm are dynamic for each experiment, the cost of such a design is that additional computational resources are required, so users need a reasonably powerful computer (i7 CPU with 16 GB memory) to perform the analysis. Notably, to maintain the reliability of the parallel computing, we do not recommend running more than eight tasks in parallel on an average computer, because processing multiple image series simultaneously demands substantial computing resources and some Python functions have been locked because they are not thread-safe during multi-threaded processing.
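The two practical points above, a cap on the number of parallel tasks and a lock around calls that are not thread-safe, can be sketched with the Python standard library. This is an illustrative pattern only and not SeedGerm's code: the function names are hypothetical, the per-series analysis is replaced by a placeholder, and the cap of eight simply mirrors the recommendation in the text.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Upper bound recommended in the text for an average computer.
MAX_PARALLEL_TASKS = 8

# Serialises access to any call that is not thread-safe.
_unsafe_call_lock = threading.Lock()


def analyse_series(series_name):
    """Hypothetical placeholder for analysing one image series."""
    with _unsafe_call_lock:
        # Stand-in for a library call that must not run concurrently.
        result = f"{series_name}: done"
    return result


def analyse_all(series_names, requested_workers):
    """Run the analyses in a thread pool, never exceeding the cap."""
    workers = max(1, min(requested_workers, MAX_PARALLEL_TASKS, len(series_names)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the series.
        return workers, list(pool.map(analyse_series, series_names))


# Hypothetical batch of 19 series with an over-ambitious worker request.
workers_used, results = analyse_all([f"series_{i}" for i in range(19)], 32)
print(workers_used, len(results))
```

Holding a lock around the unsafe call trades some parallel speed-up for reliability, which is consistent with the text's remark that adding more parallel tasks does not necessarily help.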

Conclusion
In conclusion, limitations of current seed imaging and scoring approaches have prevented automated and scalable analysis of seed germination. In this paper, we present the SeedGerm system, which integrates cost-effective hardware and user-friendly software for seed imaging and ML-based analysis to measure establishment- and germination-related traits. The system has been applied to many germination experiments for five crop species, through which we could assess the performance of seed batches quantitatively. Morphological traits such as seed size, width and length, extent and circularity were also measured to provide insights into the physiological process of seed germination. We demonstrate that SeedGerm matches seed specialists' observations when scoring radicle emergence timing, and show its biological relevance by identifying, with associative transcriptomics, a gene important in ABA signalling in seeds. We trust that the SeedGerm system could be widely useful in seed testing and germination scoring, for both research and industrial applications.

Tables
Table 1 Table of

Cumulative germination curves, uniformity and Gmax plots produced to score seed quality and seed vigour, together with dynamic seed masks recording entire germination procedures for different genotypes in a germination box.

traits such as seed area (in pixels), W/L ratio (0~1), seed circularity (0~1), and convex area (in pixels). Coloured shaded areas denote confidence intervals, between the 15th and 85th percentiles of the data.