METHOD AND APPARATUS FOR SORTING SEEDS

Info

Publication number: 20250003871
Type: Application
Filed: Nov 15, 2022
Publication Date: Jan 2, 2025
Applicant: KWS SAAT SE & CO. KGAA (Einbeck)
Inventors: Jacob LAGE (Thriplow Cambridgeshire), Christian UTSCHIG (Einbeck), Christian Bernd HIRSCHMANN (Einbeck)
Application Number: 18/710,322

Abstract

The present invention relates to a method for categorizing/sorting seeds, the method comprising the steps of: providing (S11) a sample including at least one seed; obtaining (S12) a near infrared, NIR, spectrum of at least a subset of the sample; determining (S14) presence of an organic colorant in at least the subset of the sample based on the obtained NIR spectrum; and categorizing/sorting (S19) at least the subset of the sample based on the determination. Based on this, mis-colored white seed and blue seed with a fading blue color can be properly categorized. Further, it is also possible to distinguish between single blue and double blue seed.

Description


FIELD OF THE INVENTION

The present invention relates to an improved method for categorizing/sorting seeds, particularly blue and white seeds in the basic seed production step of a BLue Aleurone (BLA) hybrid cereal system. The present invention is also directed to an apparatus for categorizing/sorting seeds and a corresponding mobile device comprising said apparatus.

TECHNOLOGICAL BACKGROUND

Essential for any hybrid system is the production of male-sterile female parents. WO 92/01366 A1 discloses a male sterility system which allows the maintenance of male sterility that can be used in the production of hybrid cereal plants, in particular hybrid wheat plants. Male sterility can be achieved by possessing a homozygous deletion on the short arm of chromosome 4B in wheat. The deletion typically used is the ‘Probus’ deletion (Fossati A, Ingold M. 1970. A male sterile mutant in Triticum aestivum. Wheat Inform Serv 30:8-10).

Recently, the ms1 gene located in the region concerned by the deletion has been identified as the causative gene. If this gene is deleted physically or knocked out/down by a mutation or targeted modification (e.g. WO 2016/048891 A1), then a reliable male sterility can be established. Fertility can then be easily restored when a wheat line carrying homozygously the deletion or the mutation/modification, is crossed with any normal wheat. Resulting progenies or hybrids are fertile as the deletion or the mutation/modification is only heterozygously present. Thus, plants or plant lines comprising the above deletion or mutation/modification are suitable for use in producing hybrid plants.

However, in order to maintain the male-sterile female parent further components are needed. As such, WO 92/01366 A1 teaches the use of a male parent that is isogenic to the female but having an alien addition chromosome bearing a dominant male fertility restorer gene from Triticum boeticum on the short arm and the BLue Aleurone (BLA) gene from Agropyron elongatum on the long arm, in a cross with the female parent for maintenance of the male sterile female parent, whereby the BLA gene, if expressed, confers a characteristic blue coloration of the progeny seed.

The BLA hybrid cereal system relies on the effective sorting of blue and normal colored seed, wherein the normal colored seed is subsequently referred to as “white” seed but covers the normal spectrum of seed color, produced during the basic seed production stage. Therein, blue seed is planted, and a mix of blue and white seed is harvested. The white seed will grow into male sterile seed and become the female component of hybrid seed production. For maintenance breeding, the blue seed can be replanted for another round of blue-white seed production. The color of the blue seed is caused by a blue color marker, which is co-expressed with a fertility-related gene.

According to the prior art, the blue and white seed can be mechanically separated using commercially available seed sorters equipped with optical cameras that detect the seed color. However, normal environmental conditions can result in the white seed becoming mis-colored, and blue seed can lose some of their distinct blue color. Either of these can result in difficulties when a separation is made based on visual light spectrum.

The solutions of the prior art lead to very poor separation efficiency, and several rounds of separation may be required to achieve purely separated seed pools. Seed purity directly affects the quality of seed breeding and subsequent processing products. For example, in the process of seed harvest and storage, impurities or hybrids may be mixed into the normal seed, which results in economic losses to agricultural production and processing. Therefore, it is crucial to sort out impurities and hybrids to ensure that the seed purity meets the market criteria.

Furthermore, due to genetic defects, unwanted dark blue seed may be generated in some cases, which is subsequently referred to as double blue seed. Since such double blue seed cannot be used in breeding pool development, it is desired to discard the same.

It is therefore an object of the present invention to provide a method that allows a difference between blue seed, comprising single blue seed and double blue seed, and white seed to be reliably detected regardless of cereal type and variety.

SUMMARY OF INVENTION

According to a first aspect, the invention is directed to a method for categorizing/sorting seeds, the method comprising the steps of: providing a sample including at least one seed (single kernel); obtaining a near infrared (NIR) spectrum of at least a subset of the sample; determining presence of an organic colorant in at least the subset of the sample based on the obtained NIR spectrum; and categorizing/sorting at least the subset of the sample based on the determination.

In contrast to the prior art, which suggests distinguishing between blue seed and white seed using visual light, the present solution uses light in the NIR spectrum to determine whether an organic colorant is present.

NIR spectroscopy is a spectroscopic method that uses the near infrared region of the electromagnetic spectrum (ranging from 780 nm to 2500 nm). Typical applications include medical and physiological diagnostics and research. NIR spectroscopy is based on molecular overtone and combination vibrations. Such transitions are forbidden by the selection rules of quantum mechanics. As a result, the molar absorptivity in the NIR region is typically quite small. One advantage is that NIR can typically penetrate much further into a sample than mid-infrared radiation. NIR spectroscopy is therefore very useful in probing bulk material with little or no sample preparation. Using NIR spectrometry for detecting colorants advantageously allows for identifying a well-defined target marker with quantitative precision in a destruction-free measurement that can be carried out in addition to visual inspection.

Plant pigments are colored substances produced by plants and are important in controlling photosynthesis, growth, and development (P. Sundhakar et al., Phenotyping Crop Plants for Physiological and Biochemical Traits). In the present specification, the term plant pigment may have the same meaning as organic colorant. The organic colorant may comprise at least one colorant of the group including betalains, carotenoids, anthocyanins, flavonoids, anthraquinone and/or chlorophylls, and others known to the skilled person. For example, the blue seed color may be a result of an anthocyanin in the aleurone layer of a cereal seed, wherein the present invention is not limited thereto. Anthocyanins are, for example, used as natural color pigments in the food industry; they are water-soluble flavonoid compounds that are responsible for the appearance of certain colors in nature and may appear, depending on the pH, as red, purple, or blue. However, anthocyanins are susceptible to gradual degradation when exposed to certain food processing methods or even through prolonged/improper storage. Chemical assays exist that allow a total anthocyanin content to be determined, but said assays often require destroying the sample. In contrast, spectroscopy-based assays are simple, fast, and non-destructive analytical tools and hence may be used in determining the total anthocyanin content, e.g. in cereal grains. It was surprisingly found that anthocyanin content can be determined with high precision using NIR spectrometry even when gradual degradation impeded visual inspection.

Based on this, the present invention provides an approach for detecting and sorting seed having an elevated level of a certain organic colorant, such as (but not limited to) anthocyanin, which is caused by a blue color marker being co-expressed with a fertility-related gene. Since the presence of an organic colorant—such as anthocyanin—is determined, the present invention is also able to detect when e.g. a blue color fades. This is the case because there is a correlation between NIR spectra and the amount of anthocyanins in seed. Accordingly, by using the NIR spectrum, it is possible to detect a difference in concentration between single blue seed and double blue seed, particularly since the level of anthocyanins in double blue seed is approximately twice as high as in single blue seed.

According to an aspect of the present disclosure, the method may further comprise the step of identifying a signal associated with the organic colorant in the obtained NIR spectrum, wherein the categorizing/sorting of at least the subset of the sample is based on the identified signal. Since the identified signal corresponds to an NIR spectroscopy reflectance measurement, it is possible to determine whether a certain organic colorant is present. Alternatively, an absorption measurement may likewise be conducted to determine whether a certain organic colorant is present. Therein, the identified signal is in the range of 15500 cm⁻¹ to 400 cm⁻¹, including sub-ranges of 15500 cm⁻¹ to 6000 cm⁻¹ and 6000 cm⁻¹ to 400 cm⁻¹. Another aspect of the present disclosure provides that the identified signal is in one of the ranges 25000 cm⁻¹ to 3597 cm⁻¹ (400-2780 nm), comprising sub-ranges 25000 cm⁻¹ to 11764 cm⁻¹ (400-850 nm) and 11111 cm⁻¹ to 3597 cm⁻¹ (900-2780 nm), as well as 16000 cm⁻¹ to 11111 cm⁻¹ (625-900 nm), 15385 cm⁻¹ to 10526 cm⁻¹ (650-950 nm), 8547 cm⁻¹ to 8000 cm⁻¹ (1170-1250 nm), 7519 cm⁻¹ to 7092 cm⁻¹ (1330-1410 nm), 6250 cm⁻¹ to 5917 cm⁻¹ (1600-1690 nm) and 5263 cm⁻¹ to 4167 cm⁻¹ (1900-2400 nm). This is particularly advantageous since wavelengths in these ranges correspond to the UV/visible absorption bands of organic colorants such as anthocyanin. It has been found that such characteristic NIR signals of organic colorants are reliably detected even when optical inspection is deteriorated. The following table summarizes the wavelengths that are applied according to the present invention.

TABLE A

Wavelength [nm]    Wavenumber [cm⁻¹]
400-2780           25000-3597
400-850            25000-11764
625-900            16000-11111
900-2780           11111-3597
650-950            15385-10526
1170-1250          8547-8000
1330-1410          7519-7092
1600-1690          6250-5917
1900-2400          5263-4167
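By way of non-limiting illustration, the two columns of Table A are related by wavenumber [cm⁻¹] = 10⁷ / wavelength [nm]. The following minimal Python sketch reproduces this conversion; the function name and the list name are chosen here for illustration only, and the tabulated wavenumbers appear to be rounded or truncated to whole numbers.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    # 1 cm = 1e7 nm, hence wavenumber [cm^-1] = 1e7 / wavelength [nm]
    return 1e7 / wavelength_nm


# Wavelength ranges [nm] as listed in Table A
TABLE_A_RANGES_NM = [
    (400, 2780), (400, 850), (625, 900), (900, 2780), (650, 950),
    (1170, 1250), (1330, 1410), (1600, 1690), (1900, 2400),
]

for low_nm, high_nm in TABLE_A_RANGES_NM:
    # The shorter wavelength corresponds to the larger wavenumber
    print(f"{low_nm}-{high_nm} nm -> "
          f"{nm_to_wavenumber(low_nm):.1f}-{nm_to_wavenumber(high_nm):.1f} cm^-1")
```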

The ranges 25000 cm⁻¹ to 11764 cm⁻¹ (400-850 nm) and 11111 cm⁻¹ to 3597 cm⁻¹ (900-2780 nm) are particularly advantageous for a classification of single blue seed and double blue seed.

Considering the respective wavelength regions or classifications, the following ranges are given:

TABLE B

Wavelength region             Wavelength range [nm]
Visible light/limited NIR     400-850
Visible light/complete NIR    400-2780
NIR                           900-2780
Artificial RGB                575-625 (red), 475-575 (green), 425-500 (blue)

According to a further aspect, the method may further comprise the step of: determining an amount of the organic colorant in at least the subset of the sample based on the obtained NIR spectrum, wherein the categorizing/sorting of at least the subset of the sample is based on the determined amount of the organic colorant. Hence, a quantitative determination of colorant is achieved and a distinction between single blue seed and double blue seed, i.e. blue seed with different levels of organic colorants such as anthocyanin, can be made. Further, bulk measurements for quality control can be performed. Therein, low thresholds can be set for the amount of colorant, e.g., for pre-assessing bulk samples for presence of any colorant.

Another aspect of the present disclosure provides that the NIR spectrum is obtained for a bulk sample, wherein the method further comprises the steps of: normalizing the determined amount of the organic colorant with respect to a size of the bulk sample; comparing the normalized amount of organic colorant with a predefined threshold; and categorizing/sorting the bulk sample based on the comparison. In this way, bulk measurements can be carried out for quality control, so that it can be determined that less than a certain percentage or amount of e.g. blue seed, comprising single blue seed and double blue seed, is contained in a bag declared as white seed. The size of the bulk sample is preferably determined as a mass and/or a volume of the bulk sample. The predefined threshold preferably allows for meeting quality standards. It is also possible to measure an overall intensity (i.e. a total amount) of a colorant, for example by determining that ten blue seeds are present in 1000 seeds.

A further aspect of the present disclosure requires the steps of: obtaining a first NIR spectrum of a first subset of the sample and a second NIR spectrum of a second subset of the sample; determining presence of the organic colorant in the first subset by classifying the first NIR spectrum and in the second subset by classifying the second NIR spectrum; categorizing/sorting the first subset and the second subset based on the classification, wherein the classification of the NIR spectrum is performed using at least one of a principal component analysis (PCA), a partial-least-squares-regression (PLS-R) or partial least squares discriminant analysis (PLS-DA) model based on labeled spectra, a trained support vector machine classification (SVM-C) or support vector machine regression (SVM-R) model, a trained artificial neural network (ANN), or a k-nearest neighbors algorithm (k-NN). Using any of these classifications allows samples of a first NIR spectrum and samples of a second NIR spectrum to be effectively differentiated, wherein said spectra pertain to different organic colorants.
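Purely as a sketch of one of the listed options, a k-nearest neighbors classification of a first and a second spectrum against labeled spectra might look as follows. The function name, the three-point "spectra" and all numeric values are synthetic and chosen here for illustration only; they are not measured data, and a Euclidean distance with k = 3 is an assumption.

```python
import math
from collections import Counter


def knn_classify(spectrum, labeled_spectra, k=3):
    """Return the majority label among the k labeled spectra closest to `spectrum`."""
    # Euclidean distance between the query spectrum and each labeled reference
    distances = sorted(
        (math.dist(spectrum, reference), label)
        for reference, label in labeled_spectra
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Synthetic labeled "spectra" (illustrative values only): w, sb, db
labeled = [
    ([0.10, 0.20, 0.30], "w"), ([0.12, 0.21, 0.29], "w"), ([0.11, 0.19, 0.31], "w"),
    ([0.30, 0.40, 0.50], "sb"), ([0.31, 0.42, 0.49], "sb"), ([0.29, 0.41, 0.51], "sb"),
    ([0.50, 0.60, 0.70], "db"), ([0.52, 0.61, 0.69], "db"), ([0.51, 0.59, 0.71], "db"),
]

first_subset = [0.11, 0.20, 0.30]   # hypothetical first NIR spectrum
second_subset = [0.50, 0.61, 0.70]  # hypothetical second NIR spectrum
print(knn_classify(first_subset, labeled), knn_classify(second_subset, labeled))
```

In practice, each spectrum would contain many more points than shown here, and the labeled spectra would stem from reference measurements.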

According to a preferred aspect, a plurality of NIR spectra is obtained for a plurality of subsets of the sample, wherein presence of the organic colorant is determined for each of the plurality of subsets based on classifying the respective NIR spectrum of the plurality of NIR spectra using at least one of PCA, PLS-R, PLS-DA, SVM-C, SVM-R, ANN, or k-NN, wherein each of the plurality of subsets is categorized based on the respective classification, and wherein each of the plurality of subsets is sorted based on its categorization. Hence, it is possible to distinguish between more than two sample subsets such as single blue seed, double blue seed and white seed. The method of the present disclosure may further comprise the step of: sorting the first subset and the second subset based on the classification, wherein a subset of the sample preferably consists of a single seed.

Another aspect of the present disclosure provides that the method may further comprise the step of: obtaining an image of at least the subset of the sample, wherein the presence of an organic colorant in at least the subset of the sample is determined based on the obtained NIR spectrum and based on the image. By using NIR spectroscopy in combination with visual light, the precision with which blue and white seed can be detected is further increased. Preferably, the image may be a photometric image. More preferably, the NIR spectroscopy can be used in combination with visible light with more than one camera for sorting the seed. Alternatively, the seed may go through more than one cycle of sorting, for example a first cycle with visible light and a second cycle with NIR spectroscopy only. Further, a combination of visible light and NIR spectroscopy may be applied in at least one of the cycles, or only NIR spectroscopy may be used in the cycles.

A further aspect of the present invention is to provide a method for precise seed sorting by using NIR in combination with other state of the art sorting methods including visual sorting or imaging sorting (e.g. by combination with at least one camera), x-ray sorting, Raman Spectroscopy and any other suitable spectroscopic methods.

According to a second aspect, the invention is directed to an apparatus for categorizing/sorting seeds, said apparatus comprising: a near infrared (NIR) spectrometer, including a light source configured to emit light with a wavelength in between 650 nm and 2500 nm; a detector configured to detect a NIR spectrum in a range of 15500 cm⁻¹ to 400 cm⁻¹; a sample holder configured to hold a sample relative to the light source and the detector to perform NIR measurements in a transmission and/or reflection mode; and means adapted to execute the steps of the method described above.

According to a further aspect, the apparatus may further comprise an image sensor configured to obtain an image of at least the subset of the sample and means adapted to execute the steps of the above method and/or a feeder system configured to feed at least the subset of the sample to the NIR spectrometer, preferably configured to feed at least the subset of the sample to the NIR spectrometer and the image sensor. However, any suitable sensor may be combined with NIRS to facilitate and improve sorting efficiency as well as for assessing quality parameters with respect to necessary quality check.

According to a preferred aspect, a computer program is provided, said computer program comprising instructions to cause the above apparatus to execute the steps of the above method.

According to a third aspect, the invention is directed to a mobile device comprising the above apparatus.

According to a fourth aspect, the invention is directed to the use of the above method and/or the above apparatus for categorizing/sorting seeds of a hybrid seed system, the hybrid seed system comprising seeds including an organic colorant.

Another aspect of the present disclosure provides that the hybrid seed system is a hybrid wheat system comprising a monosomic alien addition chromosome carrying a male fertility restorer gene and a color marker gene as a selection marker gene and that the color marker gene is able to confer a characteristic coloration and is co-expressed with a fertility-related gene.

The present invention may be applied during basic seed production as opposed to sorting male seed from hybrid seed, and allows for efficient separation of white and blue seed even when the color difference is not clear in the visual light spectrum. Further, sorting single blue and double blue seed makes it possible to distinguish the seed based on a concentration of e.g. anthocyanins. By using the NIR spectrum, it is possible to detect a difference between blue seed and white seed regardless of the variety even without using visual light.

Further aspects and preferred embodiments of the invention result from the dependent claims, the drawings and the following description of the drawings. Different disclosed embodiments are advantageously combined with each other if not explicitly stated otherwise.

BRIEF DESCRIPTION OF THE DRAWINGS

The features of the invention become apparent to those skilled in the art by the detailed description of exemplary embodiments with reference to the attached drawings in which:

FIG. 1 schematically illustrates an embodiment of a method for categorizing/sorting seeds;

FIG. 2 illustrates results of a principal component analysis (PCA) performed on a sample including a plurality of seeds/kernels;

FIG. 3 illustrates results of a PCA performed on single seeds/kernels;

FIG. 4 illustrates a graph indicating a correlation between predicted and reference values of a total anthocyanin content (TAC);

FIG. 5 illustrates the spectral information distribution for different wavelengths for anthocyanin; and

FIG. 6 schematically illustrates at which stages of a seeding/harvesting cycle the present invention may be applied.

DETAILED DESCRIPTION OF THE INVENTION

Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. Effects and features of the exemplary embodiments, and implementation methods thereof will be described with reference to the accompanying drawings. In the drawings, like reference numerals denote like elements, and redundant descriptions are omitted. The present invention can be embodied in various different forms, and should not be construed as being limited to only the illustrated embodiments herein. These embodiments are provided as examples so that this disclosure will be complete and will fully convey the aspects and features of the present invention to those skilled in the art.

Accordingly, elements not considered necessary to those having skill in the art for a complete understanding of the features of the present invention may not be described.

As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Further, the use of “may” when describing embodiments of the present invention refers to “one or more embodiments of the present invention.” In the following description of embodiments of the present invention, the terms of a singular form may include plural forms unless the context clearly indicates otherwise.

It will be understood that although the terms “first” and “second” are used to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element may be named a second element and, similarly, a second element may be named a first element, without departing from the scope of the present invention. As used herein, the term “substantially”, “about,” and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art. Further, if the term “substantially” is used in combination with a feature that could be expressed using a numeric value, the term “substantially” denotes a range of +/−5% of the value centered on the value.

FIG. 1 schematically illustrates an embodiment of a method for categorizing/sorting seeds. According to step S11, a sample including a plurality of seeds is provided. Subsequently, according to step S12, a near infrared (NIR) spectrum of at least a subset of the sample is obtained.

Based on the obtained NIR spectrum, presence of an organic colorant is determined in at least the subset of the sample as illustrated in step S14. Finally, according to step S19, the method further comprises categorizing/sorting at least the subset of the sample based on the determination. It has to be noted that steps shown in dashed boxes in FIG. 1 represent optional steps.

For example, as an optional step S13, it is provided that the method further comprises the step of identifying a signal associated with the organic colorant in the obtained NIR spectrum. By means of this, the categorizing/sorting of at least the subset of the sample is based on the identified signal.

According to a preferred embodiment, the method may further comprise optional step S15 according to which an amount of the organic colorant in at least the subset of the sample is determined based on the obtained NIR spectrum. By means of this, the categorizing/sorting of at least the subset of the sample can be based on the determined amount of the organic colorant.

In case the NIR spectrum is obtained for a bulk sample, the method further comprises normalizing the determined amount of the organic colorant with respect to a size of the bulk sample as illustrated in step S16. Subsequently, according to step S17, the normalized amount of organic colorant is compared with a predefined threshold. After that, a categorization of the bulk sample is performed based on the comparison in step S18.
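As a minimal sketch of steps S16 to S18, and assuming that the size of the bulk sample is given as a mass, the normalization and threshold comparison may be expressed as follows. The function name, the units, the returned category strings and the strict "greater than" comparison are assumptions made here for illustration.

```python
def categorize_bulk_sample(colorant_amount, sample_mass_g, threshold_per_g):
    """Normalize a colorant amount by sample mass and compare it with a threshold."""
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    # S16: normalize the determined amount with respect to the sample size
    normalized_amount = colorant_amount / sample_mass_g
    # S17: compare the normalized amount with the predefined threshold
    exceeds_threshold = normalized_amount > threshold_per_g
    # S18: categorize the bulk sample based on the comparison
    if exceeds_threshold:
        return "colorant above threshold"
    return "colorant within threshold"


# Illustrative values only (arbitrary units of colorant per gram)
print(categorize_bulk_sample(colorant_amount=5.0, sample_mass_g=25.0, threshold_per_g=0.1))
print(categorize_bulk_sample(colorant_amount=1.0, sample_mass_g=25.0, threshold_per_g=0.1))
```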

FIG. 2 illustrates the results of a principal component analysis (PCA) performed on a sample including a plurality of seeds/kernels. Therein, the scores plot/scores room shows a clear difference between white seeds (w), which are represented by triangles, and blue seeds, which are represented by squares (single blue seed; sb) and circles (double blue seed; db).

In general, in order to verify the performance of using NIR spectroscopy to distinguish between different seed, the following set of eight different wheat variety samples was exemplarily chosen, wherein all samples contain white, single blue, and double blue seed.

TABLE 1 Sample overview

Sample    Variety
1         ClevelandBla1
2         GymnastBla3
3         KielderBla2
4         LiliBla1
5         LoftBla1
6         SolehioBla1
7         SolehioBla4
8         TrinityBla1

These eight samples were used to develop a method to distinguish, based on NIR reflectance measurements, between white and blue (comprising both, single blue and double blue) seeds on the one hand, and white, single blue and double blue seeds on the other hand.

More specifically, the samples were measured using a Bruker MPA NIRS spectrometer and small beakers comprising 20 g to 25 g samples, which corresponds to about 500 seeds or kernels. The beakers were placed in a rotating holder of the spectrometer and measured in triplicate each (refilled and remeasured) and thus three spectra were recorded for each sample. The technical specification of the Bruker MPA NIRS spectrometer is given below.

TABLE 2 Bruker MPA NIRS spectrometer specification

Detector type             Lead sulphide (PbS)
Spectral range [nm]       800-2780
Spectral range [cm⁻¹]     12500-3600
Spectral resolution       8 cm⁻¹
Number of scans           32

Furthermore, to minimize scattering and reflection losses, a half sphere was placed on top of the kernels.

As indicated above, all spectra were labelled with a specific code (w, sb, db) according to their color classes. Further, for a regression test, the labels 0, 1 and 2 were provided to the white, single blue, and double blue classes, respectively.

TABLE 3 Codes and labels of color classes

Sample color    Labelling approach classification    Labelling approach regression
White           w                                    0
Single blue     sb                                   1
Double blue     db                                   2

For the blue and white separation test, the single blue samples as well as the double blue samples were labelled as blue, which leads to the following number of samples for the two different approaches.

TABLE 4 Measurement approaches and samples

Approach                               Samples white    Samples blue    Samples single blue    Samples double blue    Samples total
White, blue                            8                16              -                      -                      24
White, single blue, and double blue    8                -               8                      8                      24
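The sample counts of Table 4 and the number of recorded spectra can be related by simple arithmetic. The following sketch assumes, consistent with the triplicate measurement described above, that each of the 24 samples contributes three spectra; the variable names are chosen here for illustration.

```python
varieties = 8        # wheat varieties listed in Table 1
color_classes = 3    # white, single blue, double blue (Table 3)
replicates = 3       # each beaker refilled and remeasured in triplicate

samples_total = varieties * color_classes      # samples in Table 4
spectra_total = samples_total * replicates     # spectra recorded overall
spectra_white = varieties * replicates         # spectra of white samples
spectra_blue = spectra_total - spectra_white   # single blue plus double blue

print(samples_total, spectra_total, spectra_white, spectra_blue)  # 24 72 24 48
```

Under this assumption, 24 white and 48 blue spectra are available for the white/blue approach.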

Referring back to FIG. 2, the result of the PCA is used to determine the effectiveness/potential of different classification approaches. In particular, a classification of the samples was performed using a partial-least-squares-regression (PLS-R) model and a support vector machine regression (SVM-R) model, wherein no spectral pre-treatment was applied in either approach.

A PLS-R model requires numbers as input and delivers numbers as predictions. Based on the labels 0 (for white), 1 (for single blue) and 2 (for double blue), predictions were classified using thresholds. For example, a prediction value smaller than 0.5 corresponds to white, a prediction value greater than or equal to 0.5 but smaller than 1.5 corresponds to single blue, and a prediction value greater than or equal to 1.5 corresponds to double blue. Accordingly, the thresholds are chosen in order to decide to which class a predicted spectrum belongs.
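The threshold rule described above can be sketched as follows. The function name and the returned class codes (taken from Table 3) are chosen here for illustration; the regression model that produces the prediction value is not shown.

```python
def class_from_prediction(prediction: float) -> str:
    """Map a numeric regression prediction to a color class code."""
    if prediction < 0.5:
        return "w"    # white (label 0)
    if prediction < 1.5:
        return "sb"   # single blue (label 1)
    return "db"       # double blue (label 2)


# Illustrative prediction values only
for value in (0.1, 0.5, 1.49, 1.5):
    print(value, class_from_prediction(value))
```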

As indicated above, the scores plot of FIG. 2 shows a clear difference between white seeds, which are represented by triangles, and blue seeds, which are represented by squares (single blue seed; sb) and circles (double blue seed; db).

a) Assemblage of Kernels; White and Blue; PLS-R

As is apparent from the table below, a blue and white separation test based on the PLS-R, which is a model based on linear regression, shows an accuracy of 100%.

TABLE 5 White/blue classification confusion matrix of PLS-R model

                      Predicted class: blue    Predicted class: white
True class: blue      48 (100% classified)
True class: white                              24 (100% classified)

The numbers in the confusion matrix show the number of samples classified and misclassified with the respective percentage of classified and misclassified shown in brackets.

According to this example, 48 blue kernels (true class) are properly predicted as blue and 24 white kernels are properly predicted as white.

b) Assemblage of Kernels; White, Single Blue and Double Blue; PLS-R

As indicated below, the PLS-R classification model for white, single blue and double blue shows a 100% accuracy in predicting the white samples.

However, the accuracy in predicting single blue and double blue is lower with 91.7% for single blue and 83.3% for double blue. In detail, two single blue kernels were misclassified as double blue and four double blue kernels were misclassified as single blue.

Nevertheless, all white samples were identified as white and all blue samples, including single blue and double blue, were identified as blue samples.

TABLE 6 White/single blue/double blue classification confusion matrix of PLS-R model

                           Predicted class: single blue    Predicted class: double blue    Predicted class: white
True class: single blue    22 (91.7% classified)           2 (8.3% misclassified)
True class: double blue    4 (16.7% misclassified)         20 (83.3% classified)
True class: white                                                                          24 (100% classified)

Within the predicted class for single blue, there is a classification of 84.6% and a misclassification of 15.4% since 22 of the overall 26 samples in this predicted class have been classified correctly.

Further, within the predicted class for double blue, there is a classification of 90.9% and a misclassification of 9.1% since 20 of the overall 22 samples in this predicted class have been classified correctly.
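The percentages given for Table 6 can be reproduced from the counts, as sketched below. Empty cells of the table are assumed here to be zero, and the function and variable names are chosen for illustration only.

```python
CLASSES = ["single blue", "double blue", "white"]

# Rows: true class; columns: predicted class (counts from Table 6,
# empty cells assumed to be zero)
TABLE_6 = [
    [22, 2, 0],
    [4, 20, 0],
    [0, 0, 24],
]


def row_fraction(matrix, i):
    """Share of true class i that was predicted as class i."""
    return matrix[i][i] / sum(matrix[i])


def column_fraction(matrix, j):
    """Share of predictions for class j that truly belong to class j."""
    return matrix[j][j] / sum(row[j] for row in matrix)


for index, name in enumerate(CLASSES):
    print(f"{name}: {100 * row_fraction(TABLE_6, index):.1f}% of true class, "
          f"{100 * column_fraction(TABLE_6, index):.1f}% of predicted class")
```

The row-wise share corresponds to the percentages shown in the table, whereas the column-wise share corresponds to the percentages stated for the predicted classes.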

c) Assemblage of Kernels; White and Blue; SVM-R

SVM-R uses support vectors, which allow a finer classification if there are, for example, non-linearities in the data. The white and blue classification according to the SVM-R model also shows a 100% accuracy.

TABLE 7 White/blue classification confusion matrix of SVM-R model

                      Predicted class: blue    Predicted class: white
True class: blue      48 (100% classified)
True class: white                              24 (100% classified)

Hence, all blue kernels are properly predicted as blue and all white kernels are properly predicted as white.

d) Assemblage of Kernels; White, Single Blue and Double Blue; SVM-R

However, the accuracy in the white, single blue and double blue prediction using SVM-R is not 100% for all classes. Said three-class model shows a 100% accuracy for white and double blue samples and an 83.3% accuracy for single blue samples. This means that 16.7% of the single blue samples were identified as double blue samples.

TABLE 8 White/single blue/double blue classification confusion matrix of SVM-R model

                           Predicted class: single blue    Predicted class: double blue    Predicted class: white
True class: single blue    20 (83.3% classified)           4 (16.7% misclassified)
True class: double blue                                    24 (100% classified)
True class: white                                                                          24 (100% classified)

Hence, with the PLS-R model as well as with the SVM-R model, it is clearly possible to distinguish between blue (regardless of whether single blue or double blue) and white samples with a 100% accuracy when an analysis is performed on a subset including a plurality of seeds (kernels).

FIG. 3 illustrates the results of a PCA performed on single seeds (kernels). Therein, the scores room shows the differences between white and blue kernels (comprising both, single blue and double blue kernels). Since the triangles (corresponding to the white kernels) are located in the lower part of the scores plot, whereas the squares/circles (corresponding to the blue kernels) are located in the upper part of the scores plot, white and blue kernels are visually separated.

However, the separation is not as obvious as in the embodiment of FIG. 2, which pertains to whole sample measurements, i.e. measurements performed on a subset including a plurality of seeds or kernels. Single blue and double blue kernels are more difficult to separate in comparison to the whole sample measurement of FIG. 2 although there are differences in the respective spectra.

In order to investigate the performance of the inventive classification based on NIR spectroscopy, both PLS and SVM models were tested.

e) Single Kernel; White and Blue; PLS-R

The blue and white classification model based on the PLS-R for single kernels shows the same 100% accuracy as for whole samples, which has been described above in a).

TABLE 9 White/blue classification confusion matrix of PLS-R model for single kernel

| True class | Predicted: blue | Predicted: white |
| --- | --- | --- |
| blue | 48 (100% classified) | |
| white | | 24 (100% classified) |

f) Single Kernel; White, Single Blue and Double Blue; PLS-R

However, the PLS-R classification model for white, single blue and double blue shows a lower accuracy when predicting all three classes of kernels.

TABLE 10 White/single blue/double blue classification confusion matrix of PLS-R model for single kernel

| True class | Predicted: single blue | Predicted: double blue | Predicted: white |
| --- | --- | --- | --- |
| single blue | 14 (58.3% classified) | 7 (29.2% misclassified; overall: 41.7%) | 3 (12.5% misclassified; overall: 41.7%) |
| double blue | 4 (16.7% misclassified) | 20 (83.3% classified) | |
| white | 4 (16.7% misclassified) | | 20 (83.3% classified) |

For all classes, there are misclassification rates ranging from 16.7% (white kernels or double blue kernels being misclassified as single blue kernels) to 41.7% (single blue kernels being misclassified as double blue kernels or white kernels).

Hence, although the white and blue PLS-R classification delivers a 100% accuracy, this model has drawbacks in distinguishing the single kernel samples for the three-class approach.

g) Single Kernel; Single Blue and Double Blue; PLS-R

As is apparent from the table shown below, a model for the separation of just single blue and double blue kernels likewise does not deliver better results than the three-class model.

TABLE 11 Single blue/double blue classification confusion matrix of PLS-R model for single kernel

| True class | Predicted: single blue | Predicted: double blue |
| --- | --- | --- |
| single blue | 18 (75% classified) | 6 (25% misclassified) |
| double blue | 5 (20.8% misclassified) | 19 (79.2% classified) |

Said model has accuracies below 80% (i.e. 75% of the single blue kernels being properly classified and 79.2% of the double blue kernels being properly classified) and hence, is slightly inferior to the three-class PLS-R model described above in section f), which has accuracies of 83.3% for both single blue and double blue.

h) Single Kernel; White and Blue; SVM-R

The SVM-R model shows a performance comparable to that of the PLS-R model described in section e).

In detail, the white and blue classification shows a slightly lower accuracy of 91.7%.

TABLE 12 White/blue classification confusion matrix of SVM-R model for single kernel

| True class | Predicted: blue | Predicted: white |
| --- | --- | --- |
| blue | 47 (95.8% classified) | 1 (4.2% misclassified) |
| white | 1 (8.3% misclassified) | 23 (91.7% classified) |

i) Single Kernel; White, Single Blue and Double Blue; SVM-R

However, the accuracy of the white, single blue and double blue classification is lower for all classes when compared to the results of the PLS-R model described in section f).

TABLE 13 White/single blue/double blue classification confusion matrix of SVM-R model for single kernel

| True class | Predicted: single blue | Predicted: double blue | Predicted: white |
| --- | --- | --- | --- |
| single blue | 17 (70.8% classified) | 6 (25.0% misclassified) | 1 (4.2% misclassified) |
| double blue | 8 (33.3% misclassified) | 16 (66.7% classified) | |
| white | 2 (8.3% misclassified) | 1 (4.2% misclassified) | 21 (87.5% classified) |

The three-class model shows an 87.5% accuracy for white, a 70.8% accuracy for single blue and a 66.7% accuracy for double blue samples.
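For illustration only, the per-class accuracies stated above can be recomputed from the counts of table 13 as the diagonal entry of each row divided by the row total. The following Python sketch is not part of the described method; the names `TABLE_13` and `per_class_accuracy` are hypothetical, and the empty cell of table 13 is assumed to be zero.

```python
# Rows are true classes, columns are predicted classes, both in CLASSES order.
# Counts are copied from table 13; the empty cell is assumed to be 0.
CLASSES = ["single blue", "double blue", "white"]
TABLE_13 = [
    [17, 6, 1],   # true single blue
    [8, 16, 0],   # true double blue
    [2, 1, 21],   # true white
]


def per_class_accuracy(matrix):
    """Return, for each true class, the share of kernels on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


accuracies = per_class_accuracy(TABLE_13)
for name, accuracy in zip(CLASSES, accuracies):
    print(f"{name}: {accuracy:.1%}")
```

The same function could be applied to the other confusion matrices, provided the rows are entered as true classes and the columns as predicted classes.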

j) Single Kernel; Single Blue and Double Blue; SVM-R

An SVM-R model trained only with single blue and double blue kernels has a lower accuracy than predicting single blue and double blue with the three-class model as explained in section i).

TABLE 14 Single blue/double blue classification confusion matrix of SVM-R model for single kernel

| True class | Predicted: single blue | Predicted: double blue |
| --- | --- | --- |
| single blue | 17 (70.8% classified) | 7 (29.2% misclassified) |
| double blue | 5 (20.8% misclassified) | 19 (79.2% classified) |

Thus, a three-class model (comprising white, single blue and double blue) shows a better performance to distinguish between single blue and double blue kernels in comparison to a model that is based just on two classes, i.e. to distinguish single blue and double blue.

In summary, the PLS-R and the SVM-R model both show that there is the potential to distinguish between white and blue samples with nearly 100% accuracy. Hence, adding extra information from NIR radiation improves existing color sorting techniques that are based on visible light only.

In addition to the above, further measurements have been performed using another sample set. This sample set comprises 12 individual kernels per color per variety, except for the variety SolehioBla4, for which no double blue kernels are available. Hence, the sample set consists of 96 kernels for white, 96 kernels for single blue and 84 kernels for double blue, i.e. 276 single kernels in total.

For the blue and white separation test, the single blue samples as well as the double blue samples were again labelled as blue, which leads to the following number of samples for the two different approaches.

TABLE 15 Measurement approaches and samples

| Approach | Samples white | Samples blue | Samples single blue | Samples double blue | Samples total |
| --- | --- | --- | --- | --- | --- |
| White, blue | 96 | 180 | - | - | 276 |
| White, single blue, and double blue | 96 | - | 96 | 84 | 276 |

To obtain visible light and NIR spectra, the kernels were placed inside a Quantitative Filter Technique Integrating Cavity Absorption Meter (QFT-ICAM) and measured. The QFT-ICAM was equipped with an Avantes AvaSpec-ULS2048XL spectrometer having the following specification:

TABLE 16 AvaSpec-ULS2048XL spectrometer specification

| Parameter | Value |
| --- | --- |
| Detector type | Back-thinned CCD image sensor, 2048 pixels |
| Spectral range [nm] | 200-1160 |
| Spectral range [wavenumber, cm−1] | 50000-8620 |
| Spectral resolution | 2 nm |
| Number of scans | 500 |
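The two spectral ranges given in table 16 are related by the usual conversion between wavelength and wavenumber, i.e. wavenumber [cm−1] = 10^7 / wavelength [nm]. A minimal Python sketch of this conversion is given below for illustration; the function name `nm_to_wavenumber` is hypothetical.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1e7 / wavelength_nm


# Limits of the spectral range stated in table 16.
upper_wavenumber = nm_to_wavenumber(200)    # short-wavelength limit
lower_wavenumber = nm_to_wavenumber(1160)   # long-wavelength limit
print(upper_wavenumber, lower_wavenumber)
```

The long-wavelength limit evaluates to slightly more than 8620 cm−1, which suggests that the value in table 16 has been truncated rather than rounded.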

Furthermore, the kernels were measured using the Bruker MPA NIRS spectrometer (specified above in table 2) by placing them on the top of the sensor. As already indicated above, to minimize scattering and reflection losses, a half sphere was placed on top of the kernels and all measurements were done in triplicate.

To obtain a full spectrum (i.e. visible light/NIR), ranging from 400 nm to 2780 nm, the spectra from the QFT-ICAM spectrometer and the Bruker MPA spectrometer were combined. In order to identify the relevant spectral regions to distinguish between white, single blue and double blue kernels, classification models were calculated for three different wavelength regions as indicated above in table B (i.e. VIS/limited NIR, VIS/complete NIR, NIR). In addition, artificial RGB values were calculated to simulate a standard RGB sensor, wherein the RGB bands were calculated in a simplified manner as the mean absorption at 575-625 nm (red), 475-575 nm (green) and 425-500 nm (blue).
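The simplified artificial RGB calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation actually used: the band limits are taken from the text, whereas the inclusive treatment of the band edges, the function names `band_mean` and `artificial_rgb`, and the example spectrum are hypothetical.

```python
# Band limits in nm as stated in the description.
RGB_BANDS = {"red": (575, 625), "green": (475, 575), "blue": (425, 500)}


def band_mean(wavelengths_nm, absorption, low_nm, high_nm):
    """Mean absorption over all points with low_nm <= wavelength <= high_nm."""
    values = [a for w, a in zip(wavelengths_nm, absorption) if low_nm <= w <= high_nm]
    if not values:
        raise ValueError("no spectral points inside the requested band")
    return sum(values) / len(values)


def artificial_rgb(wavelengths_nm, absorption):
    """Reduce a spectrum to three band means that mimic an RGB sensor."""
    return {
        name: band_mean(wavelengths_nm, absorption, low, high)
        for name, (low, high) in RGB_BANDS.items()
    }


# Hypothetical example spectrum: 400-700 nm in 25 nm steps, linear ramp.
wavelengths = list(range(400, 701, 25))
absorption = [0.01 * (w - 400) for w in wavelengths]
rgb = artificial_rgb(wavelengths, absorption)
print(rgb)
```

Because each band is reduced to a single mean value, any spectral shape inside a band is lost, which is consistent with the reduced single blue/double blue separation reported for artificial RGB.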

The samples were classified using a Subspace Discriminant (SD) classification model, wherein no spectra pre-treatment was applied for this approach.

Compared to the foregoing explanations, a different approach has been carried out by performing a classification based on the anthocyanin content instead of performing a classification on a preselected sample set. Hence, the anthocyanin content of the 276 single kernels was determined. For this purpose, the kernels were milled and the anthocyanin was extracted using a methanol extraction.

An anthocyanin standard solution was prepared to establish a calibration curve for a UV/VIS photometer. All samples were measured in the photometer (extinction at 535 nm) and the corresponding anthocyanin values (i.e. the total anthocyanin content; TAC) were calculated. The spectra and the determined anthocyanin contents were used as input to develop PLS models according to the defined wavelength ranges indicated in table B.
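A calibration curve of this kind is commonly modelled as a straight line relating the concentration of the standard to the measured extinction, which is then inverted for the samples. The Python sketch below illustrates this under the assumption of a linear calibration; the standard concentrations, the extinction values and all function names are hypothetical and are not taken from the measurements described here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical standards: concentration (arbitrary units) and extinction at 535 nm.
standard_concentration = [0.0, 10.0, 20.0, 40.0]
standard_extinction = [0.02, 0.22, 0.42, 0.82]

slope, intercept = fit_line(standard_concentration, standard_extinction)


def extinction_to_concentration(extinction):
    """Invert the calibration line to obtain a concentration from an extinction."""
    return (extinction - intercept) / slope


# Hypothetical sample extinction.
sample_concentration = extinction_to_concentration(0.52)
print(sample_concentration)
```

In practice the calibration would only be valid within the concentration range covered by the standards.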

To identify specific wavelength ranges to classify single blue, double blue and white seed, the four different wavelength ranges (see table B above) were used as input for an SD-model. The results are presented below:

1) White, Single Blue and Double Blue; SD Model; Visible Light/Complete NIR

As is apparent from table 17, a good classification can be achieved even for double blue seeds when the VIS/complete NIR range is used. Accordingly, this model allows a clear distinction between blue (comprising single blue and double blue) and white kernels with approximately 99% accuracy. In other words, the VIS/complete NIR range allows white, single blue and double blue kernels to be distinguished with high precision.

TABLE 17 White/single blue/double blue classification confusion matrix of SD model for VIS/NIR complete range

| True class | Predicted: single blue | Predicted: double blue | Predicted: white |
| --- | --- | --- | --- |
| single blue | 267 (96.7% classified) | 6 (2.2% misclassified) | 3 (1.1% misclassified) |
| double blue | 3 (1.1% misclassified) | 273 (98.9% classified) | - |
| white | 6 (2.2% misclassified) | - | 270 (97.8% classified) |

2) White, Single Blue and Double Blue; SD Model; Visible Light/Limited NIR

As is apparent from table 18, a good classification can also be achieved when the VIS/limited NIR range is used. Hence, this model likewise allows a clear distinction between blue and white kernels with approximately 99% accuracy. The VIS/limited NIR range allows white, single blue and double blue kernels to be distinguished with high precision.

TABLE 18 White/single blue/double blue classification confusion matrix of SD model for VIS/NIR limited range

| True class | Predicted: single blue | Predicted: double blue | Predicted: white |
| --- | --- | --- | --- |
| single blue | 266 (96.4% classified) | 7 (2.5% misclassified) | 3 (1.1% misclassified) |
| double blue | 3 (1.1% misclassified) | 273 (98.9% classified) | - |
| white | 6 (2.2% misclassified) | - | 270 (97.8% classified) |

3) White, Single Blue and Double Blue; SD Model; Complete NIR

However, as it is apparent from table 19, the classification accuracy decreases when using the complete NIR range only. This model shows a performance of 82.2% classified blue (comprising single blue and double blue) seeds.

TABLE 19 White/single blue/double blue classification confusion matrix of SD model for NIR complete range

| True class | Predicted: single blue | Predicted: double blue | Predicted: white |
| --- | --- | --- | --- |
| single blue | 201 (72.8% classified) | 26 (9.4% misclassified) | 49 (17.8% misclassified) |
| double blue | 46 (16.7% misclassified) | 217 (78.6% classified) | 13 (4.7% misclassified) |
| white | 37 (13.4% misclassified) | 3 (1.1% misclassified) | 236 (85.5% classified) |
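When a three-class result is read as a blue/white decision, the single blue and double blue predictions of a row are added together. The following Python sketch illustrates this merging with the counts of table 19; the names `TABLE_19` and `predicted_blue_fraction` are hypothetical. Under this reading, the 82.2% figure stated above appears to correspond to the single blue row.

```python
# Counts copied from table 19: outer key = true class, inner key = predicted class.
TABLE_19 = {
    "single blue": {"single blue": 201, "double blue": 26, "white": 49},
    "double blue": {"single blue": 46, "double blue": 217, "white": 13},
    "white": {"single blue": 37, "double blue": 3, "white": 236},
}
BLUE_CLASSES = ("single blue", "double blue")


def predicted_blue_fraction(row):
    """Share of a true-class row that is predicted as either blue class."""
    return sum(row[name] for name in BLUE_CLASSES) / sum(row.values())


single_blue_as_blue = predicted_blue_fraction(TABLE_19["single blue"])
double_blue_as_blue = predicted_blue_fraction(TABLE_19["double blue"])
print(f"single blue predicted as blue: {single_blue_as_blue:.1%}")
print(f"double blue predicted as blue: {double_blue_as_blue:.1%}")
```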

4) White, Single Blue and Double Blue; SD Model; Artificial RGB

Similarly, when artificial RGB is used, the classification accuracy is also decreased, which is apparent from table 20. Although the model allows a clear distinction between white and blue (comprising single blue and double blue) kernels, since 97.8% of the blue kernels are classified as blue, it does not clearly distinguish between single blue and double blue kernels, since 42.8% of the double blue kernels are misclassified.

TABLE 20 White/single blue/double blue classification confusion matrix of SD model for artificial RGB

| True class | Predicted: single blue | Predicted: double blue | Predicted: white |
| --- | --- | --- | --- |
| single blue | 164 (59.4% classified) | 106 (38.4% misclassified) | 6 (2.2% misclassified) |
| double blue | 118 (42.8% misclassified) | 158 (57.2% classified) | - |
| white | 6 (2.2% misclassified) | 3 (1.1% misclassified) | 267 (96.7% classified) |

Based on the foregoing results, it is apparent that most of the relevant spectral information is in the VIS/NIR region. To clearly separate blue (comprising single blue and double blue) and white seeds, an RGB sensor will in principle be sufficient. However, when it is necessary to distinguish between single blue and double blue kernels, which requires the anthocyanin concentration to be considered, an RGB sensor does not deliver sufficient information. Accordingly, most of the spectral information about the anthocyanin concentration is in the range of 625 nm to 900 nm.

In other words, in order to achieve a clear separation between single blue and double blue kernels, receiving information in the wavelength range between 625 nm and 900 nm is crucial.

FIG. 4 shows the result of a PLS-R model that is used to predict the anthocyanin concentration, wherein a TAC has been determined from the kernels by performing measurements using the different spectrometers mentioned above.

As is apparent from FIG. 4, the result shows a good correlation between predicted and reference values of the total anthocyanin content (TAC), which is indicated by a high R² value of 0.91 in cross validation. The root mean square error of cross validation is 17.37 µg/mg. Thus, the PLS-R model allows the anthocyanin concentration to be predicted with an error of 17.37 µg/mg and allows the kernels to be distinguished according to their actual anthocyanin content.
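For illustration, the two figures of merit mentioned above can be computed from paired reference and predicted values as sketched below. The sketch assumes that R² denotes the coefficient of determination (it could alternatively denote a squared correlation coefficient) and that the predicted values stem from the held-out folds of a cross validation; the numerical values and function names are hypothetical and are not the data of FIG. 4.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_reference = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_reference) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference TAC values and cross-validated predictions.
reference_tac = [10.0, 20.0, 30.0, 40.0]
predicted_tac = [12.0, 18.0, 33.0, 37.0]

print(r_squared(reference_tac, predicted_tac))
print(rmse(reference_tac, predicted_tac))
```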

FIG. 5 shows the x-loading of component 1 indicating the spectral information distribution for the different wavelengths. The higher the loading (indicated on the vertical axis), the more relevant spectral information is contained at a specific wavelength (indicated on the horizontal axis). According to the loading plot shown in FIG. 5, the absorption peak of the anthocyanin is at around 720 nm.

Accordingly, the results from the classification models as well as from the PLS-R model show that the relevant spectral information to predict the anthocyanin content is between 600 nm and 900 nm, having a peak at approximately 720 nm.

FIG. 6 schematically illustrates at which stages of a seeding/harvesting cycle the present invention may be applied. In step S21, blue seed is planted during a pre-basic seed production. This results in a mix of blue and white seed being harvested. Here, the blue seed, which can be replanted for blue-white seed production, is of particular interest. In order to effectively categorize seed in the mix, the presented method for categorizing/sorting seeds is applied in step S22 and hence, a seed sorting is performed. Subsequently, white seed and double blue seed are discarded in steps S23a and S23b, respectively.

However, according to step S24, the single blue seed is kept and re-planted for blue-white basic seed production, which—again—results in a mix of blue and white seed being harvested. Here, the white seed is of interest, which will grow into male sterile seed and become the female component of hybrid seed production. Thus, a further sorting is performed in step S25 by using the inventive method for categorizing/sorting seeds.

Subsequently, blue seed, i.e. single and double blue seed, is discarded in step S26. In contrast, according to step S27, white seed is kept and planted for hybrid seed production, e.g. for the actual production of wheat.
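The keep/discard decisions of steps S22 to S27 can be summarized in a short Python sketch. It assumes that each seed has already been assigned one of the three classes by the categorization; the names `SeedClass`, `KEEP_AFTER_S22`, `KEEP_AFTER_S25` and `sort_seeds` as well as the example harvest are hypothetical.

```python
from enum import Enum


class SeedClass(Enum):
    WHITE = "white"
    SINGLE_BLUE = "single blue"
    DOUBLE_BLUE = "double blue"


# Step S22: keep single blue (S24), discard white (S23a) and double blue (S23b).
KEEP_AFTER_S22 = {SeedClass.SINGLE_BLUE}
# Step S25: keep white (S27), discard single and double blue (S26).
KEEP_AFTER_S25 = {SeedClass.WHITE}


def sort_seeds(predicted_classes, keep):
    """Split already categorized seeds into kept and discarded fractions."""
    kept = [c for c in predicted_classes if c in keep]
    discarded = [c for c in predicted_classes if c not in keep]
    return kept, discarded


# Hypothetical harvest after pre-basic seed production.
harvest = [
    SeedClass.WHITE,
    SeedClass.SINGLE_BLUE,
    SeedClass.DOUBLE_BLUE,
    SeedClass.SINGLE_BLUE,
]
kept_s22, discarded_s22 = sort_seeds(harvest, KEEP_AFTER_S22)
kept_s25, discarded_s25 = sort_seeds(harvest, KEEP_AFTER_S25)
print(len(kept_s22), len(discarded_s22))
```

Expressing the two sorting steps as different keep-sets over the same categorization reflects that steps S22 and S25 use the same method and differ only in which class is retained.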

In FIG. 6, the sorting in at least one of the steps S22 and S25 may be performed solely based on NIR spectroscopy but also based on a combination of NIR spectroscopy and imaging in the visible light spectrum. Further, in at least one of the steps S22 and S25, the sorting may comprise more than one categorization cycle, wherein a categorization cycle includes NIR spectroscopy or NIR spectroscopy combined with a visual separation.

REFERENCE SIGNS

- S11 Provide sample
- S12 Obtain NIR spectrum
- S13 Identify signal associated with organic colorant
- S14 Determine presence of organic colorant
- S15 Determine amount of organic colorant
- S16 Normalize amount of organic colorant
- S17 Compare normalized amount of organic colorant
- S18 Categorize bulk sample
- S19 Categorize subset of sample
- S21 Plant blue seed
- S22 Categorize/sort seeds
- S23a Discard white seed
- S23b Discard double blue seed
- S24 Plant single blue seed
- S25 Categorize/sort seeds
- S26 Discard single and double blue seed
- S27 Plant white seed

Claims

1. A method for categorizing/sorting seeds, the method comprising the steps of:

providing (S11) a sample including at least one seed;

obtaining (S12) a near infrared, NIR, spectrum of at least a subset of the sample;

determining (S14) presence of an organic colorant in at least the subset of the sample based on the obtained NIR spectrum; and

categorizing/sorting (S19) at least the subset of the sample based on the determination.

2. The method according to claim 1, wherein the organic colorant comprises at least one colorant of the group of betalains, carotenoids, anthocyanins, flavonoids, anthraquinone and/or chlorophylls.

3. The method according to claim 1, further comprising the step of identifying (S13) a signal associated with the organic colorant in the obtained NIR spectrum, wherein the categorizing/sorting of at least the subset of the sample is based on the identified signal.

4. The method according to claim 3, wherein the identified signal is in the range of 15500 cm−1 to 400 cm−1.

5. The method according to claim 3, wherein the identified signal is in one of the following ranges: 25000 cm−1 to 3597 cm−1 (400-2780 nm) comprising sub-ranges 25000 cm−1 to 11764 cm−1 (400-850 nm) and 11111 cm−1 to 3597 cm−1 (900-2780 nm) as well as 16000 cm−1 to 11111 cm−1 (625-900 nm), 15385 cm−1 to 10526 cm−1 (650-950 nm), 8547 cm−1 to 8000 cm−1 (1170-1250 nm), 7519 cm−1 to 7092 cm−1 (1330-1410 nm), 6250 cm−1 to 5917 cm−1 (1600-1690 nm) and 5263 cm−1 to 4167 cm−1 (1900-2400 nm).

6. The method according to claim 1, further comprising the step of: determining (S15) an amount of the organic colorant in at least the subset of the sample based on the obtained NIR spectrum, wherein the categorizing/sorting (S19) of at least the subset of the sample is based on the determined amount of the organic colorant.

7. The method according to claim 6, wherein the NIR spectrum is obtained for a bulk sample, the method further comprising the steps of:

normalizing (S16) the determined amount of the organic colorant with respect to a size of the amount bulk sample;

comparing (S17) the normalized amount of organic colorant with a predefined threshold; and

categorizing/sorting (S18) the bulk sample based on the comparison.

8. The method according to claim 6, comprising the steps of:

obtaining a first NIR spectrum of a first subset of the sample and a second NIR spectrum of a second subset of the sample;

determining presence of the organic colorant in the first subset by classifying the first NIR spectrum and in the second subset by classifying the second NIR spectrum;

categorizing/sorting the first subset and the second subset based on the classification,

wherein the classification of the NIR spectrum is performed using at least one of a principal component analysis (PCA), a partial-least-squares-regression (PLS-R) or partial least squares discriminant analysis (PLS-DA) model based on labeled spectra, a trained support vector machine classification (SVM-C) or support vector machine regression (SVM-R) model, a trained artificial neural network (ANN), or a k-nearest neighbors algorithm (k-NN).

9. The method of claim 7, further comprising the step of: sorting the first subset and the second subset based on the classification, wherein a subset of the sample consists of at least one seed.

10. The method according to claim 9, further comprising at least one of the steps of: obtaining an image of at least the subset of the sample using at least one camera, wherein the presence of an organic colorant in at least the subset of the sample is determined based on the obtained NIR spectrum and based on the image; determining presence of an organic colorant using an X-ray based analysis; and determining presence of an organic colorant using a Raman spectroscopy analysis.

11. An apparatus for categorizing/sorting seeds comprising: a near infrared, NIR, spectrometer, including a light source configured to emit light with a wavelength in between 650 nm and 2500 nm; a detector configured to detect a NIR spectrum in a range of 15500 cm−1 to 400 cm−1; a sample holder configured to hold a sample relative to the light source and the detector to perform NIR measurements in a transmission and/or reflection mode; and means adapted to execute the steps of the method of claim 1.

12. The apparatus according to claim 11, further comprising an image sensor configured to obtain an image of at least the subset of the sample and means adapted to execute the steps of:

obtaining an image of at least the subset of the sample using at least one camera, wherein the presence of an organic colorant in at least the subset of the sample is determined based on the obtained NIR spectrum and based on the image; and/or determining presence of an organic colorant using an X-ray based analysis; and determining presence of an organic colorant using a Raman spectroscopy analysis;

and/or a feeder system configured to feed at least the subset of the sample to the NIR spectrometer.

13. A mobile device comprising the apparatus according to claim 11.

14-15. (canceled)

16. The method according to claim 7, comprising the steps of:

obtaining a first NIR spectrum of a first subset of the sample and a second NIR spectrum of a second subset of the sample;

determining presence of the organic colorant in the first subset by classifying the first NIR spectrum and in the second subset by classifying the second NIR spectrum;

categorizing/sorting the first subset and the second subset based on the classification,

wherein the classification of the NIR spectrum is performed using at least one of a principal component analysis (PCA), a partial-least-squares-regression (PLS-R) or partial least squares discriminant analysis (PLS-DA) model based on labeled spectra, a trained support vector machine classification (SVM-C) or support vector machine regression (SVM-R) model, a trained artificial neural network (ANN), or a k-nearest neighbors algorithm (k-NN).

17. The method of claim 16, further comprising the step of: sorting the first subset and the second subset based on the classification, wherein a subset of the sample consists of at least one seed.

18. The method according to claim 17, further comprising at least one of the steps of: obtaining an image of at least the subset of the sample using at least one camera, wherein the presence of an organic colorant in at least the subset of the sample is determined based on the obtained NIR spectrum and based on the image; determining presence of an organic colorant using an X-ray based analysis; and determining presence of an organic colorant using a Raman spectroscopy analysis.

19. The apparatus according to claim 11, wherein the feeder system is configured to feed at least the subset of the sample to the NIR spectrometer and the image sensor.

20. The apparatus according to claim 11, further comprising means adapted to execute the steps of:

determining (S15) an amount of the organic colorant in at least the subset of the sample based on the obtained NIR spectrum, wherein the categorizing/sorting (S19) of at least the subset of the sample is based on the determined amount of the organic colorant, and wherein the NIR spectrum is obtained for a bulk sample;

normalizing (S16) the determined amount of the organic colorant with respect to a size of the amount bulk sample;

comparing (S17) the normalized amount of organic colorant with a predefined threshold; and

categorizing/sorting (S18) the bulk sample based on the comparison

obtaining a first NIR spectrum of a first subset of the sample and a second NIR spectrum of a second subset of the sample;

determining presence of the organic colorant in the first subset by classifying the first NIR spectrum and in the second subset by classifying the second NIR spectrum;

categorizing/sorting the first subset and the second subset based on the classification, wherein the classification of the NIR spectrum is performed using at least one of a principal component analysis (PCA), a partial-least-squares-regression (PLS-R) or partial least squares discriminant analysis (PLS-DA) model based on labeled spectra, a trained support vector machine classification (SVM-C) or support vector machine regression (SVM-R) model, a trained artificial neural network (ANN), or a k-nearest neighbors algorithm (k-NN); and

sorting the first subset and the second subset based on the classification, wherein a subset of the sample consists of at least one seed.