Identification of Grape Varieties via Digital Leaf Image Processing by Computer
1 J. ZHANG, 2 P. YANNE* and 3 H. LI
1,2 College of Information Engineering, Northwest A&F University
Yangling, Shaanxi, China
pyanne@nwsuaf.edu.cn (* corresponding author)
3 College of Oenology, Northwest A&F University
lihuawine@nwsuaf.edu.cn
ABSTRACT
Grape variety identification is of great significance for resource statistics, the detection of new species and the protection of genetic resources. Building on the classical ampelographic identification method and on machine learning and pattern recognition techniques from computer science, we propose a new, cheap and fast identification method based on leaf image processing. We demonstrate its feasibility with a prototype that classified 354 leaf images belonging to 20 varieties with an accuracy of 87%. Our techniques can be applied to the computer-aided diagnosis of grape leaf diseases and to the discovery of new varieties, as well as to quantifying the classical ampelographic identification method. We propose further work to turn this method from a prototype into a practical software product.
Keywords: Grape variety identification; Ampelography; Pattern recognition; Image processing; Hu's moment invariants
RESUME
The identification of grape varieties is very important for resource statistics, the detection of new species and the protection of genetic resources. Building on the classical ampelographic identification method and on recent advances in computer science, notably pattern recognition and machine learning, we propose a new, efficient and automatic computer-based method for identifying grape varieties. We demonstrate its feasibility through a prototype that classified 354 leaf image files belonging to 20 varieties with an accuracy of 87%. Our technique can be applied to the diagnosis of vine diseases, to the discovery of new species and to the quantification of the classical ampelographic identification method. We suggest directions for future research in order to turn our prototype into a software product.
I. Introduction
There are over 10,000 grape varieties in the world. About 3,000 of them are widely cultivated, and many are wine varieties [Zhai, 2001]. Grape variety identification is of great significance for resource statistics, the detection of new species and the protection of genetic resources. The OIV's Strategic Framework includes the task of recognising new viticultural varieties [OIV, 2005].
The classical identification method is based on ampelography [Galet, 1990; Tassie and Blieschke, 2008]. Newer methods have also been developed, using approaches such as DNA molecular genetic markers [Bower et al., 1993; Zhang et al., 1996; Testier et al., 1999], pollen morphology [Wang and Li, 2000] and anthocyanin analysis [Wendelin and Barna, 1994]. All these methods need expert intervention and are hence quite expensive. Some of them also need special devices and take a long time. Today, computer technologies have a wide range of applications in many fields, including grape production. There are many successful examples of computers being used for image processing [Li et al., 2007; Barbu, 2009] and for the identification of plant species [Ye et al., 2004] based on pattern recognition. We seek a new method for identifying grape varieties that combines computer techniques with classical ampelography. Based on the processing of digital grape leaf images, this method would be rapid, efficient and nearly automatic, with little or no human intervention. Our research objective is to develop a software product, available on the web, that tells a user the variety of the grape leaf whose image he or she uploads.
The ampelographic identification of grape varieties is based on the observation of features of certain organs of the vine, such as the flower, berry, shoot and leaf. The OIV has produced two editions [OIV, 1983; OIV, 2009] of the document "OIV descriptor list for grape varieties and Vitis species", which defines as a standard the ampelographic characteristics used to identify Vitis varieties and species. Using the 128 characteristics selected by [OIV, 1983], where each characteristic is assigned a code and may take values from 1 to 9, [OIV, 2000] describes 250 wine grape varieties of its member states by assigning a value to each descriptor code for each variety. For a given grape sample, if every code has the same value as for a variety V among the 250 in [OIV, 2000], the sample is classified as V. Ampelographic experts agree that the features of the mature leaf are the most decisive for variety identification. Of the 128 codes, 35 concern the leaf and 29 the mature leaf. [OIV, 2009] adds another 18 codes, numbered 601 to 618, on the mature leaf. Of the 14 codes on the "Primary descriptor priority list", 9 concern the leaf.
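The exact-match rule described above can be sketched in a few lines of Python. This is only an illustration: the variety names and code values below are invented, the function name `match_varieties` is hypothetical, and only the code numbers 65, 74 and 84 are taken from this paper.

```python
# Hypothetical descriptor table: variety names and code values are invented
# for illustration and are NOT taken from any OIV publication.
REFERENCE = {
    "Variety A": {65: 5, 74: 3, 84: 1},
    "Variety B": {65: 7, 74: 3, 84: 5},
}

def match_varieties(sample, reference):
    """Return the varieties whose value agrees with the sample on every code."""
    matches = []
    for variety, codes in reference.items():
        if all(sample.get(code) == value for code, value in codes.items()):
            matches.append(variety)
    return matches

sample = {65: 7, 74: 3, 84: 5}
print(match_varieties(sample, REFERENCE))
```

A single differing code value is enough to reject a variety, which is one reason why measuring every code by hand is costly and error-prone.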
In our new approach, the main idea is to let the computer calculate the code values instead of having them measured by a human being. The computer can then compare these values against the known ones, as in [OIV, 2000], to find the right variety. However, on the one hand, some code values are not easy to calculate and, on the other hand, it is not necessary to know all of them for identification purposes. Furthermore, some features not selected by [OIV, 2009] may also help to distinguish or identify varieties, for example Hu's moment invariants of an image [Hu, 1962].
A digital image is a matrix of pixels f(x, y), where (x, y) is the index, or coordinate, in the matrix. Each pixel f(x, y) represents one image dot and is described by one or more numbers. For a binary image, like a photo in an old newspaper, a pixel f(x, y) is either 0 for white or 1 for black. For a colour image taken by a digital camera, a pixel is a combination of three basic colours with different intensities. Hu defined 7 moment invariants for any digital image. Each invariant can easily be calculated as a function of the pixels f(x, y). The values of the 7 invariants are nearly independent of the rotation, position and size of the object in the image. They have been successfully used in computer pattern recognition applications such as car logo [Liu and Lu, 2008], static hand gesture [Liu et al., 2008], tiger variety [Xu and Qi, 2009], human face [Gan and Zhang, 2002] and corn leaf disease recognition [Shen et al., 2008]. Yanhua YE and Chun CHEN of Hong Kong Polytechnic University have developed a Computerized Plant Species Recognition System, CPSRS [Ye et al., 2004], which provides a convenient and efficient way to search for and identify plant species from a digital image file.
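To make the invariants concrete, the following is a minimal pure-Python sketch that computes the standard formulation of Hu's 7 moment invariants from normalised central moments. It is not the authors' Matlab code; the function name `hu_moments` and the small test shape are assumptions made for illustration, and the image is assumed to be a list of rows of numbers.

```python
def hu_moments(image):
    """Return Hu's 7 moment invariants of a 2-D image given as a list of rows."""
    def raw(p, q):
        # Raw moment m_pq = sum over pixels of x^p * y^q * f(x, y).
        return sum((x ** p) * (y ** q) * v
                   for y, row in enumerate(image)
                   for x, v in enumerate(row))

    m00 = raw(0, 0)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    xc, yc = raw(1, 0) / m00, raw(0, 1) / m00  # centroid

    def eta(p, q):
        # Central moment (translation invariant), then scale normalisation.
        mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
                 for y, row in enumerate(image)
                 for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]

# Invented L-shaped test pattern, its 90-degree rotation and a shifted copy.
shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
rotated = [list(r) for r in zip(*shape[::-1])]
shifted = [[0] * 8] + [[0, 0] + row for row in shape]
for name, img in (("original", shape), ("rotated", rotated), ("shifted", shifted)):
    print(name, [round(h, 6) for h in hu_moments(img)])
```

On a pixel grid, invariance under translation and under rotation by multiples of 90 degrees holds up to rounding; for other angles and for rescaling it is only approximate, which matches the "nearly independent" wording above.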
Building on the works mentioned above, which form the cornerstone of our method, we present the method in detail and test it by implementing a software prototype on an ordinary personal computer. We then analyse the experimental results and discuss some of the choices made, the remaining problems, and possible improvements and applications. We conclude on the feasibility of the new method and point out future work.
II. Materials and Methods
Our identification method consists of 4 steps: 1) collect typical mature sample leaves for the varieties we want to identify, 2) scan the leaves into digital image files, 3) select a set of characteristics, or features, that are useful for identification and can be computed from the images, 4) build a software classifier based on the features calculated from the sample files.
1) Collect mature sample leaves
Following the requirements of the OIV [OIV, 2009], for each variety we collected about 10 mature leaves from the middle third of different shoots, between berry set and veraison. The leaves were collected from the grape variety cultivation field of the College of Oenology, Northwest A&F University, in Yangling, Shaanxi, China. In total there are 500 leaves belonging to 3 wild local varieties and 47 cultivated ones, including Sauvignon, Riesling, Traminer, Sémillon, Chenin Blanc, Ugni Blanc, Müller-Thurgau, Cabernet Sauvignon, Carignan, Gamay, Syrah, Muscat, etc.
2) Obtain digital leaf files
We scanned both sides of each leaf on 3 ordinary A4 scanners with their default parameters. We obtained a total of 1073 colour leaf image files at a resolution of 300 DPI (dots per inch). In our software prototype, we used 354 leaf files of 20 varieties.
3) Select features for identification
Naturally, we considered the 47 features of mature leaves coded by the OIV [OIV, 2009]. More research is needed before some of them can be calculated, e.g. "density of prostrate hairs between the main veins on lower side of blade", OIV code 84. We have found a way to calculate others, including the size and circumference of the blade, the length of the petiole and the length of the veins. We also selected some features that are considered neither by the OIV nor by ampelography but are easy to calculate and useful for identification, e.g. Hu's 7 moment invariants. In order to build our prototype quickly, and using the two criteria of computability and usefulness, we finally selected the size and circumference of the blade and Hu's 7 moment invariants to form a feature vector of 9 dimensions.
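As an illustration of the two geometric features, the sketch below estimates the blade size and circumference from a binary leaf mask scanned at a known resolution. The paper does not describe how the prototype segments the leaf or measures its outline, so the function name, the mask representation and the pixel-edge counting used here are all assumptions.

```python
CM_PER_INCH = 2.54

def blade_size_and_circumference(mask, dpi=300):
    """Estimate blade area (cm^2) and outline length (cm) from a binary mask.

    mask is a list of rows of 0/1 values, 1 marking leaf pixels. The outline
    is measured by counting pixel edges that face the background, which is an
    assumed, crude estimator that overestimates slanted or curved contours.
    """
    rows, cols = len(mask), len(mask[0])
    pixel_cm = CM_PER_INCH / dpi  # side length of one pixel in cm
    area_px = 0
    edge_px = 0
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x]:
                continue
            area_px += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside = not (0 <= ny < rows and 0 <= nx < cols)
                if outside or not mask[ny][nx]:
                    edge_px += 1
    return area_px * pixel_cm ** 2, edge_px * pixel_cm

# A 10 x 10 pixel square scanned at 10 DPI is one inch on each side.
square = [[1 if 1 <= y <= 10 and 1 <= x <= 10 else 0 for x in range(12)]
          for y in range(12)]
area_cm2, outline_cm = blade_size_and_circumference(square, dpi=10)
print(area_cm2, outline_cm)
```

Because the scan resolution is fixed, the pixel count converts directly into physical units, which is what makes scanned images easier to measure than camera photos.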
4) Build a software classifier
We first explain the mathematical basis of our method. Each leaf image is represented by a feature vector $L_j = (f_{j1}, \dots, f_{j9})$, where each $f_{ji}$ is a real number. Such a vector is a point in the 9-dimensional feature space. We expect the Euclidean distance

$$D(L_1, L_2) = \sqrt{(f_{11} - f_{21})^2 + \cdots + (f_{19} - f_{29})^2}$$

between two grape leaves $L_1$ and $L_2$ of the same variety to be, on average, smaller than that between leaves of two different varieties. For a variety $V_i$, the mass centre is $C_i = (c_{i1}, \dots, c_{i9})$ with

$$c_{im} = \frac{1}{n} \sum_{j=1}^{n} f_{jm},$$

where $f_{jm}$ is the m-th feature of the j-th leaf sample $L_{ji}$ of the variety $V_i$ and n is the number of samples of that variety. The radius $R_i$ is the maximum of $D(C_i, L_{ji})$ for j = 1 to n. For a given leaf with vector $L_j$, if exactly one variety S satisfies $D(L_j, C_S) \le R_S$, we conclude that the leaf belongs to the variety S.
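The mass-centre and radius rule can be sketched as follows. This is a minimal illustration rather than the prototype's code: the function names are hypothetical, the toy training data are invented and use 2 features instead of 9, and returning `None` when zero or several spheres contain the leaf is an assumption.

```python
import math

def mass_centre(samples):
    """Component-wise mean of the feature vectors of one variety."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def radius(centre, samples):
    """Largest Euclidean distance from the centre to any sample."""
    return max(math.dist(centre, s) for s in samples)

def classify_by_sphere(leaf, training):
    """training maps a variety name to its list of sample feature vectors.

    Returns the variety only if exactly one sphere contains the leaf;
    otherwise returns None (an assumed convention for the undecided case).
    """
    hits = []
    for variety, samples in training.items():
        centre = mass_centre(samples)
        if math.dist(leaf, centre) <= radius(centre, samples):
            hits.append(variety)
    return hits[0] if len(hits) == 1 else None

# Invented 2-dimensional toy data; the real vectors have 9 dimensions.
training = {
    "A": [[0, 0], [2, 0], [1, 1]],
    "B": [[10, 10], [12, 10], [11, 12]],
}
print(classify_by_sphere([1, 0.5], training))
print(classify_by_sphere([5, 5], training))
```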
Unfortunately, for the 354 vectors of 20 varieties, the mass centres are so close together and the radii so large that the 9-dimensional sphere of a variety $S_i$, with centre $C_i$ and radius $R_i$, intersects the spheres of other varieties. To reduce the space occupied by each variety, we refine the method by replacing the sphere with the 9-dimensional box inscribed in it. This is done by finding, for every variety, the range of values of each dimension of the vector. Some leaves still cannot be distinguished. For this case, assuming that the values of each dimension for each variety follow a normal distribution, we introduce the probability of a leaf L belonging to a variety S by the following formula:

$$P(L, S) = 1 - \sum_{j=1}^{9} \frac{2\, dL_j}{r(S, j)} \cdot \frac{1}{9}, \qquad dL_j = L_j - \frac{1}{2}\, r(S, j), \qquad r(S, j) = \max(f_{sj}) - \min(f_{sj}),$$

where $\max(f_{sj})$ and $\min(f_{sj})$ are the maximum and minimum values of the j-th dimension over all samples of the variety S.
Finally, we build our classifier with the following algorithm:
1) Compute the vector $L_i$ of a leaf image i and compare its components $f_{ij}$ with the value ranges $[\min(f_{sj}), \max(f_{sj})]$ of all varieties S (S = 1 to 20 and j = 1 to 9).
2) If $\min(f_{sj}) \le f_{ij} \le \max(f_{sj})$ holds for all j = 1 to 9 for only one variety S, then the leaf $L_i$ belongs to the variety S.
3) Otherwise, the leaf $L_i$ satisfies the relation in step 2) for m varieties $S_1, \dots, S_m$. We compute all the probabilities $P(L_i, S_j)$ for j = 1 to m, and assign $L_i$ to the variety $S_j$ whose probability $P(L_i, S_j)$ is the maximum.
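The three steps above can be sketched as follows. Two points are assumptions rather than statements from the paper: the deviation $dL_j$ is interpreted here as the absolute distance of the leaf's j-th feature from the middle of the variety's value range, so that the probability stays between 0 and 1 for a leaf inside the box, and a leaf that falls inside no box is returned as `None`. The function names and the 2-dimensional toy data are invented.

```python
def value_ranges(samples):
    """Per-dimension (min, max) over the sample vectors of one variety."""
    return [(min(column), max(column)) for column in zip(*samples)]

def in_box(leaf, ranges):
    """Step 2 test: every feature lies inside the variety's value range."""
    return all(lo <= f <= hi for f, (lo, hi) in zip(leaf, ranges))

def probability(leaf, ranges):
    # ASSUMPTION: dL_j is the absolute distance from the middle of the range.
    total = 0.0
    for f, (lo, hi) in zip(leaf, ranges):
        r = hi - lo
        if r > 0:  # a zero-width range contributes no deviation
            total += 2 * abs(f - (lo + hi) / 2) / r
    return 1 - total / len(leaf)

def classify(leaf, training):
    """training maps a variety name to its list of sample feature vectors."""
    boxes = {v: value_ranges(s) for v, s in training.items()}
    candidates = [v for v, box in boxes.items() if in_box(leaf, box)]
    if not candidates:
        return None  # ASSUMPTION: the paper does not cover this case
    if len(candidates) == 1:
        return candidates[0]
    return max(candidates, key=lambda v: probability(leaf, boxes[v]))

# Invented toy data with overlapping ranges; real vectors have 9 dimensions.
training = {"A": [[0, 0], [4, 4]], "B": [[3, 3], [9, 9]]}
print(classify([1, 1], training))      # inside the box of A only
print(classify([3.5, 3.5], training))  # inside both boxes, resolved by P
```

Under this reading, a leaf at the centre of a variety's ranges scores 1 and a leaf on the boundary of every range scores 0.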
III. Results and Discussion
We developed a software prototype in Matlab that implements the above algorithm in order to verify our method. Figure 1 shows an execution of the prototype in the Matlab environment. The user selects a leaf image and then asks for its classification. The prototype displays the image in 3 modes and prints out the variety name:
Figure 1. Execution of classification prototype
We tested our algorithm by classifying the 354 image files belonging to 20 varieties. The correct classification rate is 87%. This rate was obtained by calling the classifier on all 354 files and counting the percentage of correctly classified ones.
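The accuracy computation itself is simple and can be sketched as below. The function name, the toy classifier and the toy data are hypothetical. The text does not say whether any files were held out from the samples used to build the value ranges; if they were not, the figure is a resubstitution estimate and may be optimistic for unseen leaves.

```python
def accuracy(classifier, labelled_leaves):
    """Fraction of (feature_vector, true_variety) pairs classified correctly."""
    correct = sum(1 for vector, variety in labelled_leaves
                  if classifier(vector) == variety)
    return correct / len(labelled_leaves)

# Invented stand-in classifier and data, for illustration only.
def toy_classifier(vector):
    return "A" if vector[0] < 5 else "B"

data = [([1], "A"), ([2], "A"), ([7], "B"), ([4], "B")]
print(accuracy(toy_classifier, data))
```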
The accuracy decreases when the number of varieties increases. We may resolve this problem by increasing the dimension of the feature vector, i.e. by finding more features, and by using a better classification algorithm such as SVM [Wu et al., 2008]. The latter is a well-known machine learning method for classification. Machine learning is the capacity of a computer to learn from examples: after being trained on many leaf samples of each grape variety, the software can decide to which variety a new leaf belongs.
Our prototype can only classify scanned images. In order to classify digital camera images, we have to consider factors such as shooting distance and focus. On the other hand, a camera can take photos from different angles, which may allow us to distinguish the prostrate hairs of a leaf from the erect ones. However, this problem can be better resolved by a 3-dimensional camera or scanner. With 3D imaging, we can more easily calculate other leaf features such as the profile of the leaf (OIV code 74 [OIV, 2009]). Wavelengths other than visible light, such as infrared, microwave and terahertz [Lu, 2002; Xing and Baerdemaeker, 2005], can also be used to obtain digital leaf images. These images should supply complementary features that are useful for variety identification. By combining these techniques, we expect to be able to calculate all 45 codes selected by the OIV and hence to identify all grape varieties by computer-based digital image processing.
This new identification method may not only simplify the identification procedure; its techniques can also improve the classical ampelographic identification method. The current OIV Descriptor List [OIV, 2009] uses the code values in a qualitative way. For example, code 65, for the size of the blade, takes the values 1, 3, 5, 7 and 9, which mean very small, small, medium, large and very large respectively. Our method can calculate the size of the blade in in² or cm² effectively and automatically. We may do this for all quantifiable codes of all known varieties. With these quantitative values, we can on the one hand give a value range for each current qualitative value and on the other hand check whether the code values for the 250 varieties in [OIV, 2000] are coherent. These techniques can also be used to detect new varieties and to guess the parent varieties of a new hybrid variety. Indeed, the feature vector of a new variety will not belong to any known variety, but that of a hybrid should be close to those of its parents. Computer software can easily find all the similar varieties and sort them by similarity.
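Sorting known varieties by similarity to a new feature vector can be sketched as follows, using the Euclidean distance to each variety's mass centre as the similarity measure. That choice, the function name and the 2-dimensional toy centres are assumptions. The paper does not describe any rescaling of features; since blade size and Hu's invariants differ greatly in magnitude, some normalisation would probably be needed in practice.

```python
import math

def rank_by_similarity(vector, centres):
    """Sort variety names by Euclidean distance from vector to their centres."""
    return sorted(centres, key=lambda name: math.dist(vector, centres[name]))

# Invented mass centres; the closest varieties are candidate parents.
centres = {"A": [0, 0], "B": [10, 0], "C": [4, 1]}
print(rank_by_similarity([6, 0], centres))
```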
IV. Conclusions
Building on the classical ampelographic identification method and on machine learning and pattern recognition techniques from computer science, we have proposed a new, cheap and fast identification method based on leaf image processing. We demonstrated its feasibility by implementing a software prototype that classified 354 leaf images belonging to 20 varieties with an accuracy of 87%.
We are continuing our research to increase both the number of grape varieties and the accuracy by calculating more features from digital images and improving the classification algorithms.
Acknowledgments
We would first like to express our gratitude to Dr Jean-Claude Ruf, head of the OIV's vitiviniculture science and techniques department, for his advice during his visit to our university on the occasion of the 6th International Symposium on Viticulture and Enology in Yangling, Shaanxi, China. Mrs A. Tsioli, head of the OIV's viticulture unit, supplied us with a lot of useful information. The whole research team of the grape variety identification project at our university contributed to this work, especially Dr JF NING for his suggestion of adopting Hu's moment invariants, Dr C CAI and his students for collecting the leaf samples and scanning the images, and Dr Y ZHANG for his encouragement. Finally, we would like to thank the students of our team, ZG FENG, WJ HAN, Z SONG, H ZHANG and Y ZHANG.
Bibliography
Barbu T., 2009. Content-based image retrieval using Gabor filtering. In: Proceedings - 20th International Workshop on Database and Expert Systems Applications. New York: Institute of Electrical and Electronics Engineers Inc. 2009: 236-240.
Bower J.E., Bandman E.B., Meredith C.P., 1993. DNA fingerprinting characterization of some wine grape cultivars. American Society for Enology and Viticulture, 44: 266-271.
Galet P., 1990. French grapevine varieties and vineyards. Volume 2. The French ampelography. 2nd edition. Montpellier.
Gan J.Y., Zhang Y.W., 2002. Face Recognition Based on Moment Invariants and Neural Networks. Computer Engineering and Applications, 38(7): 53-57.
Hu M.K., 1962. Visual Pattern Recognition by Moment Invariants. IRE Transactions on Information Theory, 8(2): 179-187.
Li Y., Chi Z.R., Feng D.D., 2007. Leaf vein extraction using independent component analysis. In: 2006 IEEE International Conference on Systems, Man and Cybernetics. New York: Institute of Electrical and Electronics Engineers Inc. 5: 3890-3894.
Liu J.M., Lu K., 2008. The identification of car logo based on Hu's invariant moments. Scientific and Technical Information (Academic Edition), 36: 76-81.
Liu Y., Gan Z.J., Sun Y., 2008. Static Hand Gesture Recognition and its Application based on Support Vector Machines. In: Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing. Ninth ACIS International Conference, Phuket: 2008(9): 517-521.
Lu R., 2002. Detection of bruises on apples using near-infrared hyperspectral imaging. Information & Electrical Technologies Division of ASAE, vol. 46(2): 1-8.
OIV, 1983. 1st Edition of the OIV Descriptor list for grape varieties and Vitis species. http://www.oiv.int.
OIV, 2000. Description of world vine varieties. http://www.oiv.int.
OIV, 2005. RESOLUTION AG 1/2005, OIV Strategic Framework 2005-2008. http://www.oiv.int.
OIV, 2009. 2nd Edition of the OIV Descriptor list for grape varieties and Vitis species. http://www.oiv.int.
Shen W.Z., Wu Y., Chen Z.L., Wei H.D., 2008. Grading method of leaf spot disease based on image processing. In: Proceedings - International Conference on Computer Science and Software Engineering. NJ: IEEE Computer Society. Vol. (6): 491-494.
Tassie L., Blieschke N., 2008. Ampelography: do you know what variety you are planting in your vineyard or nursery. Australian & New Zealand Grapegrower & Winemaker: national journal of the grape and wine industry, No. 537.
Testier C., David J., This P., Boursiquot J.M., Charrier A., 1999. Optimization of the choice of molecular markers for varietal identification in Vitis vinifera L. Theoretical and Applied Genetics, 98(1): 171-177.
Wang X.D., Li C.L., 2000. A study on pollen morphology of the genus Vitis L. Acta Phytotaxonomica Sinica, 38(1): 43-52.
Wu D.K., Xie C.Y., Ma C.W., 2008. The SVM classification leafminer-infected leaves based on fractal dimension. In: 2008 IEEE International Conference on Cybernetics and Intelligent Systems, CIS 2008. NJ: Inst. of Elec. and Elec. Eng. Computer Society. 2008: 147-151.
Xing J., Baerdemaeker J.D., 2005. Bruise detection on 'Jonagold' apples using hyperspectral imaging. Postharvest Biology and Technology, 37(2005): 152-162.
Xu Q.J., Qi D.W., 2009. Parameters for Texture Feature of Panthera tigris altaica Based on Gray Level Co-occurrence Matrix. Journal of Northeast Forestry University, 37(7): 125-130.
Ye Y.H., Chen C., Li C.T., Fu H., Chi Z.R., 2004. A Computerized Plant Species Recognition System. In: Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing. Hong Kong: Institute of Electrical and Electronics Engineers Inc. 2004: 723-726.
Zhai H., 2001. The main grape varieties used for processing. In: Wine Grape Growing and Processing Techniques. Yang T.Q., 1st edition. Beijing: China Agriculture Press. 1: 48.
Zhang L.P., Lin B.N., Shen D.X., 1996. The extraction, purification and identification of RFLP of the chromosomal DNA of Grape. Journal of Fruit Science, 13(2): 71-74.