Scientia Agricultura Sinica, 2007, Vol. 40, Issue 10: 2119-2127.

• CROP GENETICS & BREEDING·GERMPLASM RESOURCES •

Comparison among Gene Supervised Clustering Methods for DNA Microarray Expression Data

  

  1. Jiangsu Provincial Key Laboratory of Genetics and Physiology, Yangzhou University
  • Received: 2006-09-04   Online: 2007-10-10   Published: 2007-10-10

Abstract: Several typical supervised clustering methods, namely Gaussian mixture model-based supervised clustering (GMM), K-Nearest-Neighbor (KNN), binary support vector machines (SVMs) and multicategory support vector machines (MC-SVMs), were used to classify computer-simulated data, yeast cell cycle microarray data and microarray data from 60 human cancer cell lines (NCI-60). The numbers of false positives (FP), false negatives (FN), true positives (TP) and true negatives (TN), together with clustering accuracy, were compared among these methods. The results are as follows. (1) When classifying thousands of gene expression profiles, the two GMM methods achieved the highest clustering accuracy and the lowest total FP+FN error count, under the assumption that the whole set of microarray data is a finite mixture of multivariate Gaussian distributions. Furthermore, when the number of training samples was very small, the GMM Ⅱ method was more accurate than the GMM Ⅰ method. (2) In general, MC-SVMs were the more robust and practical classifiers: they were less sensitive to the curse of dimensionality, second only to the GMM methods in clustering accuracy on thousands of gene expression profiles, and more robust than the other techniques when only a small number of high-dimensional gene expression samples were available. (3) Among the MC-SVMs, OVO and DAGSVM performed better with large sample sizes; the five MC-SVM methods performed very similarly with moderate sample sizes; and OVR, WW and CS gave better results with small sample sizes. (4) For supervised clustering of microarray data, the characteristics of the data and of the experiment should be considered when choosing a method, and two of these kinds of methods should be tried in order to obtain a better clustering result.
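
The evaluation quantities named in the abstract (TP, FP, FN, TN and clustering accuracy) can be illustrated with a minimal sketch. It is not the authors' code: the toy expression profiles, the function names and the choice of k = 3 are all hypothetical, and only a plain KNN classifier is shown. The GMM and SVM methods would presumably be scored in the same way, by treating one class at a time as "positive".

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training profiles."""
    # Rank training profiles by Euclidean distance to the query profile.
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN with one class treated as 'positive' (one-vs-rest)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def accuracy(y_true, y_pred):
    """Fraction of test profiles assigned to their true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy profiles (two conditions per gene); not data from the paper.
train_x = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
train_y = ["A", "A", "A", "B", "B", "B"]
test_x = [(0.15, 0.25), (2.1, 2.0), (0.9, 0.8), (1.8, 2.2)]
test_y = ["A", "B", "B", "B"]

pred_y = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
counts_b = confusion_counts(test_y, pred_y, positive="B")
acc = accuracy(test_y, pred_y)

print(pred_y)    # ['A', 'B', 'A', 'B']
print(counts_b)  # {'TP': 2, 'FP': 0, 'FN': 1, 'TN': 1}
print(acc)       # 0.75
```

In this toy run the third test profile lies closer to the class A training profiles and is misassigned, which shows up as one false negative for class B and an FP+FN total of 1.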

Key words: Microarray, Supervised Clustering, K-Nearest-Neighbor, Support Vector Machines
