JIA-2018-09

1973 YANG Hai-long et al. Journal of Integrative Agriculture 2018, 17(9): 1972–1978

(Lam et al. 2010; Li et al. 2014), enables SNP markers to adequately represent the high density of whole-genome information. Furthermore, in recent years, SNP chip arrays and, in particular, affordable next-generation sequencing (NGS) technology have enabled the large-scale, high-throughput sequencing of large numbers of individual crop plants (Davey et al. 2011). In addition, the methods and software for genotype and SNP calling have laid a solid foundation for further analysis of genotyping data (Nielsen et al. 2011). Owing to these features, SNP genotyping is one of the main methods used to detect genetic diversity among maize inbred lines.

To date, software tools for evaluating genetic diversity offer multiple methods for understanding the population relationships among hundreds of individual crop plants. For example, PowerMarker can process multiple DNA markers to cluster plant lines based on statistical methods, such as F-statistics and the differentiation test (Liu and Muse 2005). Structure software uses a Bayesian clustering model to analyze DNA markers such as SNPs and stratify crop populations (Pritchard et al. 2000). PLINK offers complete-linkage hierarchical clustering and multidimensional scaling (MDS) for evaluating plant population diversity (Purcell et al. 2007). SNPhylo software (University of Georgia, USA) can process large SNP datasets via various methods, such as principal component analysis (PCA) and relatedness analysis (Lee et al. 2014). However, a straightforward method for presenting genetic diversity from large SNP datasets is still lacking, because pre-processing the complicated raw data and producing the subsequent graphics with the tools mentioned above remain hurdles.
In the present study, a simple method based on SNPhylo software was developed to address the problems mentioned above and to enhance the current genetic diversity analysis pipeline. We used Excel to mask the scaffold symbols in public SNP data that SNPhylo cannot recognize, adjusted the SNPhylo output tree file with MEGA software (Tamura et al. 2013), and edited the graphic file using Adobe Illustrator CS6 (Adobe Systems Incorporated, USA). This study provides another useful way of analyzing and displaying genotyping data and should speed up the identification of genetic diversity in crop species.

2. Materials and methods

2.1. Data source

To illustrate our pipeline, we used the public maize hapmap file as an example. A set of 251 maize inbred lines was chosen from publicly available germplasm collected around the world, including North America, Africa, Europe, and Asia (Appendix A, modified from Flint-Garcia et al. (2005)). These lines represent the current public breeding lines from temperate, subtropical, and tropical germplasm as well as popcorn and sweet corn lines. This collection has been frequently used for maize research, such as genetic diversity and association analyses (Liu et al. 2003; Flint-Garcia et al. 2005; Cook et al. 2012). The maize SNP data used in this study were downloaded from the PANZEA Genotypes database (see the URL: cbsusrv04.tc.cornell.edu/users/panzea/download.aspx?filegroupid=7) (Cook et al. 2012), and 251 samples were selected for analysis (Appendix B).

2.2. Software and analysis

Data were analyzed on the Ubuntu Operating System (Linux, UK). SNP data were analyzed with SNPhylo software (chibba.pgml.uga.edu/snphylo/). The resulting Newick file, i.e., the tree format file, was then manipulated using MEGA software (ver. 6.0). Finally, the modified file was edited using Adobe Illustrator CS6. The whole pipeline is presented in Fig. 1.

3. Results

3.1. First step: prepare tissues and SNP data calling

Once tissues have been prepared and DNA has been extracted, the next step is to score SNP data. So far, there are three main approaches to SNP sequencing: PCR-based amplicon assay,

[Fig. 1 flowchart: Sampling and DNA extraction → Sequencing or DNA chip → SNP scoring raw data → SNPhylo formatting and running → Tree file editing → Phylogenetic tree]

Fig. 1 A flowchart of phylogenetic tree construction. DNA is extracted from fresh maize leaf tissues. DNA from maize inbred lines is analyzed with a sequencing machine, and the hapmap raw data are edited in Excel for input into the SNPhylo script. Finally, the resulting tree file is edited in MEGA, and the phylogenetic tree is finalized in Adobe Illustrator.
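
The Excel editing of the hapmap raw data can also be scripted. The following is a minimal sketch under stated assumptions, not the authors' procedure: it assumes the standard tab-delimited HapMap layout with 11 metadata columns followed by one column per sample, and it interprets "masking" the scaffold symbols as dropping every marker whose `chrom` field is not an integer from 1 to 10 (the ten maize chromosomes). The function name `clean_hapmap`, its parameters, and the demo marker names are hypothetical.

```python
# Assumption: standard HapMap layout with 11 metadata columns
# (rs#, alleles, chrom, pos, strand, assembly#, center, protLSID,
# assayLSID, panelLSID, QCcode) followed by one column per sample.
N_META_COLS = 11
MAIZE_CHROMS = 10  # maize has 10 chromosomes


def clean_hapmap(lines, keep_samples=None, max_chrom=MAIZE_CHROMS):
    """Return hapmap lines restricted to numbered chromosomes.

    Assumption: "masking" scaffold symbols is implemented here as
    removing rows whose chrom value is not an integer in 1..max_chrom.
    If keep_samples is given, only those sample columns are retained.
    """
    rows = [ln.rstrip("\r\n").split("\t") for ln in lines if ln.strip()]
    header, body = rows[0], rows[1:]
    chrom_idx = header.index("chrom")

    # Metadata columns are always kept; sample columns only on request.
    keep_cols = list(range(N_META_COLS))
    for i in range(N_META_COLS, len(header)):
        if keep_samples is None or header[i] in keep_samples:
            keep_cols.append(i)

    cleaned = ["\t".join(header[i] for i in keep_cols)]
    for row in body:
        chrom = row[chrom_idx].strip()
        # Scaffold names such as "scaffold_12" fail the integer check.
        if chrom.isdecimal() and 1 <= int(chrom) <= max_chrom:
            cleaned.append("\t".join(row[i] for i in keep_cols))
    return cleaned


if __name__ == "__main__":
    meta = ["rs#", "alleles", "chrom", "pos", "strand", "assembly#",
            "center", "protLSID", "assayLSID", "panelLSID", "QCcode"]
    demo = [
        "\t".join(meta + ["B73", "Mo17"]),
        "\t".join(["snp_a", "A/G", "1", "100", "+"] + ["NA"] * 6 + ["A", "G"]),
        "\t".join(["snp_b", "C/T", "scaffold_12", "200", "+"] + ["NA"] * 6 + ["C", "T"]),
    ]
    for line in clean_hapmap(demo):
        print(line)
```

Passing a set of line names as `keep_samples` mirrors the selection of the 251 samples; whether rows on unplaced scaffolds should be dropped or relabelled depends on the downstream tool, so the filter above is only one reasonable reading.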
