JIA-2018-09
1974 YANG Hai-long et al. Journal of Integrative Agriculture 2018, 17(9): 1972–1978

immobilized array platforms, and next-generation sequencing. PCR-based amplicon assays typically include KASP™ (LGC Group), TaqMan® (Thermo Fisher Scientific, USA), SNaPshot® Multiplex System (Thermo Fisher Scientific), and SNP Type™ (Fluidigm, USA) assays, which combine allele-specific PCR with fluorescent labeling to identify SNPs (Thomson 2014; Campbell et al. 2015). Immobilized array platforms, such as Illumina (BeadArray and GoldenGate) and Affymetrix (GeneChip and Axiom) (Ragoussis 2009), detect SNP variants from the fluorescence intensity of hybridization signals between specific predesigned probes and fragmented DNA (LaFramboise 2009). NGS includes short-read technologies such as Illumina and Ion Torrent, as well as long-read technologies such as Single Molecule Real-Time (SMRT) sequencing (Pacific Biosciences, USA) and Oxford Nanopore Technologies (Thomson 2014; Cornelis et al. 2017; Weirather et al. 2017). A common approach with these technologies is to genotype SNPs by calling variants from whole-genome or transcriptome sequences, for example, genotyping by sequencing (GBS) (He et al. 2014). Once the SNP data are prepared, the next step is editing and analysis. Here we used selected public maize SNP hapmap data to demonstrate the pipeline.

3.2. Second step: edit raw data and import it into SNPhylo

To obtain a compatible import format, the maize SNP data were edited manually. In maize SNP sequencing data, especially in hapmap format, the chromosome symbols in the last few rows are generally scaffold identifiers rather than chromosome numbers. SNPhylo cannot detect these rows because it was developed to handle only numeric chromosomes.
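The manual edit needed here, replacing scaffold symbols with placeholder chromosome numbers, can also be scripted. The Python sketch below is a hypothetical helper, not part of SNPhylo or the original pipeline. It assumes a tab-delimited hapmap file with a header row whose third column holds the chromosome symbol, keeps the numeric symbols 1–10 (the real maize chromosomes) unchanged, and gives every other symbol (e.g., scaffold_252) the next free number starting at 11, in order of first appearance.

```python
import csv
from typing import Dict, Iterable, List, TextIO, Tuple

CHROM_COL = 2            # 0-based index of the "chrom" column (assumed HapMap layout)
N_REAL_CHROMOSOMES = 10  # maize chromosomes 1-10 are kept unchanged


def renumber_scaffolds(rows: Iterable[List[str]]) -> Tuple[List[List[str]], Dict[str, int]]:
    """Give each non-numeric chromosome symbol a placeholder number from 11 upward."""
    mapping: Dict[str, int] = {}
    out: List[List[str]] = []
    for i, row in enumerate(rows):
        # Keep the header, blank/short rows, and rows already on numeric chromosomes.
        if i == 0 or len(row) <= CHROM_COL or row[CHROM_COL].isdigit():
            out.append(row)
            continue
        symbol = row[CHROM_COL]
        if symbol not in mapping:  # first time this scaffold symbol is seen
            mapping[symbol] = N_REAL_CHROMOSOMES + len(mapping) + 1
        out.append(row[:CHROM_COL] + [str(mapping[symbol])] + row[CHROM_COL + 1:])
    return out, mapping


def rewrite_hapmap(src: TextIO, dst: TextIO) -> Dict[str, int]:
    """Read a tab-delimited hapmap table from src and write the edited table to dst."""
    rows = csv.reader(src, delimiter="\t")
    edited, mapping = renumber_scaffolds(rows)
    csv.writer(dst, delimiter="\t", lineterminator="\n").writerows(edited)
    return mapping
```

In use, the input and output files would be opened with `newline=""` and passed to `rewrite_hapmap`. The returned mapping records which scaffold received which number, and its highest value is the number that would then be given to SNPhylo's -a option.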
Therefore, the scaffold symbols in these rows were replaced with “not-real” (placeholder) chromosome numbers, one for each scaffold type, so that SNPhylo could detect these non-chromosomal SNP data. To illustrate this process, we used the undetectable scaffold portion of the public hapmap data. Because maize has 10 basic chromosomes, scaffold_252, for example, was renamed chromosome 11 and scaffold_507 was renamed chromosome 12. The data were then imported into SNPhylo and analyzed with the following command:

snphylo.sh -H HapMap_file -m 0.05 -a 13 -A

where -H specifies the hapmap file, -m the minor allele frequency (MAF) threshold, -a the number of the last autosome (here 13, the total number of chromosomes in the modified hapmap file), and -A performs multiple alignment with MUSCLE. Further options can be chosen according to the specific analysis plan; detailed information is available on the SNPhylo software website (Lee et al. 2014).

3.3. Third step: import tree format file into MEGA

The original output files generated by SNPhylo made it difficult to see the relationships among hundreds of samples, whereas researchers presenting large data sets need a concise and intuitive view of the data. To address this problem, we checked each output format for compatibility with downstream analysis software. We found that the tree file is in Newick format, which can be imported into MEGA and edited manually (Tamura et al. 2013). The tree file was imported into MEGA with the Display Newick Trees option under User Tree for subsequent editing. Using the swap tree and resize options, we obtained a more concise phylogenetic tree (Appendix C).

3.4. Final step: edit output file with Adobe Illustrator

To better understand the distribution of these maize inbred lines, we color-grouped the modified phylogenetic tree (Appendix C) with Adobe Illustrator CS6 (Adobe Systems Incorporated, USA).
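Before coloring, it can help to list the samples in the order in which they appear in the Newick tree file and to record which group each one is assigned to. The Python sketch below is a hypothetical helper, not a feature of SNPhylo, MEGA, or Illustrator. It assumes unquoted leaf labels without spaces and ignores branch lengths and internal-node labels such as bootstrap values. The group dictionary is a user-supplied, illustrative assumption.

```python
import re
from typing import Dict, List, Optional

# Structural Newick characters, or a run of label/number characters.
_TOKEN = re.compile(r"[(),:;]|[^(),:;\s]+")
_STRUCT = set("(),:;")


def newick_leaves(newick: str) -> List[str]:
    """Return leaf (sample) names in the order they appear in the tree string."""
    leaves: List[str] = []
    prev: Optional[str] = None
    for tok in _TOKEN.findall(newick):
        # A label right after "(" or "," (or at the very start) names a leaf;
        # after ")" it is an internal-node label, after ":" a branch length.
        if tok not in _STRUCT and prev in (None, "(", ","):
            leaves.append(tok)
        prev = tok
    return leaves


def color_table(leaves: List[str], group_of: Dict[str, str],
                default: str = "unassigned") -> Dict[str, str]:
    """Map every leaf to a user-supplied group label (hypothetical grouping)."""
    return {leaf: group_of.get(leaf, default) for leaf in leaves}
```

The tree text can be read with `open(path).read()` and passed to `newick_leaves`. Leaves that come back as "unassigned" show which samples still need a group before the colors are applied by hand.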
Based on the backgrounds of typical representative inbred lines and the phylogenetic distances among them, three major groups and two minor groups were defined (Fig. 2). The red group was inferred to be tropical-subtropical (TS) because most of its inbred lines come from CIMMYT and subtropical areas, including representative lines such as CML69, NC296, Tzi 10, and Ki11. The green group represented the stiff stalk (SS) lines, with the typical inbred line B73. The blue group represented the non-stiff stalk (NSS) lines, including common lines such as A619, W22, and Mo17. Of the two minor groups, the purple group contained several sweet corn inbred lines, whereas the orange group mainly comprised popcorn inbred lines (Fig. 2). These groupings are highly consistent with previous studies (Liu et al. 2003; Flint-Garcia et al. 2005), which supports the reliability of this color-grouping method.

4. Discussion

Progress in analyzing and presenting large genotyping data sets has been driven largely by the development of many excellent software tools. For example, PowerMarker, an early well-designed program, can illustrate clustering relationships and population structure by processing simple sequence repeat (SSR), restriction fragment length polymorphism (RFLP), SNP, and other data formats (Liu and Muse 2005). Structure software plays a substantial role in analyzing genotyping data because of its broad range of supported input formats, operating platforms, efficient analysis methods, and