JIA-2018-09

SONG Hui et al. Journal of Integrative Agriculture 2018, 17(9): 2074–2081

A total of 256 975 M. truncatula expressed sequence tag (EST) sequences were downloaded from the National Center for Biotechnology Information (NCBI) database on January 19, 2016. EST abundance has been used to estimate gene expression intensity (Ohlrogge and Benning 2000). In this study, we searched all available EST sequences against each M. truncatula CDS using a local BLAST program (Altschul et al. 1997). The number of EST sequences matching a given CDS was taken as the expression intensity of that gene (Ohlrogge and Benning 2000; Song and Nan 2014). The following thresholds were used to select matches for further analysis (Song et al. 2015): (1) length of aligned sequences>200 bp; (2) identity>96%; and (3) E-value≤10^-10.

Correlation analyses were carried out using JMP 9.0 (SAS Institute Inc., Cary, NC, USA), and results were plotted using Origin 9.0 (OriginLab, Northampton, MA, USA).

3. Results

3.1. Base composition of M. truncatula

A total of 62 319 CDSs have been previously identified in the sequenced M. truncatula genome (Young et al. 2011). Applying the filtering criteria described in Materials and methods reduced the number of CDSs used in the present study to 39 531. The GC content of these CDSs varied from 23.9 to 69.7% (SD=3.77, Appendix A). Moreover, the GC content differed among the three codon positions: GC1 was the highest (47.6%), followed by GC2 (38.5%) and GC3 (36.3%). The average GC content across the three positions was 40.9%, indicating that CDSs in M. truncatula have a higher AT content (59.1%).

RSCU is the observed frequency of a codon divided by the frequency expected if all synonymous codons for that amino acid were used equally. Thirty-one codons were used less frequently than expected (i.e., RSCU<1) and 26 codons were used more frequently (i.e., RSCU>1) in CDSs of M. truncatula, indicating that these 26 codons are used preferentially among all CDSs of M. truncatula (Appendix B). Furthermore, the RSCU analysis demonstrated that CDSs in M. truncatula are biased towards codons ending with A or T, except for AGG (Arg) and TTG (Leu) (Appendix B).

3.2. Factors associated with codon usage in M. truncatula

A significant correlation between GC12 (the average of GC1 and GC2) and GC3, with a slope close to 1, suggests that mutation pressure is the major force shaping the codon usage pattern (Sueoka 1988). In contrast, if natural selection is the dominant factor, the slope is close to 0 (Sueoka 1988). In this study, the correlation between GC12 and GC3 was significant and positive but weak (r=0.12, P<0.01), and the slope was close to 0 (0.08) (Fig. 1), suggesting that natural selection has shaped the codon usage pattern in M. truncatula.

In an ENC plot, genes fall along the expected curve if codon usage is constrained by neutral pressure alone (Wright 1990; Zhang et al. 2007). If the genes fall below or above the ENC curve, other pressures influence codon usage. In addition, Kawabe and Miyashita (2003) demonstrated that natural selection shapes codon usage if the distribution of GC3s across genes is narrow. Our analysis showed that in M. truncatula, most of the genes analysed fell below the ENC curve and GC3s values were distributed within a narrow range (0.2–0.5, Fig. 2), suggesting that natural selection plays a substantial role in the codon usage pattern.

A comparison between ENC and CAI can be used to assess the relationship between nucleotide composition and natural selection (Sharp and Li 1987; Wright 1990). In this study, we found no correlation between ENC and CAI (r=0.06) in M. truncatula.
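The expected curve referred to in the ENC-plot discussion above is commonly computed from the approximation given by Wright (1990), ENC = 2 + s + 29/(s^2 + (1 - s)^2), where s is GC3s expressed as a fraction. The following is a minimal Python sketch of that curve and of the below/above comparison. It is an illustration only, not the authors' code: the function names and the example gene values are hypothetical.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC for a gene whose codon usage depends only on GC3s.

    Uses Wright's (1990) approximation; gc3s is a fraction in [0, 1].
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def side_of_curve(gc3s: float, observed_enc: float) -> str:
    """Report whether a gene lies below, on, or above the expected curve."""
    expected = expected_enc(gc3s)
    if observed_enc < expected:
        return "below"
    if observed_enc > expected:
        return "above"
    return "on"


if __name__ == "__main__":
    # Hypothetical gene (illustrative values, not taken from the paper).
    print(round(expected_enc(0.35), 2), side_of_curve(0.35, 45.0))
```

A gene whose observed ENC is lower than the value returned by expected_enc for its GC3s would be plotted below the curve, which is the situation the text reports for most M. truncatula genes.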
There were significant negative correlations between ENC and the A or T content at synonymous third codon positions (A3s, r=-0.26, P<0.01; and T3s, r=-0.37, P<0.01), and significant positive correlations between ENC and the C or G content at synonymous third codon positions (C3s, r=0.40, P<0.01; and G3s, r=0.24, P<0.01). These patterns indicate that in M. truncatula, CUB is characterized by high A3s and T3s (AT3s) values or low G3s and C3s (GC3s) values. A reasonable explanation is that natural selection acts on the third codon position to increase the A and T content (AT3, 63.7%) rather than the G and C content (GC3, 36.3%).

[Figure 1: scatter plot with axes GC12 (%) and GC3 (%); r=0.12, P<0.01]

Fig. 1 Correlation between GC12 (the average of GC1 and GC2) and GC3. GC contents at the first (GC1), second (GC2), and third (GC3) codon positions were calculated using an in-house Perl script. Correlation analyses were executed in JMP 9.0, and the figure was generated using Origin 9.0.
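The in-house Perl script and the JMP analysis behind Fig. 1 are not shown in the paper. As a rough illustration of the two steps involved, the Python sketch below computes GC1, GC2, and GC3 for one in-frame CDS and then the least-squares slope and Pearson r for a set of paired values. Everything here is an assumption for illustration: the function names are hypothetical, all codons (including the stop codon) are counted, and the regression is taken as GC12 on GC3, as is usual for a neutrality plot.

```python
from statistics import mean


def gc_by_codon_position(cds: str) -> tuple:
    """Return (GC1, GC2, GC3) as percentages for one in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if seq[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the position within the codon
    n_codons = usable // 3
    return tuple(100.0 * c / n_codons for c in counts)


def slope_and_r(x, y):
    """Least-squares slope of y on x, and the Pearson correlation r."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Toy sequences for illustration only (not M. truncatula data).
    toy_cdss = ["ATGGCCAAATAA", "ATGGGGCCCTGA", "ATGAAATTTTAA"]
    gc12, gc3 = [], []
    for cds in toy_cdss:
        g1, g2, g3 = gc_by_codon_position(cds)
        gc12.append((g1 + g2) / 2.0)
        gc3.append(g3)
    print(slope_and_r(gc3, gc12))
```

Under these assumptions, a slope near 0 corresponds to the case the text interprets as selection-dominated, and a slope near 1 to the mutation-dominated case.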
