JIA-2018-09
2079 SONG Hui et al. Journal of Integrative Agriculture 2018, 17(9): 2074–2081
abundant tRNA cannot be translated most accurately, based on 73 bacterial genomes from 20 different genera (Shah and Gilchrist 2010). Besides these hypotheses, other factors may also explain these results. First, optimal codons with low tRNA abundance may encode conserved domains. Purifying selection acts on codons with low-abundance tRNAs, many of which encode conserved domains that play a crucial role in physiological development (Zhou et al. 2013; Chaney and Clark 2015). As such, proteins encoded by genes whose codons have low-abundance tRNAs have experienced purifying selection, and these proteins may play a vital role in M. truncatula. Second, codons whose corresponding tRNAs are of low abundance may be used more frequently. In vivo analyses in Saccharomyces cerevisiae indicated that codons preferentially used in highly expressed genes are not translated faster than those in highly expressed genes with non-optimal codon usage (Novoa and de Pouplana 2012; Qian et al. 2012).
Recent studies have focused on identifying factors that act on CUB, but some of the resulting conclusions are inconsistent. In this study, we performed correlation analyses between Fop and a number of variables, including sequence length, GC content, CAI, and ENC. We found that optimal codon use (i.e., Fop) is not correlated with CDS length, but is positively correlated with GC content and CAI. By contrast, Wang and Hickey (2007) found that CUB is negatively correlated with gene length, and that short genes have higher GC content than long genes in rice. Ingvarsson (2007) showed that Fop values are negatively correlated with protein length, but strongly and positively correlated with GC3 content in P. tremula. Note that CAI, which indicates CUB in genes with high expression levels, is the major factor positively associated with optimal codon usage.
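The correlation analysis described above can be illustrated with a minimal sketch. It assumes a common definition of Fop (optimal codons divided by all codons that have synonymous alternatives, so Met, Trp, and stop codons are excluded) and uses the Pearson coefficient; the text here does not state which coefficient the authors used, so that choice is an assumption. The function names and the toy sequences are hypothetical; only the two starred Gly codons (GGT, GGA) come from Table 1.

```python
from math import sqrt

# Codons without synonymous alternatives (Met, Trp) and stop codons are
# excluded from the Fop denominator under the definition assumed here.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def fop(cds, optimal_codons):
    """Fraction of optimal codons among the countable codons of a CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counted = [c for c in codons if c not in EXCLUDED]
    if not counted:
        return 0.0
    return sum(c in optimal_codons for c in counted) / len(counted)


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: three short made-up CDSs, not real genes.
optimal = {"GGT", "GGA"}  # starred Gly codons from Table 1
toy_cds = ["ATGGGTGGAGGCTAA", "ATGGGCGGGGGTTAA", "ATGGGAGGTGGATAA"]

fop_values = [fop(s, optimal) for s in toy_cds]
gc_values = [gc_content(s) for s in toy_cds]
print("Fop:", fop_values)
print("r(Fop, GC):", pearson(fop_values, gc_values))
```

With real data, each gene would contribute one Fop value and one value per feature (CDS length, GC1 to GC3, and so on), and a significance test would accompany each coefficient.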
A strong positive correlation has been found between CUB and gene expression in many species, including Cardamine spp., P. tremula, S. latifolia, and Tribolium castaneum (Ingvarsson 2007; Qiu et al. 2011a; Ometto et al. 2012; Williford and Demuth 2012). When average gene expression intensities within a given tissue type are examined, Fop is not correlated with gene expression; however, when maximal gene expression across tissues is examined, Fop is weakly correlated with gene expression (De La Torre et al. 2015). In M. truncatula, gene expression is not correlated with CDS length or GC content. Similar results were observed previously in rice. Liu et al. (2004) confirmed that natural selection is one major driving force behind gene expression level, whereas CDS length plays only a minor role in rice. However, Qiu et al. (2011a) found that gene expression is positively correlated with GC3 content, but strongly and negatively correlated with intron GC content. GC3 is not positively correlated with gene expression in A. thaliana and A. lyrata, but there is a weak positive correlation between gene expression and intron GC content (Wright et al. 2004).
Studies of the correlation between CDS or genomic DNA length and gene expression have produced conflicting results. Long gene sequences actually improve gene expression in species such as T. castaneum and Picea spp. (Williford and Demuth 2012; De La Torre et al. 2015). By contrast, Camiolo et al. (2015) confirmed that short and higher-GC DNA sequences are always positively correlated with gene expression and optimal usage bias in four monocots, 15 dicots and two mosses. Some studies have proposed that short protein-coding sequences with high expression levels are less costly in terms of metabolism (Williford and Demuth 2012; Whittle and Extavour 2015).
However, Yang (2009) argued that short sequences with high expression levels hardly support the energy-cost hypothesis, but may be better reconciled

Table 1 (Continued from preceding page)

| Amino acid | Codon 2) | tRNA (copy number) | High expression RSCU (number) | Low expression RSCU (number) |
|---|---|---|---|---|
| Gly | GGT* | ACC(1) | 1.58 (14 585) | 1.33 (10 158) |
| | GGC | GCC(23) | 0.29 (2 674) | 0.98 (7 432) |
| | GGA* | TCC(15) | 1.68 (15 498) | 0.99 (7 537) |
| | GGG | CCC(3) | 0.45 (4 171) | 0.70 (5 323) |

1) RSCU, relative synonymous codon usage.
2) * indicates optimal codon. ND indicates not detected.

Table 2 Correlation analysis between coding sequence architecture features and gene expression based on expressed sequence tag (EST) abundance in Medicago truncatula 1)

| | CDS length | DNA length | GC1 content | GC2 content | GC3 content | GC content in CDS | GC content in DNA |
|---|---|---|---|---|---|---|---|
| Fop | –0.07 | –0.004 | 0.47** | 0.28** | 0.27** | 0.23** | 0.40** |
| EST | –0.04 | –0.02 | 0.08 | 0.04 | 0.06 | 0.10 | 0.03 |

1) CDS, coding sequence; GC1–3, GC content at the first, second, and third codon positions, respectively; Fop, frequency of optimal codons. ** indicates significance at P<0.01.
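The RSCU values in Table 1 can be checked against the codon counts given in parentheses. The sketch below assumes the standard definition of RSCU (observed count of a codon divided by the mean count across its synonymous family); the function name `rscu` is illustrative, and the counts are the Gly rows of Table 1.

```python
def rscu(counts):
    """RSCU for one synonymous codon family.

    counts maps each codon of a single amino acid to its observed count.
    RSCU = count / (total / number of synonymous codons).
    """
    total = sum(counts.values())
    n = len(counts)
    if total == 0:
        return {codon: 0.0 for codon in counts}
    return {codon: count * n / total for codon, count in counts.items()}


# Gly codon counts from Table 1 (high- and low-expression gene sets).
gly_high = {"GGT": 14585, "GGC": 2674, "GGA": 15498, "GGG": 4171}
gly_low = {"GGT": 10158, "GGC": 7432, "GGA": 7537, "GGG": 5323}

for label, counts in (("high", gly_high), ("low", gly_low)):
    rounded = {codon: round(value, 2) for codon, value in rscu(counts).items()}
    print(label, rounded)
```

Rounded to two decimals, the results match the tabulated Gly values (for example 1.58 and 1.33 for GGT). Under this definition a value above 1 means the codon is used more often than expected under equal usage of its synonymous codons.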