Applied and Environmental Microbiology, June 2005, p. 3126-3130, Vol. 71, No. 6
0099-2240/05/$08.00+0 doi:10.1128/AEM.71.6.3126-3130.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, Tianjin 300060, China,¹ and Department of Physics, Tianjin University, Tianjin 300072, China²
Received 19 November 2004/ Accepted 21 December 2004
Genomic islands contain clusters of genes that are horizontally transferred. By transferring genes across species boundaries, horizontal gene transfer (HGT) alters the genotype of a bacterium, leading to increased genetic diversity and even to new species. It is now increasingly clear that HGT has played critical roles throughout bacterial evolution (1, 3-6, 8, 11).
Although the complete genome sequence of C. efficiens is available, no horizontally transferred genomic islands have been identified in it. Among the methods for detecting genomic islands, analysis of changes in GC content remains an established approach. In the paper describing the genome, Nishio et al. used a window-based method, i.e., 20-kb sliding windows with a 1-kb step, to display the GC content distribution (see Fig. 1 in reference 9). Such a window-based method has low resolution in detecting GC content changes. We recently proposed a windowless method for computing GC content, the cumulative GC profile, which is much more sensitive to GC content changes than the traditional window-based method (12, 14, 15). In the present study, we used the cumulative GC profile to identify genomic islands in the C. efficiens genome. As a result, four genomic islands with much lower GC contents than the rest of the genome were found. In addition, these four genomic islands have many conserved genomic island-specific features, such as biased codon usage, the presence of mobile genes, and the presence of direct repeats and a tRNA locus at the junctions.
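To make the contrast concrete, here is a minimal Python sketch of a window-based GC scan of the kind described above (20-kb windows moved in 1-kb steps). The function name `sliding_window_gc` is hypothetical, only full-length windows are reported, and any character other than G or C counts as non-GC. It is an illustration under these assumptions, not the code used by Nishio et al. (9).

```python
def sliding_window_gc(seq: str, window: int = 20_000, step: int = 1_000) -> list[tuple[int, float]]:
    """Return (0-based start, GC fraction) for every full-length window of seq."""
    seq = seq.upper()
    # Prefix sums of G+C counts: prefix[i] = number of G/C bases in seq[:i].
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (1 if base in "GC" else 0))
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        gc_count = prefix[start + window] - prefix[start]
        profile.append((start, gc_count / window))
    return profile


# Toy example: an 8-bp sequence scanned with 4-bp windows and a 4-bp step.
print(sliding_window_gc("AAAAGGGG", window=4, step=4))  # -> [(0, 0.0), (4, 1.0)]
```

Because each value is an average over the whole window (20,000 bp with the default settings), a GC shift much shorter than the window is diluted and its boundaries are blurred across neighboring windows, which is the resolution limit referred to in the text.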
FIG. 1. (A) The cumulative GC profile for the C. efficiens genome and (B) the distribution of the codon usage bias along the genome as determined by use of 22-kb sliding windows. In the cumulative GC profile, an increase indicates a decrease in GC content, and any sharp minimum (or maximum) point indicates a turning point at which the GC content undergoes a relatively abrupt decrease (or increase). If a segment of the cumulative GC profile is approximately a straight line, the corresponding region has an approximately constant GC content. The profile thus shows that some regions of the C. efficiens genome have an abrupt decrease in GC content and are fairly homogeneous in GC content. A quantitative index, h, was used to measure the homogeneity of genomic islands. Many of these low-GC-content regions correspond to peaks in the distribution curve of codon usage bias, indicating that genes in these regions have more-biased codon usage. The peak around kb 550 corresponds to a cluster of 18 ribosomal protein genes, whereas the peaks at about kb 266, kb 405, kb 875, and kb 1275 correspond to regions that have many conserved features of genomic islands. See the text for details.
The methods of the cumulative GC profile, the computation of codon usage bias, and the definition of homogeneity of the genomic islands have been detailed previously (15). Here we briefly summarize the methods.
Use of the cumulative GC profile to calculate GC content.
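We define
$$z_n = (A_n + T_n) - (G_n + C_n), \qquad n = 0, 1, 2, \ldots, N, \tag{1}$$
where $A_n$, $C_n$, $G_n$, and $T_n$ are the cumulative numbers of the bases A, C, G, and T, respectively, in the subsequence from the first base to the $n$th base, and $N$ is the length of the sequence. The curve $z_n$ is fitted by a straight line by the least-squares technique,
$$z = kn, \tag{2}$$
where $k$ is the slope of the fitted line. Instead of $z_n$, we will use the $z'$ curve, or cumulative GC profile, hereafter, where
$$z'_n = z_n - kn. \tag{3}$$
For a region $\Delta n$ (from base $n_1 + 1$ to base $n_2$, so that $\Delta n = n_2 - n_1$) in a sequence, we find from equations 1, 2, and 3 that the average GC content of the region is
$$\mathrm{GC} = \frac{1 - k}{2} - \frac{1}{2}\,\frac{\Delta z'_n}{\Delta n}, \tag{4}$$
where $\Delta z'_n / \Delta n = (z'_{n_2} - z'_{n_1}) / (n_2 - n_1)$ represents the average slope of the $z'$ curve within the region $\Delta n$. The region $\Delta n$ is usually chosen to be a fragment of a natural DNA sequence, e.g., a genomic island. Equation 4 describes the windowless technique for the GC content computation (12).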
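The windowless computation can be checked numerically. The following minimal Python sketch assumes $z_n = (A_n + T_n) - (G_n + C_n)$ with $z_0 = 0$, a least-squares line $z = kn$ through the origin, $z'_n = z_n - kn$, and a non-empty sequence containing only A, C, G, and T. The function names `cumulative_gc_profile` and `region_gc_content` are hypothetical.

```python
def cumulative_gc_profile(seq: str) -> tuple[list[float], float]:
    """Return the z' curve (indexed n = 0..N) and the fitted slope k."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("this sketch assumes an unambiguous A/C/G/T sequence")
    # z_n = (A_n + T_n) - (G_n + C_n), accumulated base by base from z_0 = 0.
    z = [0]
    for base in seq:
        z.append(z[-1] + (1 if base in "AT" else -1))
    n_total = len(seq)
    # Least-squares fit of z = k*n through the origin: k = sum(n*z_n) / sum(n^2).
    numerator = sum(n * z[n] for n in range(1, n_total + 1))
    denominator = sum(n * n for n in range(1, n_total + 1))
    k = numerator / denominator
    # Subtract the fitted line to obtain the cumulative GC profile z'_n.
    z_prime = [z[n] - k * n for n in range(n_total + 1)]
    return z_prime, k


def region_gc_content(z_prime: list[float], k: float, n1: int, n2: int) -> float:
    """Average GC content of bases n1+1..n2 (1-based), i.e. seq[n1:n2]."""
    mean_slope = (z_prime[n2] - z_prime[n1]) / (n2 - n1)
    return (1 - k) / 2 - mean_slope / 2


seq = "ATGCGCGCATATAAGGCC"
z_prime, k = cumulative_gc_profile(seq)
print(round(region_gc_content(z_prime, k, 2, 10), 6))  # -> 0.75 ("GCGCGCAT" has 6 G/C of 8)
```

Because $k$ enters both $z'_n$ and the $(1 - k)/2$ term, it cancels in the GC formula, so the result equals a direct count of G and C in the region. The fitted line only detrends the curve, so that turning points and straight segments stand out when $z'_n$ is plotted.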
An index to measure codon usage bias.
The occurrence frequencies of codons (excluding stop codons) in a protein-coding gene may be regarded as a 61-dimensional codon usage vector. The mean codon usage vector determined for all genes in a genome is denoted by $\bar{c}$. Suppose that the codon usage vector for the $i$th gene in the genome under study is denoted by $c_i$. Then the codon usage bias of this gene with respect to the mean vector can be calculated by using the index of codon usage bias, $cub_i$,
$$cub_i = 1 - \frac{c_i \cdot \bar{c}}{|c_i|\,|\bar{c}|}, \tag{5}$$
where $|c_i|$ and $|\bar{c}|$ are the moduli (lengths) of the vectors $c_i$ and $\bar{c}$, respectively. The larger $cub_i$ is, the more biased the codon usage of this gene.
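A minimal Python sketch of the codon usage bias index follows. It assumes that the index is one minus the cosine of the angle between $c_i$ and $\bar{c}$ (the form consistent with a definition in terms of the moduli $|c_i|$ and $|\bar{c}|$), the standard genetic code (its stop codons TAA, TAG, and TGA are also those of the bacterial code), in-frame coding sequences, and that $\bar{c}$ is the arithmetic mean of the per-gene frequency vectors; pooling codon counts over all genes would be an alternative reading. The function names are hypothetical.

```python
import math
from itertools import product

# Stop codons of the standard (and bacterial) genetic code; 61 sense codons remain.
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(triplet)
    for triplet in product("TCAG", repeat=3)
    if "".join(triplet) not in STOP_CODONS
]


def codon_usage_vector(cds: str) -> list[float]:
    """61-dimensional vector of sense-codon frequencies for an in-frame coding sequence."""
    cds = cds.upper()
    counts = dict.fromkeys(SENSE_CODONS, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # stop codons and ambiguous triplets are skipped
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no sense codons")
    return [counts[c] / total for c in SENSE_CODONS]


def cub_index(c_i: list[float], c_mean: list[float]) -> float:
    """Assumed index: 1 - (c_i . c_mean) / (|c_i| |c_mean|)."""
    dot = sum(a * b for a, b in zip(c_i, c_mean))
    norm_i = math.sqrt(sum(a * a for a in c_i))
    norm_mean = math.sqrt(sum(b * b for b in c_mean))
    return 1 - dot / (norm_i * norm_mean)


def genome_cub(genes: list[str]) -> list[float]:
    """cub_i for every gene, measured against the mean codon usage vector of all genes."""
    vectors = [codon_usage_vector(g) for g in genes]
    c_mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return [cub_index(v, c_mean) for v in vectors]


print([round(x, 4) for x in genome_cub(["ATGAAATAA", "ATGCCCTAA"])])  # -> [0.134, 0.134]
```

In the toy call, each gene shares only ATG with the other, so both lie at the same angle from the mean vector (cosine $\sqrt{3}/2$). A gene whose codon usage matches the genome average exactly would score 0.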
An index to measure the homogeneity of the GC content of genomic islands.
We noticed that genomic islands have fairly homogeneous GC contents. A fairly homogeneous GC content implies that, within a genomic island, $z'_n \approx 0$. The variation of $z'_n$ may be described by the deviation $d_{gi}$ defined by
[Equation 6]
where $z'_n$ is the cumulative GC profile defined in equation 3 for a genomic island (gi) and $M$ is its length. Similarly, the deviation of the GC content from a constant for a whole genome may be described by $d_{genome}$, defined by
[Equation 7]
where $z'_n$ is the cumulative GC profile defined in equation 3 for a whole genome and $N$ is its length. A homogeneity index $h_{gi}$ is defined by the following equation.
[Equation 8]
Some low-GC-content regions correspond to peaks in the distribution curve of codon usage bias along the genome (Fig. 1B), indicating that the DNA sequences in these regions have much more biased codon usage than the rest of the genome. Ribosomal protein genes, however, are known to have much more biased codon usage than other genes in a genome, so regions corresponding to ribosomal protein genes should be excluded. The peak around kb 550 corresponds to a cluster of 18 ribosomal protein genes located from kb 529 to kb 570. There is also a small peak at about kb 2980, which corresponds to six rRNA genes and two ribosomal protein genes located in the region from kb 2977 to kb 3000.
In the cumulative GC profile, four low-GC-content regions correspond to the peaks in the distribution curve of codon usage bias at about kb 266, kb 405, kb 875, and kb 1275. These regions contain no ribosomal genes; instead, they have many genomic island-specific features. Therefore, they are likely to be horizontally transferred genomic islands, which are designated CEGI-1, CEGI-2, CEGI-3, and CEGI-4, respectively.
Genomic islands usually differ in many characteristics from the core genome; for instance, they differ in GC content and codon usage from the rest of the genome. In addition, genomic islands share many unifying features. They are usually flanked by direct repeats, and an integrase gene is frequently located at the 5' junction. Furthermore, tRNA loci, which are usually located at the junctions, are presumably used as integration sites. Finally, genomic islands often carry genes that confer genetic mobility, such as integrase and transposase genes.
The GC contents of CEGI-1, CEGI-2, CEGI-3, and CEGI-4 are 0.595, 0.555, 0.595, and 0.566, respectively, much lower than that of the rest of the genome (0.635). The codon usage biases of CEGI-1, CEGI-2, CEGI-3, and CEGI-4 are 0.181, 0.205, 0.186, and 0.197, respectively, significantly larger than that of the rest of the genome, 0.146 (P < 0.001 for all four genomic islands) (Table 1).
TABLE 1. Features of the four genomic islands in the C. efficiens genome
FIG. 2. (A) CEGI-3 has a conserved structure of genomic islands. CEGI-3 is flanked by two direct repeats, and an integrase gene is located at the 5' junction. The figure is not drawn to scale. (B) Alignment of the two direct repeats.
Taken together, these features (low GC content, biased codon usage, the presence of direct repeats, an integrase gene, and a tRNA locus at the junctions, the presence of transposase genes, and homogeneous GC content) strongly suggest that the four regions CEGI-1, CEGI-2, CEGI-3, and CEGI-4 are horizontally transferred genomic islands.
Among the species of the genus Corynebacterium, C. efficiens can grow at the highest temperature, and it is the only one able to produce glutamate above 40°C (9). Its proteins are therefore likely to be more thermostable than those of other members of the genus, such as C. glutamicum. To test this, the thermal stabilities of 13 pairs of orthologous enzymes in the Glu and Lys biosynthetic pathways of the two species were compared on the basis of the enzymatic activities remaining after heat treatment of crude extracts. Most of the tested enzymes from C. efficiens were more thermostable than their C. glutamicum orthologs; unexpectedly, however, the aspartate kinase from C. glutamicum was more thermostable than that from C. efficiens (9). This observation is hard to explain. We found that the gene encoding this enzyme, ORF CE0220, which is the only aspartate kinase gene in the C. efficiens genome, is located in CEGI-1. One possible explanation, therefore, is that because this gene was acquired recently by horizontal transfer, adaptive mutations have not yet accumulated extensively in it. Indeed, a recent comparative study of C. efficiens, C. glutamicum, and Corynebacterium diphtheriae suggested that gene loss and HGT must have been responsible for the functional differentiation in amino acid biosynthesis among these three corynebacterial species (10). The finding that C. efficiens may harbor four genomic islands is consistent with this result, i.e., HGT events have occurred during the evolution of the C. efficiens genome. Among the 13 tested enzymes, the diaminopimelate dehydrogenase from C. glutamicum was also more thermostable than its C. efficiens ortholog (9), but the ORF coding for this enzyme, CE2498, does not appear to have been horizontally transferred.
Both C. efficiens and C. glutamicum belong to the genus Corynebacterium (2, 10). If the four genomic islands had been integrated before C. efficiens and C. glutamicum diverged, C. glutamicum would likely also carry them. However, none of the four genomic islands is present in the C. glutamicum genome. It is therefore very likely that the four genomic islands were integrated after the divergence of C. efficiens and C. glutamicum, so they have been present in the C. efficiens genome for a relatively short time compared with the whole evolutionary history of C. efficiens since the origin of the species. In addition, a comparison of all orthologous ORFs between C. efficiens and C. glutamicum revealed a strong bias in amino acid substitutions (9). Three substitutions were found to be important for the stability of the C. efficiens proteins: arginine for lysine, alanine for serine, and threonine for serine. A point system was defined previously (9): the point value is the total number of these three substitutions from C. glutamicum to C. efficiens (Lys to Arg, Ser to Ala, and Ser to Thr) minus the total number of the reverse substitutions. Based on this point system, the point value for the aspartate kinase is only 1, suggesting that this protein has acquired few of the biased amino acid substitutions associated with increased thermal stability (9). This observation further supports our hypothesis that, because of its recent horizontal transfer, adaptive mutations have not accumulated extensively enough to increase the thermal stability of this aspartate kinase.
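The point system described above can be written as a short function. This minimal Python sketch assumes that the C. glutamicum and C. efficiens orthologs have already been aligned to equal length, with gaps written as "-" (gapped columns never score), and it returns the raw difference without any length normalization. How the alignments in reference 9 were produced is not specified here, and the name `point_value` is hypothetical.

```python
# (C. glutamicum residue, C. efficiens residue) pairs that score +1:
# Lys -> Arg, Ser -> Ala, and Ser -> Thr. The reverse pairs score -1.
FAVORED_SUBSTITUTIONS = {("K", "R"), ("S", "A"), ("S", "T")}


def point_value(cg_aligned: str, ce_aligned: str) -> int:
    """Forward minus reverse counts of the three biased substitutions."""
    if len(cg_aligned) != len(ce_aligned):
        raise ValueError("the two sequences must be aligned to the same length")
    forward = reverse = 0
    for cg, ce in zip(cg_aligned.upper(), ce_aligned.upper()):
        if (cg, ce) in FAVORED_SUBSTITUTIONS:
            forward += 1
        elif (ce, cg) in FAVORED_SUBSTITUTIONS:
            reverse += 1
    return forward - reverse


# Toy alignment: K->R and S->T score +2; A->S (the reverse of Ser->Ala) scores -1.
print(point_value("MKSAL-", "MRTSLV"))  # -> 1
```

Counting per alignment column keeps the score symmetric: swapping the two sequences simply flips its sign.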
C. efficiens is of particular industrial interest because of its ability to produce amino acids at high temperatures. Aspartate kinase is the first enzyme in the aspartate-derived amino acid biosynthesis pathways leading to lysine, methionine, threonine, and isoleucine (7). Because the gene coding for this aspartate kinase is the only aspartate kinase gene in the C. efficiens genome, increasing the thermostability of this enzyme could further improve the ability of C. efficiens to produce certain amino acids at high temperatures. Our results may provide some insight into the relatively low thermostability of this aspartate kinase and some clues for efforts to increase it.