Bioinformatics Program (1); Department of Microbiology & Immunology, University of Michigan Medical School, Ann Arbor, Michigan 48109 (2); Center for Statistical Genetics and Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan 48109 (3)
Received 19 July 2006/ Accepted 4 November 2006
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Direct experimental approaches to operon identification, such as Northern blotting or primer extension, are usually costly and time-consuming, and so there is considerable interest in the development of computer algorithms that will accurately predict genome-wide operon structure. Given the rapid pace at which bacterial genomes are now being sequenced, there is a particular need for methods that are generally applicable across the bacterial domain. This requirement for "portability" places limits on the information that can be used in such algorithms and necessarily excludes experimental and detailed functional data, which are only available for a small subset of sequenced genomes (and often for only a subset of the genes within these genomes). A truly portable operon prediction algorithm must essentially rely on data inherent in the genome itself: the identity, spacing, and orientation of genes, as well as the sequence.
There has been a variety of prediction algorithms developed in recent years, including those that take advantage of experimental or functional data as well as examples that are more generalized and only require sequence information. Examples of the former include methods that rely on microarray-based expression data (3, 4, 7, 26) and others that use different forms of detailed functional annotation (5, 30, 35). Although these algorithms have shown great promise in terms of being able to predict operon structure with a high degree of specificity and sensitivity, the data they rely on are only available for a select subset of bacterial species, and this limits how widely they can be used.
Progress has also been made toward a more generalized method for operon prediction, and a number of groups have constructed algorithms based on a variety of diverse information sources, including codon usage statistics (3, 4) and the identification of promoter and terminator sequences (6, 28, 33, 34). Although these data have all proven to be valid predictors of operon structure, it is striking that these studies have also consistently demonstrated that one of the most valuable predictors is simply intergenic distance. The distances between genes within an operon tend to be considerably shorter than the distances between genes that are not cotranscribed, and in several recently developed algorithms, intergenic distance was shown to be more informative than any other data source, including even microarray-based expression data (3, 7, 28, 34). In addition, this trend appears to be universal in bacterial genomes, making it a very attractive option for a generalized, portable prediction algorithm (7, 15, 33). Unfortunately, intergenic distance alone only allows for a specificity of 65 to 70% when tested on a large set of experimentally verified operons from within the Escherichia coli genome, and so other sources of information must be added to bring the total accuracy to a more acceptable level (28).
Another promising generalized predictor of operon structure is the degree to which gene order is conserved across a variety of genomes, with the general idea that adjacent genes that are found in the same order in multiple genomes are more likely to be cotranscribed. This method has previously been used as a means of assessing functional relatedness among proteins (17), and several studies have shown that operons in a given bacterial genome could be identified with a very high degree of specificity using this approach (98%, as reported by Ermolaeva et al. [9]). The drawback, however, is that its accuracy is derived from genes being conserved in a relatively large number of species, and the method tends to miss operons containing genes that are unique or less conserved. In addition, a study by Itoh et al. showed that during evolution many operons undergo shuffling events that change the order of genes within (but not their overall content), and such operons are missed by an algorithm that requires conservation of order to make predictions (11). The result is that despite the high specificity of this algorithm, it is inherently insensitive and can only be applied to 30 to 50% of the genome being examined (9).
The extremely high specificity achieved using conserved gene pair information underscores the utility of these data in operon prediction, and we hypothesized that it might be possible to exploit phylogenetic information in a more general way, such that it could be applied to the entire genome. If so, we reasoned that these data, when combined with intergenic distance information in a rigorous statistical model, would allow for highly specific and sensitive operon prediction in any sequenced bacterial genome. With this in mind, we developed a generalized method for using phylogenetic data to predict operon structure. We adopted a Bayesian approach in constructing a hidden Markov model (HMM)-based algorithm that combines these data with intergenic distance statistics. Using a large set of experimentally verified operons from E. coli, we found that an optimized version of the algorithm predicted operon structure with specificity and sensitivity levels of >85%. We have also shown that the method can be generalized easily and applied to essentially any bacterial genome, regardless of the availability of any experimental or functional data.
We applied the algorithm in predicting the operon structure within the genome of Bacillus anthracis and successfully predicted all previously known B. anthracis operons. In addition, we identified a large number of putative operons that link apparently unrelated genes in cotranscriptional relationships, and we chose a particularly interesting example (BA1489-92) for further testing. This putative operon contains four genes that have little in common functionally, and they have not been predicted to be cotranscribed by any previously developed algorithm (23). Reverse transcription-PCR (RT-PCR) experiments confirmed that these genes are in fact cotranscribed, and targeted gene disruption data obtained in a related study (18) were also consistent with this finding. These results suggest a new functional link between the genes within this operon and have interesting implications for B. anthracis biology. In addition, they suggest that many other important functional and regulatory relationships may be identified in the same way and that the algorithm developed in this study may be a significant new tool for the field of bacterial genomics.
| MATERIALS AND METHODS |
|---|
The set of 359 experimentally verified E. coli transcriptional units (257 operons and 102 singly transcribed genes) used in testing and developing the algorithm was obtained from the supplemental information provided by Sabatti et al. in their recent study (26). The list was originally compiled as part of the RegulonDB, a database of transcriptional regulation and organization for E. coli K-12, and further information regarding the experimental verification of the transcriptional status of each of these operons may be found there (http://www.cifn.unam.mx/Computational_Genomics/regulondb/) (27). The genes included in this set were mapped onto the E. coli reference file using scripts within MS Excel.
Phylogenetic distribution analyses.
The peptide sequence files obtained for each genome were used to construct local BLASTP databases, and the complete set of peptide sequences from the subject genome was compared to each database using a locally installed copy of the NCBI BLAST search tool (software obtained from the NCBI website [ftp://ftp.ncbi.nlm.nih.gov/BLAST/]) run with default parameters (2). The results from these searches were parsed using a Perl script that identified potential orthologs as the best hit within a given comparative genome for each peptide sequence within the subject genome, with the additional provision that the expect value for potential orthologs was required to be less than our defined cutoff (10^-4). This method for identifying orthologs is intentionally more promiscuous than the commonly used "reciprocal best match" method. Our goal in this study was to construct an algorithm that was capable of predicting operon structure across an entire genome, and use of the reciprocal best match method was problematic because of the way this technique deals with paralogs within the subject genome. Briefly, if there are a number of paralogs within the subject genome that are all homologous to a single gene in the reference genome, the reciprocal best match method will only identify the most related of these as having an ortholog within the reference genome, even though all of them are actually related to that ortholog. Essentially, this means that the reciprocal best match method results in a number of paralogous genes within the subject genome for which we have inaccurate phylogenetic data, and the simpler "best match" method that we use here avoids this problem by considering each gene independently.
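The "best match" rule described above can be sketched in a few lines of Python. This is only an illustration, not the Perl script used in the study: it assumes the BLASTP results for one comparative genome have already been reduced to `(query, subject, evalue)` tuples, it takes "best hit" to mean the hit with the lowest expect value, and the function name `best_hits` and the gene identifiers are hypothetical.

```python
def best_hits(records, evalue_cutoff=1e-4):
    """Pick one potential ortholog per query gene from BLASTP hits.

    records: iterable of (query_id, subject_id, evalue) tuples for ONE
             comparative genome (assumed input format, not from the paper).
    Returns {query_id: (subject_id, evalue)} keeping, for each query, the
    hit with the lowest expect value, provided it is below the cutoff.
    """
    best = {}
    for query, subject, evalue in records:
        if evalue >= evalue_cutoff:
            continue  # hit is too weak to count as a potential ortholog
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical hits: g1 has two acceptable hits, g2 has only a weak one.
example_records = [
    ("g1", "x1", 1e-30),
    ("g1", "x2", 1e-50),
    ("g2", "y1", 0.5),
]
example_best = best_hits(example_records)
```

Because each query gene is handled independently, several paralogs in the subject genome can all be assigned the same potential ortholog, which is the behavior the text contrasts with the reciprocal best match method.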
The phylogenetic distribution of each gene was represented as a binary vector, with each dimension corresponding to a given comparative genome and denoted by a 1 if an ortholog is present and a 0 if there is no ortholog within that genome. These vectors have a passing similarity to commercial barcodes, and we refer to them as phylogenetic barcodes. They can be written as follows: $(b_{i1}, b_{i2}, \ldots, b_{im})$, where $b_{ij}$ = 0 or 1 and $j = 1, 2, \ldots, m$ and where each binary code $b_{ij}$ indicates whether an ortholog for gene i can be found in the jth related species. The final lists of potential orthologs from each comparative genome were combined into a single file from which the list of phylogenetic barcodes corresponding to each gene in the subject genome was compiled. To test the difference between the barcodes of two adjacent genes, and thus assess the difference in their phylogenetic distributions, we compared the two vectors by counting how many differences exist between them. This was calculated as follows:

$$d_{i,i+1} = \sum_{j=1}^{m} \left| b_{ij} - b_{(i+1)j} \right|$$

that is, the number of comparative genomes in which an ortholog is found for one of the two adjacent genes but not the other.
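As a small illustration of the barcode comparison, the following Python sketch builds barcodes from per-genome ortholog sets and counts the positions at which two barcodes differ. The function names, the toy gene identifiers, and the use of Python sets as input are assumptions made for this example only.

```python
def barcode(gene, ortholog_sets):
    """Binary barcode for one gene.

    ortholog_sets: list with one set per comparative genome, each holding
    the subject-genome genes that have a potential ortholog in that genome.
    """
    return tuple(1 if gene in genes_with_ortholog else 0
                 for genes_with_ortholog in ortholog_sets)


def barcode_difference(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must cover the same comparative genomes")
    return sum(x != y for x, y in zip(a, b))


# Toy data: three comparative genomes, two adjacent genes.
toy_sets = [{"g1", "g2"}, {"g1"}, {"g2"}]
bc1 = barcode("g1", toy_sets)   # (1, 1, 0)
bc2 = barcode("g2", toy_sets)   # (1, 0, 1)
diff = barcode_difference(bc1, bc2)
```

For barcodes of length m the difference ranges from 0 (identical phylogenetic distribution) to m.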
Two sources of information are considered in this study: the distance between adjacent genes and the difference in their phylogenetic distribution. Both can be derived from the genome sequence alone, provided that the physical location of each gene is known. In this study, we adopted an HMM framework to accommodate all the information. Previous experience suggested that gamma distributions would fit the inter- and intraoperonic intergenic distances well. We modeled these two populations using two distinct gamma distributions and defined the transition probabilities from one state to the other as a function of the intergenic distance between each adjacent gene pair. In general, a small intergenic distance suggests that the two genes belong to the same operon, and a larger intergenic distance favors the opposite possibility. We also treated the phylogenetic barcode defined above as observed data generated from the hidden states. The differences in the barcodes between two adjacent genes are assumed to follow one of two binomial distributions, depending on whether or not they belong to the same operon. Due to the lack of good-quality training data, we adopted a Bayesian HMM scheme as described by Liu (12), in which the distribution parameters used to calculate emission probabilities and the hidden-state path are inferred from the data, and two empirical distributions were used to calculate the distance-dependent transition probabilities. We applied the Gibbs sampler technique to iteratively sample from the conditional distributions of these unknown quantities. A detailed description of these methods can be found in the supplemental material.
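The structure of an inhomogeneous two-state HMM of this kind can be illustrated with a short forward-backward sketch in Python. This is not the Bayesian Gibbs sampling scheme used in the study (which is described in the supplemental material): the sketch assumes that all parameters are already known and fixed, that hidden state 1 means "same operon" and state 0 means "operon boundary", and that a separate transition matrix is supplied for each step so that it can depend on intergenic distance. The function names and every numeric value are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k barcode differences out of n genomes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def forward_backward(init, trans, emis):
    """Posterior state probabilities for an inhomogeneous HMM.

    init:  prior over the hidden state at the first position
    trans: trans[t][a][b] = P(state b at t+1 | state a at t); one matrix
           per step, so each can depend on that step's intergenic distance
    emis:  emis[t][s] = likelihood of observation t under state s
    """
    T, n = len(emis), len(init)
    # Forward pass, rescaled at every step to avoid underflow.
    alpha = [[init[s] * emis[0][s] for s in range(n)]]
    z = sum(alpha[0])
    alpha[0] = [x / z for x in alpha[0]]
    for t in range(1, T):
        cur = [emis[t][b] * sum(alpha[t - 1][a] * trans[t - 1][a][b]
                                for a in range(n)) for b in range(n)]
        z = sum(cur)
        alpha.append([x / z for x in cur])
    # Backward pass, also rescaled.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        cur = [sum(trans[t][a][b] * emis[t + 1][b] * beta[t + 1][b]
                   for b in range(n)) for a in range(n)]
        z = sum(cur)
        beta[t] = [x / z for x in cur]
    # Combine and normalize to get per-position posteriors.
    post = []
    for t in range(T):
        g = [alpha[t][s] * beta[t][s] for s in range(n)]
        z = sum(g)
        post.append([x / z for x in g])
    return post


# Hypothetical example: the middle observation is uninformative, but
# "sticky" transitions pull it toward its strongly cooperonic neighbors.
sticky = [[0.9, 0.1], [0.1, 0.9]]
posterior = forward_backward(
    init=[0.5, 0.5],
    trans=[sticky, sticky],
    emis=[[0.1, 0.9], [0.5, 0.5], [0.1, 0.9]],
)
```

In a fuller version, `emis[t]` would be filled with `binom_pmf` values for the observed barcode difference under each state, and each transition matrix would be derived from the two fitted distance distributions.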
For comparison purposes, we also constructed two homogeneous HMMs which model intergenic distances or phylogenetic barcode differences exclusively. In these models, all distribution parameters, the path, and the transition probabilities are inferred from the data. No additional training data are required. The HMM using intergenic distance alone performed better than the HMM using just the phylogenetic barcode difference data and, not surprisingly, both were inferior relative to the aforementioned inhomogeneous HMM that combines the two sources of information. Detailed descriptions of the implementation of these HMMs can also be found in the supplemental material.
Algorithm testing and scoring of predictions.
Predicted operons in the E. coli K-12 genome were scored relative to the set of known E. coli transcriptional units described in the Results section by using MS Excel. Statistical testing, including receiver operating characteristic (ROC) curve analysis, was done within Excel using the Analyze-It general statistics and clinical laboratory modules (Analyze-It Software, Ltd., Leeds, England).
Software availability.
Software implementing the HMM-based prediction algorithm is available from our group upon request, as are the Perl scripts used to parse the BLASTP results. Potential users should note that the software tools developed for this study are generally quite fast (requiring less than 1 min on a typical desktop PC), and even the slowest step (the BLASTP comparisons) can be performed in less than an hour.
Cell growth conditions and RNA isolation.
Brain heart infusion broth cultures of Bacillus anthracis strain Sterne (34F2) were grown overnight and then diluted 1:1,000 into nutrient-limiting (sporulation) modified G medium. At an optical density at 600 nm of 1.0, 5 ml of culture was collected and bacteria were pelleted by centrifugation. RNA isolation was performed using the Ambion RiboPure-Bacteria kit per the manufacturer's instructions with the following modifications: cell disruption with zirconia beads was done for 15 min, 450 µl of RNAwiz was used, bromochloropropane was used in place of chloroform, and 50 µl of RNase/DNase-free distilled water was added during extraction. A QIAGEN RNeasy mini kit with a DNase digestion step was used per the manufacturer's RNA cleanup protocol. RNA was quantitated via the A260/A280 ratio on a Beckman DU530 spectrophotometer. One microgram of RNA was run on a denaturing formaldehyde gel to verify purity. The above procedures were carried out in three separate experiments utilizing two unique cultures each time.
RT-PCR.
In three separate experiments, 500 ng to 1.0 µg of RNA was used to perform endpoint RT-PCR using the Invitrogen one-step RT-PCR with platinum Taq per the manufacturer's instructions. Briefly, reverse transcription was performed at 50°C for 30 min. PCR was performed with 0.25 pg of operon/gene-specific primers for 35 or 37 cycles with an elongation temperature of 70°C and extension time of 1 min 10 s. Five µl of endpoint PCR product was then run in 0.7% agarose gels and visualized with ethidium bromide. Negative controls omitting reverse transcriptase and positive controls with B. anthracis Sterne (34F2) genomic DNA were done with each experiment. Operon/gene-specific primers were designed to result in 0.6- to 1.0-kb products.
| RESULTS AND DISCUSSION |
|---|
In order to test this possibility, we compared the E. coli K-12 genome to 35 other bacterial genomes (chosen arbitrarily as a diverse set of species, including both distant and close relatives of E. coli) (Table 1) and searched for the possible presence of orthologs to each gene from the K-12 genome in each of the other genomes. The phylogenetic distribution of each E. coli gene was then compiled from these searches and represented as a 35-digit phylogenetic barcode. We hypothesized that genes within an operon would be more likely to have similar phylogenetic distributions and, therefore, that operon boundaries might be identifiable as comparatively large changes in barcode structure between two adjacent, codirectional genes (note that throughout this study, only codirectional gene pairs are examined). With this in mind, we calculated the difference between the barcodes of each adjacent gene pair in the E. coli K-12 genome by comparing each pair of vectors and counting how many differences exist between them (Fig. 1A). We then used a large set of experimentally verified E. coli transcriptional units (257 operons and 102 singly transcribed genes) to directly test our hypothesis. The 929 genes in this set provide us with 580 verified intraoperonic gene pairs and 626 verified interoperonic gene pairs, and the probability distribution of barcode differences for each of these two populations is shown in Fig. 1B. We found that the differences observed within known operons and those observed at operonic boundaries form significantly different populations, with intraoperon gene pairs generally having a smaller phylogenetic barcode difference than interoperon gene pairs. These results suggested that the phylogenetic barcode data might be a valuable predictor of operon structure throughout the entire genome.
The HMM-based algorithm was constructed as described in Materials and Methods and applied to the E. coli K-12 genome. We found that, using only the phylogenetic barcode information derived from the 35 comparative genomes and scoring our predictions using the set of known transcriptional units, the algorithm was able to predict the operon status of adjacent gene pairs with 89.6% sensitivity [(true positives)/(true positives + false negatives)] and 61.6% specificity [(true negatives)/(true negatives + false positives)] when using a predicted probability cutoff value of 0.5. Because a full spectrum of prediction probabilities is possible, the performance of the algorithm is more completely described by an ROC curve. In this method of analysis, specificity and sensitivity are plotted for every possible prediction probability cutoff value, and the area under the curve provides a combined measure of algorithm performance (with a maximal value of 1.00). The ROC curve corresponding to predictions generated using the phylogenetic barcode information is shown in Fig. 2 and the area under the curve is 0.819, indicating that this data source has substantial predictive value. Intergenic distance, when used alone in the HMM, yielded predictions that were somewhat different. At a probability cutoff of 0.5, this data source made the algorithm slightly more sensitive (91.1%) and also more specific (75.0%) than when the phylogenetic data were used alone (Fig. 2), and the area under the curve was slightly higher (0.852). When we combined the two sources in an inhomogeneous HMM, with transition probabilities estimated from intergenic distances, we found that a 0.5 cutoff value yielded predictions with a sensitivity of 87.5% and a specificity of 86.4% (Fig. 2) and an area under the ROC curve of 0.907. We also found that HMM helps improve the performance of the algorithm, as the Markov property captures the "clustering" characteristic of the operons. 
When the dependency among adjacent states was ignored (that is, when we fixed the transition probabilities at 0.5 in the HMM) and a naïve Bayes approach was used to combine the two sources of information, the area under the ROC curve decreased to 0.857. These data are summarized in Table 2, and they indicate that, as we had hypothesized, the algorithm performs best when the two data sources are used in combination within the framework of the inhomogeneous HMM.
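The performance measures used in this section can be computed as in the following Python sketch. It assumes that a gene pair is called cooperonic when its predicted probability is at or above the cutoff (whether the boundary case is inclusive is an assumption), and it computes the area under the ROC curve as the fraction of positive/negative pairs that are ranked correctly, with ties counted as one half. The function names and the toy values are hypothetical and are not data from the study.

```python
def sensitivity_specificity(probs, labels, cutoff=0.5):
    """Sensitivity and specificity of pair predictions at a given cutoff.

    probs:  predicted probability that each gene pair is cooperonic
    labels: 1 for a verified intraoperonic pair, 0 for an interoperonic pair
    """
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < cutoff)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cutoff)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(probs, labels):
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy example with two verified positives and two verified negatives.
toy_probs = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
sens, spec = sensitivity_specificity(toy_probs, toy_labels)
auc = roc_auc(toy_probs, toy_labels)
```

The pairwise computation is quadratic in the number of gene pairs, which is acceptable for a test set of roughly a thousand pairs; a rank-based formula would be preferable for much larger sets.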
These results also seemed to imply that the algorithm might be somewhat insensitive to the phylogenetic distribution of comparative genomes (relative to either the subject genome or to each other), a trait that would seem to be highly desirable in a method that is designed to be applicable to any bacterial genome, including those for which there are no sequenced close relatives. To further test this possibility, we examined algorithm performance under conditions in which we systematically varied either the relatedness of the comparative set to the subject genome (E. coli) or the diversity inherent within the comparative set itself. Consistent with our earlier results, we found that in both cases these changes had negligible effects on overall algorithm performance (see Fig. S1C and D in the supplemental material). Altogether, the data confirm the earlier indication that the algorithm is relatively insensitive to the phylogenetic distribution of comparative genomes, and they seem to suggest that near-optimal performance can be obtained using a variety of different comparative sets. This point should be stressed, because although it is not yet clear how to construct a perfectly optimal set of comparative genomes for a given bacterial genome, the very small differences we observed in algorithm performance even when relatively large changes were made to the comparative set suggest that a truly optimal comparative set may provide only a minimal improvement.
One other issue we were interested in testing was the effect of BLASTP stringency on the algorithm. The original comparative data were compiled with the requirement that potential orthologs have a BLASTP expect value of less than 10^-4; this value is relatively permissive and is similar to the cutoff values used in earlier studies (9). We therefore tested whether making this cutoff value more stringent might affect the predictive value of the comparative data. We found that even when the expect value cutoff was changed to 10^-8 there was essentially no change in the algorithm's performance (see Fig. S1E in the supplemental material). Given this, all other experiments were performed with the 10^-4 cutoff value.
Altogether, we found that the algorithm was relatively insensitive to a variety of changes in how the phylogenetic data were compiled, and this was true even when the intergenic distance component was removed from the HMM (that is, the robust behavior of the algorithm to changes in the phylogenetic reference set was not due to the fact that the intergenic distance component was overwhelmingly dominant [data not shown]). The comparative set that provided the best prediction performance was the set containing the 22 species of Proteobacteria taken from the original 35 species in Table 1, and using the phylogenetic data compiled from this set together with the intergenic distance information in our algorithm resulted in an area under the ROC curve of 0.916. As shown in Table 2 (data source E), we found that with this method we were able to predict operon structure in E. coli with both sensitivity and specificity of >85% and that by choosing the appropriate cutoff value we could fairly easily obtain levels of 90% in either parameter (with a corresponding drop to 80% in the other). The predictions generated in this set are available at http://www.sph.umich.edu/~qin/hmm/.
Operon prediction in Bacillus anthracis.
We anticipate that predicting operon structure in previously uncharacterized genomes will provide a variety of clues in terms of possible functional and/or regulatory relationships. This is particularly significant in pathogenic bacteria, where these leads may be useful from the perspective of drug or vaccine development. To test this idea, we used a 20-genome comparative set (chosen as a relatively small, widely diverse group) (Table 3) to predict operon structure in the gram-positive pathogen Bacillus anthracis. Although this organism is now being given considerable attention because of its potential as a bioterror agent and several B. anthracis strains have been fully sequenced (19, 24, 25, 36), its operon structure remains essentially unknown. On the chromosome of B. anthracis, which contains 5,308 protein-coding genes, the algorithm predicted a total of 2,473 cooperonic gene pairs when we used a prediction probability cutoff of 0.5. These pairs form 1,121 multigene operons that contain between 2 and 32 genes, and the probability distribution of operon length is remarkably similar (Pearson's correlation, 0.9917) to that reported for Bacillus subtilis in a recent study (7). Although there are very few experimentally verified operons to use in testing the predictions, we note that the gene pairs within the operons that have been verified (i.e., plcC-spmC, csaAB, rsbVW-sigB, asbABCDEF, and gerHABC) were all predicted successfully.
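The step from pairwise predictions to operons can be sketched as follows in Python. The sketch assumes that genes are listed in chromosomal order, that each adjacent pair has a predicted cooperonic probability, and that pairs not eligible for prediction (for example, genes on opposite strands) have been given a probability of 0. The function name `chain_operons` and the toy gene names are hypothetical.

```python
def chain_operons(genes, pair_probs, cutoff=0.5):
    """Group ordered genes into transcriptional units.

    genes:      gene identifiers in chromosomal order
    pair_probs: pair_probs[i] is the predicted probability that genes[i]
                and genes[i + 1] belong to the same operon
    Returns a list of gene lists; single-gene lists are singly
    transcribed genes, longer lists are multigene operons.
    """
    if not genes:
        return []
    if len(pair_probs) != len(genes) - 1:
        raise ValueError("need exactly one probability per adjacent pair")
    units = [[genes[0]]]
    for gene, prob in zip(genes[1:], pair_probs):
        if prob >= cutoff:
            units[-1].append(gene)   # extend the current operon
        else:
            units.append([gene])     # operon boundary: start a new unit
    return units


# Toy example: five genes, one predicted boundary between b and c.
toy_units = chain_operons(["a", "b", "c", "d", "e"], [0.99, 0.2, 0.7, 0.6])
toy_multigene = [u for u in toy_units if len(u) > 1]
```

Under this grouping, an operon of n genes contributes n - 1 cooperonic pairs, so the number of cooperonic pairs and the number of multigene operons together determine how many genes lie in predicted operons.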
0.99 in each case) that BA1580, BA1581, BA1582, and BA1583 make up a single operon, and thus by association we are able to propose not only that these uncharacterized genes might be somehow related to formation of the spore coat, but also that they could potentially be targets for new therapeutics. Another relatively common finding in examining the predicted operon structure within the B. anthracis genome was that in many instances, regulatory genes (e.g., loci encoding transcription factors) appear to be cotranscribed with genes that have probable roles in sensing a particular environmental cue. One example is the predicted two-gene operon BA5371-2, which encodes an RNA polymerase sigma factor and a glutaredoxin family protein. This RNA polymerase sigma factor is one of many uncharacterized sigma factors encoded within the B. anthracis genome, and its apparent linkage to a glutaredoxin family member seems to suggest that its function may be related to the oxidative state of the environment. Another case in which a probable regulatory function is suggested by operon prediction is the putative three-gene operon BA5503-5, which encodes a sensor histidine kinase, a DNA-binding response regulator, and a UDP-glucose 4-epimerase, respectively. It is typically difficult to predict a priori the environmental signal that a two-component system responds to, and in this instance operon prediction provides a useful clue in suggesting that these genes may be associated with galactose utilization.
Perhaps even more useful are the instances in which prediction of operon structure links disparate biochemical functions and thus assists in our understanding of the organism's biology. A notable example of this is found in examining the genes BA1489 to -92, which encode a putative superoxide dismutase (sod15), a D-alanyl-D-alanine carboxypeptidase, and spore maturation proteins A and B, respectively (Table 4). The algorithm predicted that these four genes form a single operon, with a prediction probability of 0.99 for each of the three internal gene pairs and 0.01 for the two pairs on either side. Homologs of the three downstream genes have been shown to play roles in spore maturation (22), but there is no obvious function for superoxide dismutase (sod15) in this process. Since the finding that they appear to be part of the same operon may imply a possible functional or regulatory link between them, we sought to test the algorithm's prediction that these genes are cotranscribed. We isolated RNA from bacterial samples grown to late exponential phase and performed RT-PCR analyses as diagrammed in Fig. 3. Briefly, we designed primer pairs that would only amplify a product if two adjacent genes could be found on a single mRNA molecule, and we tested whether we could detect the presence of cotranscribed BA1489 and -90 (AB), BA1490 and -1 (CD), and BA1491 and -2 (EF) within the RNA pool. In each of these cases we detected an appropriately sized PCR product, indicating that these gene pairs are cotranscribed at least some of the time (our results do not rule out the possibility of multiple promoters and transcripts that may include some but not all of these genes). These data confirm the prediction made by our algorithm that these four genes are cotranscribed and point to a previously unseen link between the sod15 gene and the process of sporulation (and perhaps a specialized role that distinguishes the Sod15 protein from the other three B. anthracis superoxide dismutases). Significantly, a related study showed that a strain of B. anthracis missing the sod15 locus formed spores that had ultrastructural differences and a slightly higher sensitivity to heat relative to wild-type B. anthracis, confirming the idea that the sod15 locus is likely involved in sporulation (18). 
It was also interesting that this operon would have been missed if the algorithm had used either the intergenic distance or the phylogenetic data alone; a model using intergenic distance alone predicts that the BA1489-90 transition is an operon border, and a model using phylogenetic data alone incorrectly predicts that the BA1490-1 pair is not cooperonic. The combined information allows the correct identification of all three intraoperon pairs, and this instance highlights the value of using combined data sources.
Conclusions.
In this study we have demonstrated that adjacent genes within the same operon tend to have a much more similar phylogenetic distribution than adjacent genes that are not cotranscribed, and we have developed a hidden Markov model-based algorithm in which these data can be used to predict operon structure in a newly sequenced bacterial genome. Furthermore, we have shown that when the phylogenetic data are combined with intergenic distance statistics in an inhomogeneous HMM, the algorithm is able to predict operon structure with a high degree of both sensitivity and specificity (Table 2, data source D). Significantly, we find that in general the algorithm appears to perform best using a relatively small group of comparative genomes, and it seems to be somewhat insensitive to the phylogenetic distribution of these genomes relative to each other and to the subject genome. Thus, it appears that the algorithm proposed here is easily portable to other completely sequenced bacterial species, including those for which there are little or no functional or experimental data available, and that operon prediction at or near the levels of specificity and sensitivity shown in Table 2 (data source D) should be attainable for these species as well.
This study was somewhat unique in aiming to construct an algorithm that does not rely on experimental data (e.g., gene expression data) or on detailed gene annotations (e.g., clusters of orthologous genes, or COG, family information) and in aiming to predict operon structure for any given bacterial genome. Perhaps the most similar study to date is that of Price et al., in which the authors proposed an algorithm that relies on intergenic distance, codon usage, and COG information for operon prediction in any bacterial species (23). Although our method relies on fewer data sources and assumes that the distributions of these data are generally species independent, we found that the algorithm's performance is almost identical to that described by Price et al. (areas under ROC curves of 0.916 and 0.917, respectively, when scored on an identical set of known E. coli operons) (Fig. 4). The fact that the algorithm presented here performs equivalently with a simpler set of input data is especially significant given that our method does not rely on detailed annotation information, which is unavailable for a significant fraction (20 to 30%) of most bacterial genomes (32), and is therefore able to predict operon structure throughout the entire genome.
Going beyond phylogenetic and intergenic distance data, it will also be interesting to test the utility of other data sources when added to the algorithm described here. These include both information available for essentially all bacterial species, such as codon usage or transcriptional terminators, as well as data that are only available for a smaller set of species, such as microarray and detailed functional classifications. Finally, we note that although different prediction algorithms often reach similar levels of accuracy, they typically do not make completely overlapping predictions. An equivalent problem is found in comparing gene prediction algorithms, and a recent study by Allen et al. showed that combining the results of multiple gene prediction models allowed for more accurate results than could be attained by any single algorithm alone (1). If the same trend holds true for operon prediction, it may be possible to reach unprecedented levels of accuracy by combining the predictions generated using several different statistical models.
| ACKNOWLEDGMENTS |
|---|
This work was supported by DHHS contract N266200400059C/N01-AI-40059.
| FOOTNOTES |
|---|
Published ahead of print on 22 November 2006.
Supplemental material for this article may be found at http://aem.asm.org/.
| REFERENCES |
|---|