Applied and Environmental Microbiology, January 2006, p. 135-143, Vol. 72, No. 1
0099-2240/06/$08.00+0 doi:10.1128/AEM.72.1.135-143.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Department of Infection Immunity and Inflammation, University of Leicester, University Road, Leicester LE1 9HN, United Kingdom,1 Department of Biotechnology, University of the Western Cape, Bellville 7535, Cape Town, South Africa,2 Genencor International B.V., Archimedesweg 30, 2333 CN Leiden, The Netherlands,3 State Key Laboratory of Microbial Resource, Institute of Microbiology, Chinese Academy of Sciences, 100080 Beijing, China,4 Department of Microbiology and Parasitology, Faculty of Pharmacy, University of Sevilla, 41012 Sevilla, Spain5
Received 11 August 2005/ Accepted 30 September 2005
In the context of gene discovery, access to the entire metagenome of these as yet uncultured organisms would source a completely new gene pool, which could provide novel enzymes and proteins of potential industrial or medical use. It is now common practice to isolate DNA directly from environmental samples and construct DNA libraries to access the metagenome (12). Cloning into expression vectors allows the isolation of novel enzymes and other biologically active proteins and peptides (the proteome) without prior cultivation of the organisms from which they are derived (8, 14, 18, 25). This approach is very suitable for samples containing prokaryotic DNA where, in general, genes contain few introns, allowing direct transcription and translation of the metagenome in a host organism such as Escherichia coli.
However, in the case of eukaryotic organisms, such a direct route from genome to proteome is generally not possible because most genes contain a number of large introns. Consequently, libraries for the screening and expression of proteins from eukaryotes are generally derived from the transcriptome, particularly mRNA, by reverse transcription to form double-stranded DNA and cloning into a suitable vector (26). Such libraries are now available commercially for a wide diversity of organisms, particularly for species that are important in agriculture, commerce, medicine, or the laboratory.
Libraries derived from environmental RNA have rarely been made. This is much more challenging, not least because of RNA instability. Methods have been described for isolating RNA from soils and sediments, most of which appear to have been derived from prokaryotes (16). Small 150-bp PCR products corresponding to bacterial mRNA species could be identified in this material. Very recently, cDNA libraries have been made from environmental prokaryotic RNA (23). Clones up to 1 kb were identified, but most were in the 200- to 500-bp range. In another study, cDNA libraries were made from environmental RNA fractionated to contain prokaryotic 16S-sized rRNA; seven of these clones appeared to be mRNA related, including two fungal sequences (7).
Our specific objective in this work was to identify novel protein-encoding open reading frames (ORFs) from eukaryotic microorganisms without their prior cultivation or identification. This requires the development of metagenomic cDNA technology: the ability to isolate full-length mRNAs, reverse transcribe them, and clone the cDNA to make libraries for sequence and expression studies. As far as we are aware, no attempts to specifically target eukaryotic mRNA from environmental samples have been previously described. We report here procedures suitable for stabilizing RNA in environmental samples in the field such that they can be transported back to the laboratory, the RNA isolated, and cDNA libraries made for subsequent sequencing and expression studies. The ability to stabilize RNA in environmental samples for subsequent purification and analysis in the laboratory may also have other uses, such as measuring changes in the environmental transcriptome of both prokaryotes and eukaryotes with time and in response to external change.
RNA extraction.
Total RNA was extracted from about 3 g (wet weight) of the samples using the QIAGEN RNeasy mini kit. For stored material, RNAlater was removed after centrifugation, and the sample was resuspended in the lysis buffer provided in the kit. In the case of the Acanthamoeba material, the amoebae were disrupted in the lysis buffer by repeated pipetting, and homogenization was ensured by passing the lysate down a QIAshredder column (QIAGEN) by following the manufacturer's protocol. The rest of the method followed the RNeasy protocol. The extraction from E. coli was preceded by lysozyme treatment (TE buffer, pH 8.0, with 400 µg/ml lysozyme) as detailed in the QIAGEN RNeasy protocol. For the algal mat, yeast, and activated sludge materials, a similar protocol was followed, but instead of using a QIAshredder column, disruption and homogenization were carried out by bead beating with the FastPrep apparatus (BIO 101, Qbiogene) using tubes containing lysing matrix E and a setting of 5.5 for 30 s. After centrifugation to pellet the debris, the rest of the method followed the RNeasy protocol. The eluted RNA was stored at -20°C after addition of the RNase inhibitor SUPERase-In (Ambion).
Purification of mRNA from total RNA.
For the activated sludge material, poly(A)-tailed mRNA was extracted from about 300 µg of total RNA using the poly(A) Purist MAG kit (Ambion). The method involves binding of the poly(A) tails to oligo(dT) magnetic beads, capture of the beads magnetically, and elution of the poly(A) RNA. The RNA was then ethanol precipitated and resuspended in a small volume (15 µl) of RNA storage solution supplied with the kit.
Library construction.
The Smart cDNA library construction kit (BD Biosciences, Clontech) was used to create Lambda cDNA libraries in the phagemid Lambda TriplEx2 vector. The method uses primers for reverse transcription (RT) that should optimize the production of full-length transcripts. First-strand DNA synthesis is based on a dT-rich oligonucleotide hybridizing to poly(A) RNA sequences such as at the 3' end of mRNAs. Second-strand synthesis only occurs if the reverse transcriptase reaches the 5' end of the RNA and adds additional non-template-encoded C residues. It also results in directional cloning of the cDNA and can generate polypeptides from all three reading frames in a single recombinant Lambda TriplEx2 clone. Plasmid can be excised from the lambda phagemid. The starting material for cloning can be total RNA or purified mRNA.
The cDNA library construction was initially carried out from total RNA extracts rather than from mRNA, since the amount of total RNA from most environmental samples is likely to be small. However, since abundant quantities of activated sludge were available for comparison, a library was also constructed from the mRNA isolated from this sample. The RT step was carried out using PowerScript reverse transcriptase with the SMART IV primer (5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG-3') and the lock-docking oligo(dT) primer CDS III/3' [5'-ATTCTAGAGGCCGAGGCGGCCGACATG-d(T)30 (AGC)N-3'] provided in the kit. This should result in full-length single-stranded cDNA containing a sequence complementary to the SMART IV Oligo, which then serves as a universal priming site for subsequent amplification by long-distance PCR (LD-PCR). This was carried out with the Advantage 2 PCR kit and the 5' PCR primer (5'-AAGCAGTGGTATCAACGCAGAGT-3') with the CDS III/3' PCR primer used for the RT step to produce double-stranded cDNA. After digestion with SfiI to produce suitable restriction sites for cloning, the construction of the library essentially followed the manufacturer's protocol, except that size fractionation of the double-stranded cDNA (pooled from 4 to 6 PCRs) was carried out after agarose gel electrophoresis. Material of approximately 500 to 5,000 bp in size was extracted using the QIAGEN QIAEX II gel extraction kit. The cDNA was ethanol precipitated, ligated into the Lambda TriplEx2 vector, and packaged with the Stratagene Gigapack III Gold packaging extract, and titers were determined using E. coli XL1-Blue as the host. Blue/white screening of plaques was carried out to assess the percentage of clones with inserts. After amplification, the completed cDNA libraries were stored in 7% dimethyl sulfoxide at -80°C.
Characterization of cDNA inserts.
The inserts in a number of clones from each library were sequenced after PCR amplification from individual plaques excised into nanopure water using vector-encoded amplification primers (5'-CTCGGGAAGCGCGCCATTGTGTTGG and 3'-ATACGACTCACTATAGGGC). This was carried out using Taq polymerase (Abgene), with an initial denaturation for 2 min at 94°C, followed by 30 cycles with parameters of 94°C for 30 s and 68°C for 3 min, as indicated by the BD Clontech protocol. The products were cleaned with the QIAquick PCR purification kit (QIAGEN) and sequenced by Lark Technologies, Takeley, United Kingdom, using the 5' PCR primer listed above. Some clone sequences were completed by primer walking.
PCR and characterization of 18S rRNA genes.
DNA was extracted from environmental samples which had been stored at -20°C using the GenomicPrep cells and tissue DNA isolation kit (Amersham Pharmacia). In the case of Acanthamoeba polyphaga, the kit was used to extract DNA from pelleted freshly grown cells to provide a positive control. PCR amplification of approximately the first 520 bp of the 18S rRNA gene was carried out with forward primer 5'-CCG AAT TCG TCG ACA ACC TGG TTG ATC CTG CCA GT-3' and reverse primer 516R 5'-ACC AGA CTT GCC CTC C-3'. The program consisted of 2 min of denaturation at 95°C and 30 cycles of 95°C for 30 s, 55°C for 40 s, and 72°C for 2 min, with a final 10-min extension at 72°C. The cleaned PCR products were ligated into the pGEM-T Easy vector and transformed into JM109 high-efficiency competent cells (Promega), with blue/white screening. Colony PCR was performed using M13 primers to amplify the inserts in a number of clones. In the case of the activated sludge clones, the PCR products in individual clones were screened using restriction digestion with HaeIII, and clones with different restriction fragment length polymorphism patterns were selected for sequencing using the forward 18S primer.
Clone sequence analysis.
Nucleotide sequences were analyzed using the ORF finder and BLAST (2) facilities at NCBI (http://www.ncbi.nlm.nih.gov/) during May 2005. Complete nucleotide sequences were compared using BLASTN and BLASTX. ORFs identified using ORF finder were compared using BLASTP.
Nucleotide sequence accession number.
The sequences of the clones from the cDNA libraries described here have been deposited in the EMBL database. Those from the A. polyphaga library have accession numbers AJ876805 to AJ876809, those from the TC2 algal mat library have accession numbers AJ879837 to AJ879844, and those from the LP4 algal mat library have accession numbers AJ879808 to AJ879836. Sequences from the activated sludge library made with total RNA have accession numbers AJ879846 to AJ879868, and those from the activated sludge library made from poly(A)-enriched RNA have accession numbers AJ879869 to AJ879891.
FIG. 1. Typical results of total RNA extraction from named species and environmental samples electrophoresed on a 1.2% denaturing formaldehyde agarose gel. Lane 1, RNA size markers (Ambion Millennium markers); lane 2, Escherichia coli; lane 3, Saccharomyces cerevisiae; lane 4, Acanthamoeba polyphaga; lanes 5 and 6, activated sludge.
FIG. 2. Effect of storage in RNAlater on the stability of total RNA extracted from Acanthamoeba polyphaga. (A) TBE 1.2% agarose gel. Lane 1, RNA size markers (Millennium markers; Ambion); lanes 2 and 3, total RNA from a fresh culture of Acanthamoeba polyphaga. (B) Same as panel A, but lane 2, total RNA from A. polyphaga stored in RNAlater for 10 days at ambient temperature; lane 3, total RNA from A. polyphaga stored for 10 days at 4°C.
FIG. 3. Results of reverse transcription and LD-PCR on RNA from different sources to produce cDNAs run out on TBE 1.2% agarose gels. (A) Lane 1, 1-kb DNA ladder (Invitrogen); lane 2, negative control; lane 3, product from the human placental mRNA provided as a control in the Smart cDNA Library construction kit. (B) Lane 1, 1-kb ladder; lanes 2 to 5, product from total RNA extracted from Acanthamoeba polyphaga. (C) Same as panel B, showing product from the LP4 Chinese algal mat. (D) Same as panel B, showing product from activated sludge total RNA.
Sequences of inserts in Lambda clones.
The sequences of inserts in individual clones from each of the libraries were determined after PCR amplification from isolated plaques. Most ranged in size from about 500 bp to 2 kb. Larger clones were not found, probably because small DNA fragments tend to be preferentially cloned into the vector. Sequencing was carried out using the 5' forward primer, resulting in up to 800 bases of sequence, depending on the size of the insert. Sequencing from the 3' end of the insert was unsuccessful, probably because of the poly(A) tails. Clone sequences were therefore completed by primer walking. All were found to have sequence at the 5' end corresponding to part of the 3' end of the SMART IV primer (5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG-3') used for first-strand cDNA synthesis. This indicates that cloning into the vector was indeed directional. It also indicates that the cloned products contain a sequence incorporated during the synthesis of cDNA, i.e., that we are cloning RNA sequences. This latter point is important and is confirmed by the absence of a PCR product if the reverse transcription step of the protocol was omitted, as described above. Poly(A) tails at least 30 bases in length were evident for all fully sequenced clones, corresponding to the CDS III/3' PCR primer.
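The two sequence signatures described above (a 5' stretch derived from the SMART IV primer and a 3' poly(A) tract of at least 30 bases) can be checked computationally. The following is a minimal sketch, not the procedure used in this study: the function names, the 80-base search window, and the 8-base minimum match are hypothetical choices made for illustration, and the example read is invented.

```python
import re

# SMART IV primer sequence as quoted in the text.
SMART_IV = "AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG"


def longest_primer_suffix(read, primer=SMART_IV, window=80, min_len=8):
    """Return the longest suffix of `primer` found in the first `window`
    bases of `read`, or "" if no suffix of at least `min_len` bases occurs.
    `window` and `min_len` are illustrative assumptions."""
    head = read.upper()[:window]
    for length in range(len(primer), min_len - 1, -1):
        suffix = primer[-length:]
        if suffix in head:
            return suffix
    return ""


def longest_a_run(read):
    """Return the length of the longest uninterrupted run of A residues."""
    runs = re.findall(r"A+", read.upper())
    return max((len(run) for run in runs), default=0)


# Invented example read: vector-like bases, a primer-derived stretch,
# a short coding region, and a 32-base poly(A) tract.
toy_read = "TTGC" + "CGGCCGGG" + "ATGGCTGCTAAAGGT" + "A" * 32
print(longest_primer_suffix(toy_read))
print(longest_a_run(toy_read) >= 30)
```

A read passing both checks would be consistent with a directionally cloned, reverse-transcribed product; a failed check would only flag the clone for manual inspection.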
Sequence analysis.
If mRNA sequences had been cloned, the expectation would be that they would have an ORF over most of the insert length in a forward reading frame from the 5' end. In subsequent analysis, therefore, only ORFs in the 5' to 3' direction were considered. Where the complete insert up to the poly(A) region has been sequenced, a stop codon would also be expected.
The prediction of ORFs in sequences derived from the metagenome is complicated by a significant variation in codon usage in different organisms, both for translational initiation and termination (detailed for many species in the Codon Usage Database, tabulated from GenBank [http://www.kazusa.or.jp/codon]) (22), leading to a choice of 22 genetic codes. Initiation is most efficient from AUG, but in rare cases, other codons are utilized, e.g., in the yeast Candida albicans, molds, protozoans, coelenterate mitochondria, and mycoplasmas (1, 6, 29; NCBI taxonomy browser). Similarly, differences in the usage of termination codons have been observed in ciliated protozoa (13), and in some ciliates, stop codons are reassigned to sense codons (19). Clearly, the appropriate code can be selected when sequencing clones from a known organism, but choice is more problematic for environmental isolates. Accordingly, we used the standard genetic code in the sequence analysis of the clones in this study reported in Tables 1, 2, and 3 and in Table S2 in the supplemental material. Different results were obtained using genetic code 6 (ciliate, dasycladacean, and Hexamita code), where TAA and TAG stop codons are suppressed and read as glutamine. These results are included in Tables 2 and 3; see Discussion for more details. E values in Tables 2 and 3 and Table S2 in the supplemental material are derived from BLASTP unless referred to as BLASTX.
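The effect of the chosen genetic code on forward-frame ORF prediction can be illustrated with a short sketch. This is not the NCBI ORF finder algorithm; it is a simplified scan, assuming ORFs start at ATG and end at the first in-frame stop (or at the end of the insert), with the stop set passed in as a parameter. The function names and the toy sequence are hypothetical.

```python
# Stop codons under the standard code, and under code 6, in which TAA and
# TAG are read as glutamine so that only TGA terminates translation.
STANDARD_STOPS = {"TAA", "TAG", "TGA"}
CODE6_STOPS = {"TGA"}


def forward_orfs(seq, stops=STANDARD_STOPS, start="ATG"):
    """Return (start_index, n_codons) for each ORF in the three forward
    frames. n_codons excludes the stop codon. An ORF that reaches the end
    of the insert without a stop is still reported."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == start:
                orf_start = i
            elif orf_start is not None and codon in stops:
                orfs.append((orf_start, (i - orf_start) // 3))
                orf_start = None
        if orf_start is not None:
            # No stop codon before the end of the sequenced insert.
            orfs.append((orf_start, (len(seq) - orf_start) // 3))
    return orfs


def longest_forward_orf(seq, stops=STANDARD_STOPS):
    """Return the (start_index, n_codons) pair with the most codons."""
    return max(forward_orfs(seq, stops), key=lambda orf: orf[1], default=None)


# Invented sequence with in-frame TAA and TAG codons before a final TGA.
toy = "ATGGCTTAACAAGGTTAGCCTTGA"
print(longest_forward_orf(toy))               # standard code
print(longest_forward_orf(toy, CODE6_STOPS))  # TAA/TAG read through
```

With the standard stop set the toy ORF ends at the first TAA, whereas with the code 6 stop set the same reading frame continues to the TGA, which is the kind of ORF lengthening reported for some clones in Tables 2 and 3.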
TABLE 1. Results of sequencing clones
TABLE 2. Activated sludge total RNA cDNA clones with possible matches to database proteinsa
TABLE 3. Activated sludge mRNA cDNA clones with possible matches to database proteinsa
(ii) Algal mat libraries.
Thirty-four LP4 and 19 TC2 cDNA clones were sequenced and analyzed using ORF finder, BLASTN, and BLASTX searches (Table 1). For the LP4 library, 17 clones of various lengths had high similarity to prokaryote rRNA gene sequences in the databases. Their identities are reported in the supplemental material. Only 6 clones from LP4 appeared to be related to protein sequences in the databases. The matches to proteins in the databases are generally quite high (E values of e-30 to e-102). However, in 4 of 6 clones, no stop codon is evident at the end of the ORF, which reaches the end of the cloned sequence. In addition, these ORFs are short compared to the proteins they most closely match. The results are reported in Table S2 in the supplemental material. The remaining 11 clones sequenced from the LP4 library contained sequences with low relatedness to both nucleotide and protein database sequences. In most cases, ORFs in a forward reading frame were short, 35 to 80 amino acids, with only low similarity to known or hypothetical proteins. BLASTX searches similarly gave low matches over short stretches of the inserts. These are designated as not identified in Table 1.
For the TC2 library, most of the cloned inserts (15/19) gave the highest identity in BLASTN searches to rRNA gene sequences. Their identities are reported in the supplemental material. The remaining four sequences in the TC2 library have been designated as not identified in Table 1. ORFs in the forward direction were again short, with low matches to proteins in the databases.
(iii) Activated sludge libraries.
The inserts in 24 clones from the activated sludge total RNA library and 23 from the one constructed from poly(A)-enriched RNA were sequenced. As summarized in Table 1, the majority of inserts in both libraries (13 and 16, respectively) showed significant similarity to proteins in the databases. Both gave 6 inserts which had only short ORFs and low matches in BLASTN or BLASTX searches, designated as not identified. Five sequences from the total RNA library and only one from the poly(A)-enriched RNA library were related to rRNA sequences, all of which most closely matched eukaryotic large subunit 26-28S rRNA genes, mostly from alveolates.
The results of the searches for clones having matches to database proteins are shown in Table 2 (total RNA) and Table 3 [poly(A) RNA library]. In most cases, the sequence of the complete insert was obtained, using primer walking for the larger clones. The first criterion for inclusion of a sequence in Tables 2 and 3 is that an ORF in the forward direction has been found which extends for most of the length of the insert. In some cases, the match to proteins in the databases is quite high (e.g., ASt-7, ASt-33, ASm-59, and ASm-61), while others are lower (e.g., ASt-30, ASt-49, ASm-4, and ASm-60). Lower matches might be expected given the diversity of organisms likely to be in the sample and the fact that, for many of the organisms present in the algal mat, there may be no sequences in the databases.
For some sequences, the length of the ORF does seem to correlate with the length of the protein it most closely matches (e.g., ASt-61, ASt-65, ASm-13, ASm-32, ASm-61, and ASm-63), suggesting that we have identified complete reading frames, but for others, this is not the case. Where the similarities are low, this might be expected, but it might also suggest that incomplete cDNA products have been cloned.
Also included as possible proteins are sequences for which only short forward ORFs appear (sometimes two or three in the same reading frame). However, these have reasonable matches to known proteins, sometimes having conserved domains, and the BLASTX results show that the match is in fact over most of the length of the insert. The stop codons curtailing these ORFs could be genuine or introduced by errors in reverse transcription or during PCR. Examples in Tables 2 and 3 are ASt-19, ASt-28, ASt-73, ASm-10, ASm-25, and ASm-46. For some of these inserts (e.g., ASt-19 and ASm-10), the BLASTX analysis does identify a protein of comparable length to the closest match, even though the corresponding ORF(s) seem prematurely terminated. An alternative explanation, at least for some of these sequences, e.g., ASt-73, ASt-74, and ASm-10, is that the termination signal is suppressed and a longer protein is made (Tables 2 and 3; see also Discussion).
Overall, analysis of the longest ORFs in clones from the environmental cDNA libraries also shows qualitative differences between the activated sludge and the Chinese algal mat libraries. In the case of TC2, only 3 sequences have the longest putative ORF in the forward direction, whereas 16 had the longest in the reverse orientation. For LP4, only 8 were forward and 26 were reverse. The activated sludge libraries both had many more clones with the longest ORF in the forward direction, 18 for the library made from total RNA and 17 for the mRNA-derived library. Both had only 6 sequenced clones where the longest ORF was in a reverse reading frame. This supports the conclusion that more of the inserts in the activated sludge libraries were directionally cloned and derived from mRNA.
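The forward-versus-reverse comparison described above can be sketched as follows. This is an illustrative simplification rather than the analysis performed with ORF finder: it assumes the standard stop codons and measures the longest stop-free run of codons in each strand, without requiring a start codon. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_open_stretch(seq):
    """Longest run of consecutive non-stop codons over the three frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                run = 0
            else:
                run += 1
                best = max(best, run)
    return best


def longest_orf_orientation(seq):
    """Classify an insert by the strand carrying the longest open stretch."""
    forward = longest_open_stretch(seq)
    reverse = longest_open_stretch(reverse_complement(seq))
    if forward > reverse:
        return "forward"
    if reverse > forward:
        return "reverse"
    return "tie"


# Invented insert: open in forward frame 0, with stops on the reverse strand.
toy_insert = "ATGTTACTATCATTACTA"
print(longest_orf_orientation(toy_insert))
print(longest_orf_orientation(reverse_complement(toy_insert)))
```

Tallying this classification over all sequenced clones of a library would give counts comparable in kind to the forward and reverse numbers quoted above, although the exact values depend on how an ORF is defined.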
18S rRNA gene amplification.
The cDNA product yield obtained following reverse transcription (Fig. 3) and the library clone sequencing (Tables 1, 2, and 3; see Table S2 in the supplemental material) suggested that there were low levels of eukaryotic mRNA in the Chinese algal mat samples. Accordingly, 18S rRNA gene PCR amplification of the DNA samples was carried out to independently establish the presence or absence of eukaryotic material in the samples. The resulting PCR products are shown in Fig. 4; products of about 550 bp in size were expected. The Acanthamoeba material served as a control, yielding a single major band of the expected size (panel A, lane 3). A similarly sized PCR product was observed using activated sludge DNA as a template (lane 4). However, both LP4 and TC2 gave very weak signals, with multiple bands and little material of the expected size (panel B, lanes 4, 5, and 6). Amplification products from the activated sludge, LP4, and TC2 samples were cloned and sequenced. Sequencing confirmed the presence of 18S gene products in the activated sludge libraries and LP4; no 18S sequences were identified in the TC2 library. The sequencing results are presented in the supplemental material.
FIG. 4. 18S rRNA gene amplification from algal mat samples and activated sludge. (A) Lane 1, 1-kb ladder; lane 2, negative control; lane 3, Acanthamoeba polyphaga positive control; lane 4, product from activated sludge. (B) Lane 1, 1-kb ladder; lane 2, negative control; lane 3, Acanthamoeba polyphaga positive control; lane 4, algal mat LP4; lanes 5 and 6, product from two different DNA extracts from algal mat TC2.
Libraries produced from the algal mat samples also contained a large proportion of clones (17/34 in LP4 and 15/19 in TC2) (Table 1) matching prokaryotic rRNA sequences. These matches were mainly to cyanobacterial 23S rRNA sequences (see the supplemental material). Their abundance could simply be the result of mispriming during reverse transcription because of the large amounts of prokaryotic rRNA evidently present in the samples. Alternatively, polyadenylation of the prokaryotic rRNAs in these algal mats could account for the large percentage of such sequences in these libraries. Polyadenylation has also been shown to be involved in the degradation of ribosomal RNAs in prokaryotes, and in E. coli, the 23S rRNA is the major polyadenylated RNA (20).
RNA isolated from activated sludge contained substantial amounts of both prokaryotic and eukaryotic RNA, evident from the presence of 16S/18S and 23S/28S rRNA gene doublets (Fig. 1, lanes 5 and 6). Libraries made from this RNA were qualitatively different than the libraries made from the algal mat RNA described above (Table 1). There were many more clones matching possible protein sequences and many fewer matching rRNA sequences. This difference was even more pronounced with the library made from poly(A)-enriched RNA compared to total RNA (Table 1). The database matches were also mostly against eukaryotic sequences. Both the total RNA and the poly(A)-enriched RNA libraries had the same proportion of clones designated as not identified (6/24 and 6/23, respectively). If these clones were not derived from mRNA, we would have expected the proportion to decline in the library made from poly(A)-enriched RNA. As they did not, it seems likely that these sequences are in fact derived from mRNAs with no closely matching protein sequences present in the databases.
For most of the activated sludge clones detailed in Tables 2 and 3, complete sequences were obtained that reached the poly(A) tail, except for ASt-57 and ASt-60. In most cases, the apparent protein-encoding ORFs have a stop codon (usually TAA) when using the standard genetic code for the search. For some clones, e.g., ASm-21 and ASt-60, the encoded ORF protein matches part of a larger protein in the database. These could be domain matches. Other clone sequences appear to match database proteins closely in size and are probably full length. For example, ASm-32 has an ORF of 95 amino acids, with 36% identity to a cystatin ORF of 98 amino acids from the dust mite Lepidoglyphus destructor. In ASm-61, an ORF of 153 amino acids has 63% identity to the 40S ribosomal protein S16 (145 amino acids) of Gossypium hirsutum. Similarly, a 137-amino-acid ORF in ASm-63 has 49% identity to ribosomal protein L32 (134 amino acids) of Branchiostoma belcheri. An example of a larger full-length ORF would be ASt-33 at 315 amino acids, corresponding to a protein of 316 amino acids from Neurospora.
The presence of clone sequences with good BLASTX matches to known proteins but having only short corresponding ORFs or two or three short ORFs with high identity to the same protein (Tables 2 and 3) may be due to the introduction of erroneous stop codons during reverse transcription or the subsequent amplification steps. Alternatively, it could be due to suppression of the apparent stop codon in the unknown organism from which the sequence was derived. Some organisms reassign stop codons to code for particular amino acids. In the ciliate, dasycladacean, and Hexamita nuclear code, both TAA and TAG code for glutamine instead of chain termination (code 6; NCBI taxonomy browser). Since most of these ORFs end with a TAA codon, using code 6 results in a longer single ORF. Using the standard code, ASm-10, for example, has three short ORFs in the same reading frame, all with matches to 14-3-3 proteins. Translation using code 6 gives an ORF of 244 amino acids with a high identity (E = 9e-85) to a 14-3-3 protein of 244 amino acids from Tetrahymena pyriformis. Similar results using code 6 are obtained for other clones, e.g., ASt-73, which instead of an ORF of only 37 amino acids then has an ORF of 270 amino acids, with a match to serine-type carboxypeptidases. With ASt-74, a standard code 44-amino-acid ORF lengthens to 233 amino acids with code 6. Although caution is needed in interpreting these results, in these cases, because of the continuity of the ORF match to database proteins, it does seem likely that the TAA codon is not being used for chain termination when these sequences are translated in the organisms from which they derived.
Longer ORFs are also found, as would be expected, for most of the sequenced clones when translated with code 6. ASt-2 (accession no. AJ879863), for example, which with standard code has no ORF in a forward reading frame and is assigned to the not identified category, with code 6 has an ORF of 283 amino acids with an E value of 3e-12 to an SAP DNA-binding domain-containing protein from Dictyostelium discoideum (see Pfam accession number PF02037). Other clones in this group, e.g., ASt-38 (accession no. AJ879865), when translated using code 6 (presumably inappropriately) still show only short ORFs, the longest being 92 amino acids, with low similarity to database proteins.
Another observation on the correct length of ORFs as predicted by the NCBI facility can be illustrated using clone ASt-60 (Table 2). This putative ORF of 188 amino acids starts with a methionine residue and runs from base 164 to 730 of the 776-bp sequence. However, the BLASTX result shows similarity of the translated sequence to the closest matching protein starting from base 23 of the clone. Of the 47 amino acids added to the N terminus by this analysis, 45% are identical and 60% are positively related. This level of similarity compares well with the overall value of BLASTP for the ORF itself of 32% identity, with 61% positive over the 188-amino-acid length. It would seem possible in this case that the true start codon is upstream of the methionine given by ORF finder. A similar situation is found for ASt-33, where ORF finder places a leucine as the start codon at nucleotide 126 of the insert. However, the BLASTX results show putative alignment of the translated sequence to the highest match protein from nucleotide 33. This could be the result of alternative initiation codons being used, as discussed previously.
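The coordinates quoted for ASt-60 and ASt-33 can be checked with simple arithmetic. The sketch below uses only the base positions given in the text and assumes 1-based, inclusive coordinates; the helper name is hypothetical. Under that assumption, bases 164 to 730 span 189 codons, consistent with a 188-amino-acid ORF plus a stop codon, and bases 23 to 163 span 47 codons, matching the 47 residues added at the N terminus by the BLASTX alignment.

```python
def codons_between(first_base, last_base):
    """Number of whole codons in a 1-based, inclusive base range.
    Raises ValueError if the range is not a multiple of three."""
    span = last_base - first_base + 1
    if span % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return span // 3


# ASt-60: ORF finder places the ORF at bases 164 to 730.
ast60_orf_codons = codons_between(164, 730)
# ASt-60: the BLASTX alignment starts at base 23, upstream of base 164.
ast60_upstream_codons = codons_between(23, 163)
# ASt-33: ORF finder start at nucleotide 126, BLASTX alignment from 33.
ast33_upstream_codons = codons_between(33, 125)

print(ast60_orf_codons, ast60_upstream_codons, ast33_upstream_codons)
```

In both clones the upstream BLASTX start lies a whole number of codons before the ORF finder start, so the extended alignment is in the same reading frame as the predicted ORF.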
This study has shown that RNA from environmentally complicated and diverse samples can be stabilized under field conditions for subsequent laboratory analysis. cDNA libraries containing both prokaryotic and eukaryotic sequences can be made. Preliminary screening for 18S rRNA genes, as shown in Fig. 4, would help to determine whether a sample is likely to yield a library containing eukaryotic mRNA sequences. In the case of the activated sludge, good results were obtained even for the library made from total RNA. The methods we describe clone cDNAs directionally in a vector capable of expressing all three reading frames of an ORF. Our libraries can be expression screened for enzyme activity. A screen of 50,000 clones for esterase activity using methods previously described by us (24) was not successful. Improvements in our methodology are undoubtedly possible. Our RNA extraction method (QIAGEN RNeasy mini kit) is reported in its promotional literature to be effective on thick-walled structures such as bacterial spores and yeast, but we have made no attempt to compare different extraction techniques on model eukaryotes. The second strand of cDNA synthesis should only occur if the 5' end of the mRNA is copied, but other protocols may be better. A proofreading Taq enzyme would minimize possible mutational errors during PCR amplification of the cDNA. Even so, this work is a starting point for eukaryotic cDNA metagenomics. It is possible with reasonable efficiency to identify eukaryotic ORFs likely to code for full-length proteins. Comparing levels of environmental RNAs in response to time or changing conditions may also find uses in microbial ecology and physiology.
Supplemental material for this article may be found at http://aem.asm.org/.