Applied and Environmental Microbiology, October 2008, p. 5975-5985, Vol. 74, No. 19
0099-2240/08/$08.00+0 doi:10.1128/AEM.01275-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Biological Resource Center, KRIBB, Daejeon 305-806, Korea¹; Environmental Research Department, Research Institute of Industrial Science and Technology, Gwangyang 545-090, Korea²; Department of Life Science, Chung-Ang University, Seoul 156-756, Korea³; Environmental Biotechnology National Core Research Center, Gyeongsang National University, Jinju 660-701, Korea⁴
Received 9 June 2008/Accepted 8 August 2008
φ29 polymerase and random hexamers, to amplify viral DNA and construct clone libraries for metagenome sequencing. With the MDA method, the diversity of both single-stranded DNA (ssDNA) viruses and double-stranded DNA viruses could be investigated at the same time. In contrast, when the denaturing step was eliminated from the MDA reaction, ssDNA viral diversity alone could be explored selectively. Irrespective of the denaturing step, more than 60% of the soil metagenome sequences showed no significant hits (E-value criterion, 0.001) to previously reported viral sequences. Even the significant hits were only distantly related to known ssDNA viruses (average amino acid similarity, approximately 34%). Phylogenetic analysis showed that the ssDNA virus-related replication proteins recovered from the metagenomic sequences (the most frequently detected proteins) were diverse and novel. Putative circular genome components of ssDNA viruses unrelated to known viruses were assembled from the metagenomic sequences. In conclusion, ssDNA viral diversity in soil is more complex than previously thought, and soil is a rich pool of previously unknown ssDNA viruses.
Molecular analysis is essential to investigate the diversity of viral assemblages because most viruses cannot be cultured owing to a lack of suitable culturable hosts, such as bacteria. Likewise, the cultivation of viruses that infect eukaryotes is not easy. PCR-based approaches are also unsuitable because viruses lack a universally conserved gene or marker comparable to the bacterial 16S rRNA gene (17). Whole viral assemblage genome sequencing (viral metagenomics) has recently overcome these limitations and become a promising method for investigating uncultured viral diversity (9, 10). In this approach, viruses are purified and concentrated by sequential filtration and ultracentrifugation, and whole viral genomes are extracted, amplified, and sequenced by shotgun cloning or pyrosequencing (3, 17). The advantages of sequence-independent amplification and metagenome sequencing for characterizing novel viruses are that they are simple and fast and are not biased toward any particular viral group (16). The diversity of DNA viral assemblages has been analyzed by viral metagenomics in near-shore seawater (10), marine sediment (8), human feces (9), equine feces (11), and, recently, soils (18).
In most previous viral metagenomic studies, linker-amplified shotgun library (LASL) techniques were used to amplify small amounts of viral DNA via PCR amplification of linker-ligated, sheared viral DNA fragments (http://www.sci.sdsu.edu/PHAGE/LASL/index.html). Because the linkers can be ligated only to double-stranded DNA (dsDNA) and not to single-stranded DNA (ssDNA), the LASL technique cannot reveal the diversity of ssDNA viruses (17). Recently, Angly et al. investigated marine DNA viral metagenomes from four oceanic regions by multiple-displacement amplification (MDA) and pyrosequencing (3); by using MDA as the amplification method, they acquired metagenomic sequences from both dsDNA and ssDNA viruses (3). MDA is the most widely used whole-genome amplification (WGA) method. This technique uses φ29 DNA polymerase and random hexamers (14, 28), and subnanogram to 100-ng quantities of DNA can be amplified up to 80 µg with relatively little amplification bias compared to the other WGA methods available to date (28). Unlike PCR, MDA is an isothermal amplification method that requires the template to be denatured with heat or chemicals prior to amplification (15). If the denaturing step is omitted, dsDNA has less chance to bind random hexamers, so ssDNA is preferentially amplified. Additionally, φ29 polymerase amplifies short circular DNA more efficiently than linear DNA via rolling-circle replication (15, 31). Given that most ssDNA viruses have circular genome components, we hypothesized that ssDNA viral diversity could be selectively examined by MDA without the denaturing step. In this study, we sequenced libraries constructed from viral DNA amplified by MDA and found that ssDNA viral diversity could indeed be selectively investigated by MDA without the denaturing step. This technique allowed us to demonstrate that rice paddy soil contains a diverse population of uncultured ssDNA viruses. To our knowledge, this is the first report to selectively reveal the diversity of ssDNA viruses in an environment.
Concentration of viruses in rice paddy soil.
Viruses were extracted from soil samples with potassium citrate buffer according to previously described methods (50). Wet soil (500 g) was mixed with 1 liter of 1% potassium citrate buffer (10 g potassium citrate, 1.92 g Na2HPO4 · 12H2O, and 0.24 g KH2PO4 per liter, pH 7). Viruses were dislodged from the soil particles by sonication (three 1-min treatments at 300 W) plus 30 s of manual shaking. The suspension was centrifuged at 7,000 rpm for 10 min, and the supernatant was transferred to fresh bottles, centrifuged again at 7,000 rpm for 15 min, and filtered sequentially through 0.45-µm and 0.22-µm filters. The filtrate was concentrated from 850 to 60 ml with a 100-kDa polyethersulfone tangential-flow filter cartridge (Pellicon XL Filter; Millipore, Molsheim, France) mounted on the Labscale tangential-flow filter system (Millipore, Molsheim, France). The concentrated viral suspension was passed through a 0.22-µm syringe filter three times, and the filtrate was treated with DNase I (final concentration, 20 U/ml) at 37°C for 30 min; an aliquot not treated with DNase I served as a negative control. Viruses were stained with Sybr gold (Molecular Probes, Inc., Eugene, OR) and enumerated by epifluorescence microscopy (EFM) as previously reported (37), using seven EFM images.
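The abundance reported in the Results is expressed per gram of wet soil, but the conversion from EFM counts is not spelled out. A minimal sketch of the conventional calculation follows, assuming that per-image counts, the microscope field area, the effective filter area, the stained volume, and the soil mass represented by the concentrate are known. All function names, parameter names, and numbers below are hypothetical illustrations, not values from this study, and extraction losses are ignored.

```python
from statistics import mean, stdev


def per_ml(counts_per_field, field_area_um2, filter_area_um2, volume_stained_ml):
    """Scale per-field counts to particles per ml of the stained sample.

    Each field covers field_area/filter_area of the filter, so a field count
    multiplied by (filter_area / field_area) estimates the total on the filter.
    """
    fields_per_filter = filter_area_um2 / field_area_um2
    return [c * fields_per_filter / volume_stained_ml for c in counts_per_field]


def per_gram_soil(conc_per_ml, concentrate_volume_ml, soil_equivalent_g):
    """Convert concentrate titers to particles per gram (wet weight) of soil.

    soil_equivalent_g is the (hypothetical) mass of soil whose viruses ended
    up in the concentrate; losses during extraction are not modeled.
    """
    return [c * concentrate_volume_ml / soil_equivalent_g for c in conc_per_ml]


# Hypothetical counts from seven EFM images (one value per image).
counts = [41, 38, 45, 36, 40, 44, 39]
titers = per_gram_soil(
    per_ml(counts, field_area_um2=1.0e4, filter_area_um2=2.0e8, volume_stained_ml=0.01),
    concentrate_volume_ml=60.0,
    soil_equivalent_g=500.0,
)
print(f"{mean(titers):.3g} +/- {stdev(titers):.2g} viruses per g (wet weight)")
```

Reporting the mean and standard deviation across images is one plausible way to obtain a "mean ± spread" value; the text does not state which dispersion measure was used.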
DNA extraction and WGA.
DNA was extracted with proteinase K and phenol-chloroform/isoamyl alcohol from 10 ml of the concentrated viral suspension as described previously (41). Viral DNA was amplified with the Genomiphi kit (GE Healthcare, Piscataway, NJ) according to the manufacturer's instructions. Briefly, 1 µl of DNA (6.5 ng/µl) was used in each 40-µl reaction and was first mixed with 19 µl of sample buffer. To examine the effect of heat denaturation, one sample was heated at 95°C for 3 min (designated RH) and the other was placed on ice for 3 min without heating (designated RX) before 18 µl of reaction buffer and 2 µl of φ29 DNA polymerase were added. More than 10 µg of DNA was obtained after amplification at 30°C for 1.5 h. The DNA was ethanol precipitated and digested with 5 U/µl S1 nuclease in 1X buffer (Takara, Tokyo, Japan) at 30°C for 1 h. Three aliquots of DNA were amplified separately by MDA, digested, pooled, and used for cloning. To check for bacterial DNA contamination, PCR was performed with the bacterial universal primers 8F and 1492R, with positive and negative controls.
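The reaction setup can be checked with simple arithmetic: the component volumes add up to the stated 40-µl reaction, and 6.5 ng of input yielding more than 10 µg corresponds to a lower bound of roughly 1,500-fold amplification. The dictionary and helper name below are hypothetical; the numbers are taken from the protocol above.

```python
# Component volumes (µl) of one MDA reaction, as described above.
components_ul = {
    "viral DNA": 1.0,
    "sample buffer": 19.0,
    "reaction buffer": 18.0,
    "phi29 DNA polymerase": 2.0,
}


def fold_amplification(input_ng, yield_ug):
    """Fold increase in DNA mass after MDA (yield converted from µg to ng)."""
    return yield_ug * 1000.0 / input_ng


input_ng = 1.0 * 6.5  # 1 µl of template at 6.5 ng/µl
total_ul = sum(components_ul.values())
fold = fold_amplification(input_ng, yield_ug=10.0)  # ">10 µg", so this is a lower bound
print(f"reaction volume: {total_ul:g} µl; fold amplification > {fold:,.0f}")
```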
Cloning and sequencing.
Cloning was performed with 6.25 µg of RH DNA and 8.3 µg of RX DNA. DNA was sheared with a HydroGene machine (speed code 3) plus sonication for 60 s, and the size distribution of the sheared DNA was checked by agarose gel electrophoresis. The DNA was ethanol precipitated, phosphorylated, and blunt ended with the blunt kinase of the BKL reagent set (Takara, Tokyo, Japan). DNA (381 ng of RH DNA and 929 ng of RX DNA) was incubated with pUC118 vector (6 ng) and ligase mixture at 16°C for 3.5 h. Ligated DNA was extracted with phenol and transformed into DH5α competent cells by electroporation. After incubation for 1 h in SOC medium (41), an equal volume of 30% glycerol was added, and the cultures were stored at –80°C. Inserts were amplified by colony PCR after blue-white screening. Amplicons were sequenced with a forward primer; in total, 396 and 389 sequences were obtained for the RH and RX samples, respectively.
Analysis of viral metagenome sequences.
Three different database searches were performed to analyze the clone sequences: TBLASTX and BLASTX searches against the GenBank database and a TBLASTX search against the Phage Sequence Databank (http://scums.sdsu.edu/phage/). We downloaded 510 complete phage and prophage genome sequences from the Phage Sequence Databank and searched them by TBLASTX with the standalone BLAST program (2). Sequences with any hit with an E value of <0.001 in at least one search were considered "known" sequences. To classify the known sequences into biological and taxonomic groups, we compared the results of all three searches. Sequences were considered viral hits if there was any virus-positive hit among the three to five best hits from the TBLASTX and BLASTX searches of the GenBank database. When different searches resulted in conflicting classifications, the results of the TBLASTX search against the Phage Sequence Databank took priority; accordingly, several bacterial hits in the TBLASTX search of GenBank were considered viral hits on the basis of the TBLASTX search of the Phage Sequence Databank. Protein categories were assigned on the basis of the BLASTX results.
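The decision rules above can be expressed as a small function. This is a sketch under assumptions: the search labels ("gb_tblastx", "gb_blastx", "phage_tblastx"), the `Hit` record, the category strings, and the default of five top hits are hypothetical, and real BLAST output would first have to be parsed (for example, from tabular output) into this form.

```python
from dataclasses import dataclass

E_CUTOFF = 1e-3  # E value below which a hit counts as significant


@dataclass
class Hit:
    e_value: float
    category: str  # hypothetical labels such as "viral", "bacterial", "eukaryotic"


def significant(hits):
    """Significant hits, sorted from best (lowest E value) to worst."""
    return sorted((h for h in hits if h.e_value < E_CUTOFF), key=lambda h: h.e_value)


def classify(searches, top_n=5):
    """Classify one clone sequence from its database searches.

    searches maps a search label to a list of Hit objects; a missing label
    is treated as a search with no hits.
    """
    sig = {name: significant(hits) for name, hits in searches.items()}
    if not any(sig.values()):
        return "unknown"  # no hit with E < 0.001 in any search
    # A significant hit in the phage-only database overrides conflicting GenBank calls.
    if sig.get("phage_tblastx"):
        return "viral"
    # Viral if any virus-positive hit is among the best few GenBank hits.
    for name in ("gb_tblastx", "gb_blastx"):
        if any(h.category == "viral" for h in sig.get(name, [])[:top_n]):
            return "viral"
    # Otherwise fall back to the category of the single best significant hit.
    best = min((h for hits in sig.values() for h in hits), key=lambda h: h.e_value)
    return best.category


example = {
    "gb_tblastx": [Hit(1e-6, "bacterial")],
    "gb_blastx": [Hit(2e-5, "bacterial")],
    "phage_tblastx": [Hit(5e-4, "viral")],
}
print(classify(example))  # viral
```

Making `top_n` a parameter reflects the "three to five best hits" wording, which leaves the exact cutoff open.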
Contig assembly.
Sequences were assembled with the SeqMan program (DNAStar, Madison, WI) with a minimum match of 98% identity over a minimum overlap of 20 bp. These parameters were previously determined, during the construction and assembly of an in silico shotgun library, to discriminate between even closely related phage genomes (10).
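To make the assembly criterion concrete, the sketch below tests whether the end of one read overlaps the start of another under the stated thresholds. It is a simplified, ungapped suffix-prefix comparison written for illustration, not SeqMan's algorithm, and the function name is hypothetical. One consequence of the thresholds is worth noting: at 98% identity, any overlap shorter than 50 bp must match perfectly, because a single mismatch in n bp gives (n - 1)/n, which falls below 0.98 whenever n < 50.

```python
def best_overlap(left, right, min_overlap=20, min_identity=0.98):
    """Length of the longest ungapped overlap between the 3' end of `left`
    and the 5' end of `right` that meets both thresholds; 0 if none does."""
    longest = min(len(left), len(right))
    for length in range(longest, min_overlap - 1, -1):
        suffix, prefix = left[-length:], right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        if matches / length >= min_identity:
            return length
    return 0


core = "ACGTTGCAAGGCTTAGCCATGACGT"  # 25 bp shared by the two toy reads
print(best_overlap("GGCCTTGGCC" + core, core + "AAAAATTTTT"))  # 25
```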
Analysis of replication-related genes.
Partial and full open reading frames (ORFs) containing putative replication-related genes were extracted with the NCBI ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) from the metagenomic sequences that had significant hits to replication-related genes. Conserved domains in these ORFs were identified with the NCBI Conserved Domain Search (33). The ORFs containing Viral_Rep and Gemini_AL1 domains were aligned with reference sequences collected from the protein family (Pfam) database (19). Alignment and tree reconstruction were performed with the Cn3D and CDTree programs from the NCBI Conserved Domain Database (33); the block alignment algorithm of Cn3D was used to align the conserved domains. The parameters for reconstructing the trees with CDTree were as follows: alignment usage, normal alignment only; clustering method, neighbor joining; distance matrix, score of aligned residues; scoring matrix, BLOSUM62.
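CDTree builds its trees internally. To illustrate the clustering method named above, the sketch below implements textbook neighbor joining on a precomputed pairwise distance matrix and returns an unrooted tree in Newick format. How CDTree converts BLOSUM62 alignment scores into distances is not reproduced here: the distance matrix is assumed to be given, and the function name is hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric distance matrix.

    names: taxon labels; dist: square list of lists in the same order.
    Returns an unrooted tree as a Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    d = {(a, b): float(dist[i][j]) for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(d[a, b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = d[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"  # the new internal node
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        d[u, u] = 0.0
        nodes = rest + [u]
    # Three nodes left: connect them to a single central node.
    a, b, c = nodes
    la = (d[a, b] + d[a, c] - d[b, c]) / 2
    lb = (d[a, b] + d[b, c] - d[a, c]) / 2
    lc = (d[a, c] + d[b, c] - d[a, b]) / 2
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))
```

Because this toy matrix is additive, the method recovers its terminal branch lengths exactly (a, 2; b, 3; c, 4; d, 2; e, 1). Rooting, as done for the trees in this study, is a separate step not shown here.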
Real-time PCR of frequently detected sequences from unamplified and amplified DNAs.
Two primer sets were designed to target contigs C005 and C112, the most frequently detected contigs, comprising 10 and 16 sequences, respectively. The C005 and C112 amplicons were 438 and 380 bp long, respectively (Fig. 1A). PCR products amplified from viral DNA were purified and used to construct standard curves. The PCR and real-time PCR conditions were 5 min at 94°C; 33 cycles of 30 s at 94°C, 30 s at 55°C, and 30 s at 72°C; and a 10-min final extension at 72°C. Real-time PCR was performed with the DyNAmo HS Sybr green quantitative PCR kit (FINNZYMES, Seoul, Korea) and the Opticon 2 thermal cycler (MJ Research). Unamplified and MDA-amplified viral DNAs were used as templates.
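The quantification formulas are not given in the text. The conventional standard-curve approach, sketched below, regresses threshold cycle (Ct) on log10 copy number of the purified standards, derives amplification efficiency as 10^(-1/slope) - 1, and converts unknown Ct values back to copy numbers. Dividing by template mass then allows DNA before and after MDA to be compared, as in Fig. 1B, where similar template concentrations (1.4, 1.3, and 1.0 ng/µl) were used. Function names and the synthetic standards are hypothetical.

```python
import math


def fit_standard_curve(copies, ct):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared, efficiency); an efficiency of 1.0
    means perfect doubling per cycle (slope close to -3.32).
    """
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(ct) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, ct))
    ss_tot = sum((yi - mean_y) ** 2 for yi in ct)
    r_squared = 1.0 - ss_res / ss_tot
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, r_squared, efficiency


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve for an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_ng(copies, template_ng):
    """Normalize to template mass so amplified and unamplified DNA can be compared."""
    return copies / template_ng


standards = [1e3, 1e4, 1e5, 1e6]
ct_values = [30.0, 26.7, 23.4, 20.1]  # synthetic, exactly 3.3 cycles per decade
slope, intercept, r2, eff = fit_standard_curve(standards, ct_values)
print(f"slope={slope:.2f} R2={r2:.3f} efficiency={eff:.1%}")
```

Excluding nonlinear standards before fitting, as described in the Fig. 1 legend, simply means dropping those points from `standards` and `ct_values`.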
FIG. 1. (A) Results of real-time PCR with specific primers. (B) Real-time PCR of C005 (left) and C112 (right). (A) Specific primers for C005 (lanes 1 and 2) and C112 (lanes 3 and 4) produced 438- and 380-bp amplicons, respectively. Viral DNA extracted from concentrated virus preparations (lanes 1 and 3) and DNA amplified by MDA without a denaturing step (lanes 2 and 4) were used as templates. Lane L, 100-bp ladder. (B) Circle a, DNA after MDA without the denaturing step (used for the RX library); circle b, DNA after MDA with the denaturing step (RH library); circle c, DNA before MDA. Similar concentrations of DNA (1.4, 1.3, and 1.0 ng/µl for circles a, b, and c, respectively) were used as templates. Nonlinear points indicated by red filled circles in the left-hand standard curve were not included in the standard curve. R² values are 0.982 (left) and 0.988 (right).
FIG. 2. (A) Preferential amplification of circular DNA during MDA without the denaturing step. (a) Plasmids left uncut (circular DNA) and cut with the Eam1105I restriction enzyme (linear DNA). Lane 1, S11 and uncut DNA; lane 2, S11 and cut DNA; lane 3, S36 and uncut DNA; lane 4, S36 and cut DNA; lane L1, 1,000-bp ladder. (b) PCR products amplified from plasmids cut with Eam1105I. Specific primers for S11 (lane 5) and S36 (lane 6) produced 391- and 378-bp amplicons, respectively. Lane L1, 100-bp ladder. (B) Quantification of circular and linear DNAs during MDA without the denaturing step. Each circular or linear DNA (0.037 ng/µl) was mixed with 10 ng/µl lambda DNA, and MDA was performed without the heating step. Concentrations of circular and linear DNAs were estimated by real-time PCR. Filled circles, circular S11; filled squares, circular S36; empty circles, linear S11; empty squares, linear S36. Circular DNA was amplified much more efficiently than linear DNA. x axis, time in minutes; y axis, DNA concentration in picograms per microliter.
Nucleotide sequence accession numbers.
Contig and metagenome sequences have been deposited at DDBJ/EMBL/GenBank under the project accession no. ABQX00000000 as a whole-genome shotgun project. The version described in this paper is the first version, ABQX01000000. The accession numbers are ABQX01000001 to ABQX01000093 for contigs and ABQX01000094 to ABQX01000878 for single reads.
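The accession ranges above follow the GenBank whole-genome shotgun (WGS) layout: a four-letter project prefix, a two-digit version, and a record number. A short sketch (helper names hypothetical) counts the records in each range; it shows that the number of single-read records (785) equals the combined number of RH and RX sequences (396 + 389).

```python
import re

# WGS accession: 4-letter project prefix, 2-digit version, then a record number.
WGS_ACCESSION = re.compile(r"^([A-Z]{4})(\d{2})(\d{6,})$")


def parse_wgs(accession):
    """Split a WGS accession into (prefix, version, record number)."""
    match = WGS_ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not a WGS accession: {accession}")
    prefix, version, record = match.groups()
    return prefix, int(version), int(record)


def records_in_range(first, last):
    """Number of records from `first` to `last`, inclusive."""
    p1, v1, r1 = parse_wgs(first)
    p2, v2, r2 = parse_wgs(last)
    if (p1, v1) != (p2, v2):
        raise ValueError("range spans different projects or versions")
    return r2 - r1 + 1


contigs = records_in_range("ABQX01000001", "ABQX01000093")
singles = records_in_range("ABQX01000094", "ABQX01000878")
print(contigs, singles, singles == 396 + 389)  # 93 785 True
```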
The amplification efficiencies of circular and linear DNAs during MDA without the denaturing step were compared by using uncut and cut plasmids containing different inserts. Circular DNA (plasmids denatured by heat; the sizes were ca. 2,700 bp) was amplified much more efficiently than linear DNA during MDA without the denaturing step (Fig. 2).
Abundance and diversity of viruses from rice paddy soil.
The abundance of viruses in a rice paddy soil sample was estimated by directly counting virus-like particles by EFM after sequential filtration. We obtained (2.77 ± 0.47) × 10⁸ viruses from 1 g (wet weight) of soil. No bacterial cells were observed by microscopy after isolation of the viruses (Fig. 3A), and the 16S rRNA gene was not amplified by PCR with the 8F and 1492R primers from the filtered samples (data not shown).
FIG. 3. Virus-like particles (A) and bacteria (B) from rice paddy soil stained with Sybr gold and observed by EFM. Scale bars, 5 µm. (A) No bacterial contamination is observed.
TABLE 1. Classification of sequences from RH and RX libraries based on hits from database searches
In the RX library, 94% of the known sequences were virus-related hits, and 88% of these were from ssDNA viruses (75% eu-viruses and 13% ssDNA phages). Circoviridae (44%), Nanoviridae (25%), Geminiviridae (16%), and Microviridae (14%) accounted for the majority of ssDNA viral hits in the RX library, a distribution similar to that in the RH library.
Viral proteins with significant hits from the metagenomic libraries were classified on the basis of a BLASTX search against the GenBank database (Table 2). We divided the proteins into two categories: (i) a dsDNA group (dsDNA phage and prophage proteins from the RH and RX libraries) and (ii) an ssDNA group (ssDNA viral proteins from the RH and RX libraries). The ssDNA viral proteins were less diverse in type than the dsDNA viral proteins: the majority (76%) of the ssDNA viral proteins were similar to replication-related proteins, and 15% were similar to structural proteins. This is expected, because ssDNA viral genomes are very short, contain only a few ORFs, and encode mainly replication-related and structural proteins (38).
TABLE 2. Protein types with significant viral hits from two metagenomic libraries
TABLE 3. Contig formation of metagenomic sequences from two libraries
Phylogenetic analysis of replication-related genes from uncultured viruses.
In total, 122 viral DNA sequences showed significant hits to replication-related genes. After short sequences were excluded and redundant sequences were merged, we obtained 85 partial or full ORFs encoding putative replication-related proteins from the contigs and singletons. Most of the sequences were only distantly related to known viral genes; in most cases, BLASTX identities were lower than 35%. An NCBI Conserved Domain Database search was performed to investigate the phylogenetic relationships of these putative replication-related protein sequences. We found that 58 peptides contained at least one conserved domain; these included the putative viral replication protein domain (Pfam accession no. PF02407, Pfam ID Viral_Rep, 39 peptides), the viral replication domain C terminus (PF08419, Viral_Rep_C, 14 peptides), the Geminivirus Rep protein catalytic domain (PF00799, Gemini_AL1, 6 peptides), the Geminivirus Rep protein central domain (PF08283, Gemini_AL1_M, 3 peptides), the RNA helicase domain (PF00910, RNA_helicase, 4 peptides), and two COG domains; 28 ORFs showed no conserved domain in the NCBI Conserved Domain Database search.
Forty Viral_Rep and six Gemini_AL1 domain-containing peptides were aligned with representative sequences for each conserved domain, and four additional replication protein-like sequences from protozoa and plasmids containing conserved regions related to Viral_Rep domains (21) were added to the alignment. Only properly aligned regions of the conserved domains were used for the phylogenetic trees. After partially aligned sequences were excluded, 28 and 4 peptides were used to reconstruct the phylogenetic trees for the Viral_Rep and Gemini_AL1 groups, respectively (Fig. 4; see Fig. S2 and S3 in the supplemental material).
FIG. 4. Phylogenetic trees of the amino acid sequences from the Viral_Rep (A) and Gemini_AL1 (B) domains obtained in this study and of related members containing those domains. The alignment lengths for tree construction were 75 and 108 amino acids for trees A and B, respectively. The scale bars indicate the distance score calculated with the scoring matrix BLOSUM62. The prefix C in a sequence name indicates that the sequence is from a contig, and the prefix RH or RX indicates the library from which the sequence was retrieved. (A) Reference sequences: CFDV, Coconut foliar decay virus (accession no. Q66005); SCSVa, -b, and -c, Subterranean clover stunt virus (accession no. Q87009, Q87013, and Q9ICP7); BBTVa, -b, and -c, Banana bunchy top virus (accession no. Q8QTK9, Q83026, and Q65378); PCV2, Porcine circovirus 2 (accession no. Q8BB16); GCV, Goose circovirus (accession no. Q8AYY2); CCV, Canary circovirus (accession no. Q912W1); CuCV, Columbid circovirus (accession no. Q91GA3); EHa, -b, and -c, Entamoeba histolytica HM-1 (accession no. XP_648754, XP_648748, and XP_650115); CV, Canarypox virus (accession no. NP_955176); AYVV-aDNA1, Ageratum yellow vein virus-associated DNA 1 (accession no. Q8QME8); MVDVa and -b, Milk vetch dwarf virus; FBNYVa, -b, and -c, Fava bean necrotic yellows virus (accession no. O91250, O91250, and Q66862); TCSV-aDNA1, Tobacco curly shoot virus-associated DNA 1 (accession no. Q7T7M5); GI, Giardia intestinalis (accession no. Q9NJY0); p4M, plasmid of Bifidobacterium pseudocatenulatum (accession no. NP_613078). (B) Reference sequences: CSMV, Chloris striate mosaic virus (accession no. P18921); PSV, Panicum streak virus (accession no. Q00338); MSV, Maize streak virus (accession no. P03568); WDV, Wheat dwarf virus (accession no. P06847); TYDV, Tobacco yellow dwarf virus (accession no. P31617); TYLCV, Tomato yellow leaf curl virus (accession no. P36279); PHYVV, Pepper huasteco yellow vein virus (accession no. Q06923); BGMV, Bean golden mosaic virus (accession no. P05175); ACMV, African cassava mosaic virus (accession no. P14982); SLCV, Squash leaf curl virus (accession no. P2904); BCTV, Beet curly top virus (accession no. P14991); pPASa and -b and pPAUa and -b, plasmids of "Candidatus Phytoplasma asteris" (accession no. Q2NIE5, Q2NIE4, Q0QLC1, and Q0QLC5). For the accession numbers of sequences from this study in the figures and alignments of the translated sequences, see the supplemental material.
The Gemini_AL1 group appeared to fall within the established family Geminiviridae; however, the sequences were notably divergent from known members of this family (Fig. 4B). One sequence was closely related to plasmids of Phytoplasma spp., bacteria that, like geminiviruses, are plant pathogens (5, 30).
The Viral_Rep and Gemini_AL1 families belong to the Rep-like domain clan (CL0169, Rep), which contains eight protein families related to the replication proteins of viruses and plasmids. In Pfam, a clan groups two or more protein families that are thought to share a single evolutionary origin. We aligned sequences from these clan members and used them to root the trees; the proposed rooting positions are indicated in Fig. 4A and B.
Reconstruction of complete circular ssDNA viral genomes.
Among the contigs assembled from the total sequences of the two libraries, 19 contained repeated sequences or were circular, i.e., the sequence at the start of the contig was repeated at its end. The sizes of these repeated or circular sequences ranged from 290 to 2,495 bp (Table 4). Of these, four contigs (C020, C112, C005, and C132, with sizes of 2,090, 1,984, 1,634, and 1,108 bp, respectively) showed significant hits to ssDNA eu-viruses in the TBLASTX and BLASTX searches of the GenBank database; the other contigs were not related to any known sequences. To confirm that the circular contigs really had circular structures, we designed outward-facing (inverse PCR) primers for two contigs (C005 and C112) that were thought to contain putative circular genomes. Amplicons of the expected size were obtained from the MDA-amplified viral sample, and sequencing of the amplicons (data not shown) showed that they were identical to the circular genome sequences. In the case of C005, the chimeric sequence that was present in the contig and had been excluded when the circular genome was constructed was not found in the PCR amplicon. Apart from the replication-related ORFs, none of the ORFs in the putative genome components showed significant hits to known proteins. The putative replication proteins of C020 and C112 each contained two conserved domains, the putative viral replication protein domain (PF02407) and the viral replication domain C terminus (PF08419), whereas that of C005 contained only the viral replication domain C terminus (PF08419). No conserved domains were identified for C132. These putative viral genome components were analyzed further (Fig. 5).
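The circularity criterion used above (the start of a contig repeated at its end) can be checked as sketched below: find the longest exact terminal repeat and trim one copy to obtain the unit-length circular sequence. This assumes an exact repeat of at least a hypothetical 20 bp, and the function names are illustrative. Real contigs may carry sequencing errors or chimeric segments near their ends, which is one reason circularity was confirmed experimentally by inverse PCR for C005 and C112.

```python
def terminal_repeat_length(contig, min_len=20):
    """Longest k >= min_len such that the first k bases equal the last k bases."""
    for k in range(len(contig) // 2, min_len - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def circularize(contig, min_len=20):
    """Return the unit-length circular sequence, or None if no terminal repeat is found."""
    k = terminal_repeat_length(contig, min_len)
    return contig[:-k] if k else None


unit = "T" + "ACGGCA" * 6    # toy 37-bp circular "genome"
contig = unit + unit[:25]    # the assembly runs 25 bp past the origin
genome = circularize(contig)
print(len(contig), len(genome))  # 62 37
```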
C132 was excluded from these analyses because this contig was assembled from only two reads (Table 4), and potential sequencing errors and/or chimeric sequences could severely bias the analyses. The genome organizations of the three putative genome components were similar to that of the circovirus PCV2 (Fig. 5A), as well as to those of other circoviruses, nanoviruses, and geminiviruses (35, 46). All of the components had a putative stem-loop structure in their intergenic regions. The loop of this structure contains a conserved nonanucleotide motif that is found in plant geminiviruses and nanoviruses and corresponds to the initiation site of viral DNA replication (34). The stem-loop sequences were aligned with that of PCV2 (see Fig. S1 in the supplemental material). Besides the putative Rep protein gene, each putative genome component had only one or two additional ORFs (overlapping ORFs were not considered). If the circular components are viral genomes, one of these ORFs could encode the capsid protein, although none of them gave significant hits to known proteins. Because the capsid proteins of circoviruses, nanoviruses, and geminiviruses frequently have arginine/lysine-rich amino-terminal regions (35), we examined the arginine/lysine frequency in the amino-terminal regions of the ORFs of the putative genome components. One ORF in each component had an arginine/lysine-rich amino-terminal region (see Fig. S1 in the supplemental material), although only weakly so in the case of C112.
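The arginine/lysine criterion can be quantified as the fraction of R and K residues in the amino-terminal region relative to that of the whole ORF. The sketch below assumes a hypothetical 40-residue window; the text does not state the window length or the threshold used to call a region Arg/Lys rich, and the function names are illustrative.

```python
def arg_lys_fraction(protein):
    """Fraction of arginine (R) and lysine (K) residues in a protein sequence."""
    seq = protein.upper()
    return (seq.count("R") + seq.count("K")) / len(seq) if seq else 0.0


def n_terminal_enrichment(protein, window=40):
    """Ratio of the R+K fraction in the first `window` residues to that of the whole ORF.

    Values well above 1 indicate an Arg/Lys-rich amino terminus, a feature of
    many circovirus, nanovirus, and geminivirus capsid proteins.
    """
    whole = arg_lys_fraction(protein)
    head = arg_lys_fraction(protein[:window])
    return head / whole if whole else 0.0


toy_orf = "MRKRR" + "A" * 35  # toy 40-residue ORF with a basic amino terminus
print(n_terminal_enrichment(toy_orf, window=20))  # 2.0
```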
TABLE 4. Contigs forming circular DNA or containing repeated sequences
FIG. 5. Genome organizations of the three putative circular genomic components reconstructed from soil viral metagenomic sequences. The components show a genomic organization similar to that of Porcine circovirus 2 (PCV2), a representative circovirus.
φ29 DNA polymerase synthesizes dsDNA. Once priming has occurred, the strand displacement activity of φ29 polymerase continuously generates new priming sites for unbound random hexamers (31). If the denaturing step is omitted, dsDNA binds the random hexamers poorly because MDA is performed isothermally at 30°C, whereas ssDNA anneals rapidly to random hexamers at 30°C. This explains why almost all of the known sequences in the RX library were from ssDNA viruses. Another reason for the preferential amplification of ssDNA viruses might be their circular genomes. The ssDNA viruses detected in this study belong to the families Circoviridae, Geminiviridae, Nanoviridae, and Microviridae, all of whose members have small circular genomes ranging from about 1,000 to 9,000 bp. The φ29 polymerase used in the MDA reaction amplifies circular templates by rolling-circle amplification, in which the polymerase repeatedly travels around the circle and displaces the newly synthesized strand, continuously producing single-stranded copies of the circular DNA (31). The finding that circular DNA was amplified much more efficiently than linear DNA (Fig. 2) also supports the notion that MDA without the denaturing step selectively amplifies ssDNA viruses. Thus, ssDNA viruses with short circular genomes could be amplified selectively from mixed viral DNA. The high proportions of the two metagenomic sequences C005 and C112 after MDA (Fig. 1) could explain why these contigs were detected so frequently (9 and 11 times among 389 sequences) in the RX library (Table 4). That these sequences were related to those of ssDNA viruses provides further evidence that MDA without the denaturing step preferentially amplifies ssDNA genomes. Circular DNA virus genomes have also been amplified by a sequence-independent MDA strategy (multiply primed rolling-circle amplification): the human papillomavirus circular genome was amplified from DNA extracted from cell lines and bovine tissues, increasing the concentration of papillomavirus DNA 2.4 × 10⁴-fold (40). This method has also been used to amplify the circular genomes of various other viruses, such as polyomaviruses (27), anellovirus (36), circoviruses (26), and geminiviruses (23). Because papillomaviruses and polyomaviruses are dsDNA viruses yet were amplified efficiently by MDA, genome circularity is likely a main reason for the preferential amplification of circular ssDNA viruses. In addition, the increase in the proportion of significant hits related to ssDNA viruses from about 50% with the denaturing step to about 90% without it implies that eliminating the denaturing step is necessary to investigate ssDNA viruses more selectively.
A precise and detailed evaluation of the factors that affect the efficiency of the MDA technique such as the DNA length, linearity, and strand type; the reaction time; and the denaturing step should be performed in future studies.
Diversity of viruses in rice paddy soil.
When MDA was performed with the denaturing step (RH library), we acquired a number of metagenomic sequences related to both dsDNA and ssDNA viruses. All of the dsDNA viruses were prokaryotic viruses (phages); the majority (>92%) were tailed phages, and only one was a polyhedral phage (family Tectiviridae). A large proportion of the significant hits in the RH library were related to ssDNA viruses. This abundance of ssDNA viral hits is inconsistent with a previous observation that dsDNA viruses are the major entity in soil viral assemblages (50); preferential amplification of circular DNA by MDA is likely the main reason for this discrepancy. The majority of ssDNA viruses detected in this study (68 to 85%) were related to animal and plant viruses rather than to phages, and animal and plant viruses were represented in almost equal proportions. We suspect that the plant viruses derive mainly from rice or other plants growing in the field and that the animal viruses derive from wild bird feces or from composted manure used as organic fertilizer. The animal and plant viruses detected in this study belong to the families Circoviridae, Nanoviridae, and Geminiviridae, which include many pathogenic viruses (32, 45, 48), suggesting that soil could be a reservoir for viruses that are pathogenic to animals and plants. Another major group of ssDNA viruses found in this study was the bacteriophage family Microviridae; Chp1-like ssDNA microphages belonging to this family were also found to be abundant in the Sargasso Sea (3).
Phylogenetic analysis of replication-related genes of uncultured viruses.
Phylogenetic analyses of replication-related genes showed that the majority of the sequences acquired in this study represented previously unknown viruses. The viral family Geminiviridae, which contains a large number of plant-pathogenic viruses, comprises four genera: Begomovirus, Curtovirus, Mastrevirus, and Topocuvirus. The Geminiviridae-related sequences acquired in this study did not fall into any of these genera, suggesting that they are new members of Geminiviridae, the largest family of ssDNA viruses. The majority of the sequences obtained in this study were distantly related to Nanoviridae and Circoviridae but did not fall into the established families. Some of these sequences were more closely related to nonviral genetic elements, such as satellites, bacterial plasmids, and protozoan genomes, or to dsDNA viruses than to ssDNA viruses (Fig. 4). Nevertheless, some of these sequences might have originated from ssDNA viruses and may represent new ssDNA virus families. The fact that more than 60% of the sequences in this study had no significant hits implies that ssDNA viral diversity in soil may be more complex than previously thought.
Construction of complete circular ssDNA viral genomes.
The presence of a conserved stem-loop structure in the intergenic region and of two ORFs encoding a putative capsid protein and a putative Rep protein suggests that the three circular sequences are entire genomes or genome components of unknown ssDNA viruses (some ssDNA viruses, such as nanoviruses and geminiviruses, have multiple genome components). From this point of view, the other circular sequences (derived from contigs) that gave no hits to any known protein in the sequence databases might also turn out to be completely novel viruses. Breitbart et al. pointed out the possibility of sequencing entire genomes of uncultured viruses by metagenomic approaches (10), and several complete RNA virus genomes have been reconstructed in a coastal RNA viral metagenomic study (13). In this report, several putative circular ssDNA viral genomes were reconstructed from a relatively small amount of sequencing (<800 sequence reads). These results indicate that metagenomics is a useful method for uncovering unknown genomic entities among environmental viruses.
MDA without the denaturing step is not a replacement for established methods such as LASL. Rather, the work described herein points to an approach that permits the investigation of ssDNA viral diversity, something that cannot be accomplished with the LASL method. MDA has several shortcomings, such as chimera formation (29, 52), biased amplification (24), and contamination with exogenous DNA (52). Nevertheless, preferential amplification of circular ssDNA viral DNA by MDA without the denaturing step could provide a new tool for exploring currently unexplored viral diversity.
Although viruses are major biological entities in soil, viral diversity in this environment has remained largely unexplored. Using this novel culture-independent metagenomic approach, we investigated the diversity of DNA viral assemblages. MDA is useful for investigating both ssDNA and dsDNA viral diversity, and, without the denaturing step, it can be used to investigate ssDNA viral diversity selectively. Our results also show that unknown short circular ssDNA viral genomes or genome components can be detected without viral cultivation by amplifying DNA by MDA and sequencing the metagenome. Further studies are needed to reveal the viral diversity of different soil samples and to analyze soil viruses quantitatively. These efforts would contribute to our understanding of the role of viral assemblages in soil biological communities.
Published ahead of print on 15 August 2008.
Supplemental material for this article may be found at http://aem.asm.org/.