Applied and Environmental Microbiology, December 2007, p. 7629-7641, Vol. 73, No. 23
0099-2240/07/$08.00+0 doi:10.1128/AEM.00938-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
Thomas E. Hanson,¹ Kurt E. Williamson,¹ Dhritiman Ghosh,² Mark Radosovich,² Kui Wang,³ and K. Eric Wommack¹*
College of Marine and Earth Studies, University of Delaware, Newark, Delaware 19711,¹ Department of Biosystems Engineering and Soil Science, University of Tennessee, Knoxville, Tennessee 37996,² Center of Marine Biotechnology, University of Maryland Biotechnology Institute, Baltimore, Maryland 21202³
Received 26 April 2007/ Accepted 26 September 2007
In addition to direct impacts on ocean biogeochemistry, the viral infection process may significantly alter the structure of microbial host communities. Virus-mediated changes in community genetic diversity are induced by selective infection and lysis of abundant community members (45, 63). Thus, host-selective viral infection may increase the overall clonal diversity of microbial host populations by removing numerically dominant community members from particular niches. Viruses, particularly bacteriophages, can also directly alter the phenotypes of host cells through genetic exchange (specific and generalized transduction) (44) or through the cryptic infectious state known as lysogeny (50), which has been observed in many marine ecosystems, especially under conditions which are unfavorable for host growth (40, 70). Because viral nucleic acids are incorporated into the host genome, lysogeny can lead to phage-mediated phenotypic conversion of prokaryotic hosts (44, 50).
The effects of viral infection in marine ecosystems emerge from the collective phage-host interactions in marine microbial communities. However, with the exception of a few well-known bacteriophages (e.g., T4, T7, λ, and P20), relatively little is known about the genetic capabilities and phenotypic characteristics of the vast majority of viruses. Whole-genome sequence data for a small collection of marine bacteriophage-host systems have revealed that the phages carry an unusually high proportion of unknown genes and have previously unexpected gene functions, such as involvement in phosphate uptake (e.g., phoH in Roseophage SIO1) (53) and photosynthesis (e.g., psbA in marine cyanophage) (38, 58, 59). Genomic investigations of phycoviruses have also revealed a high proportion of unknown genes and unusual functional genes, such as genes involved in the induction of apoptosis (71). Although investigations of single phage-host systems have provided unparalleled insights into these interactions, a broader understanding of virioplankton composition and diversity can come only from cultivation-independent approaches.
Characterization of whole viral assemblages based on pulsed-field gel electrophoresis (PFGE) has indicated that marine virioplankton community diversity varies dynamically in response to both seasonal and spatial gradients in ecosystem properties (74, 75) and that these changes mirror those in host bacterial communities (33). Higher-resolution surveys using marker genes (e.g., g20 and DNA polymerase genes) within specific lineages of virioplankton have uncovered extraordinary diversity (23, 77), yet some viral strains have been found in nearly all marine environments (15, 56, 57). While PFGE and marker genotyping approaches have been critical in the development of a foundation for synecological studies of virioplankton, they are limited either in resolution (PFGE) or breadth (marker genotyping).
Characterization of microbial communities using high-throughput DNA sequencing and bioinformatic approaches (i.e., metagenomics) addresses some of the limitations of these approaches and provides a high-resolution view of microbial diversity, as well as the potential functional capabilities within these assemblages. Based on the few metagenome surveys of marine viral communities completed to date, a consensus is emerging that virioplankton communities are extraordinarily diverse and contain a high proportion of unknown sequences (27). Against the typical backdrop of over 60% unknown sequences, viral metagenome libraries tend to contain a collection of gene homologs that are relatively distant from better-known representatives in bacterial genomes (13, 16). Despite the predisposition for horizontal gene transfer between bacteriophage genomes (32), there appears to be a specific "marine" aspect to virioplankton assemblages, which are dominated by genes from marine phages and cyanophages in particular (6). Analyses of a large database of short-read (~100-base) sequences from four oceanic regions estimated that the compositions of virioplankton assemblages were extraordinarily even, that the assemblages contained between 500 and 130,000 genotypes, and that there were significant overlaps in genotype composition between disparate geographic regions (6).
While viral communities in near-shore waters, sediments, and open oceans have been examined (6, 13, 16), the genomes of virioplankton in a highly productive estuarine ecosystem have not been described in detail. Previous investigations characterizing bacterioplankton diversity demonstrated that the variable estuarine environment selects for a unique bacterioplankton assemblage whose composition is temporally and spatially dynamic (25, 34). The earliest demonstrations that virioplankton exhibit seasonally dynamic patterns of abundance (72), diversity, and composition (74, 75) were obtained from studies of the Chesapeake Bay. This report describes an analysis of the first estuarine viral community metagenome from the Chesapeake Bay in terms of the viral community's taxonomic, functional, and genotypic diversity.
Enumeration of viruses, bacteria, and Synechococcus.
Water samples were collected from discrete depths using 10-liter Niskin bottles mounted on a conductivity-temperature-depth rosette. Subsamples were immediately collected in 50-ml centrifuge tubes, fixed with glutaraldehyde (final concentration, 2.5%), and stored at 4°C in the dark for no more than 2 weeks prior to microscopy. Viral particles and bacterial cells were collected by gentle vacuum filtration onto a 25-mm-diameter 0.02-µm-pore-size Anodisk (Whatman) and stained with SYBR Gold (Molecular Probes) as described by Chen et al. (22). Bacteria and viruses in 10 fields of view (a minimum of 200 total viruses and bacteria) were counted for each sample. For Synechococcus enumeration, bacterial cells were collected on a 25-mm-diameter 0.2-µm-pore-size black polycarbonate filter (Poretics) using gentle vacuum filtration and counted as described by Wang and Chen (66). At least 200 Synechococcus cells in 10 fields of view were selectively enumerated using green excitation (528 to 553 nm).
Metagenome library construction and sequencing.
A random shotgun library of pooled Chesapeake Bay viral concentrates (see above) was constructed using the linker amplified shotgun library method (16, 54) through the Nanoclone service provided by Lucigen Corporation. Transformation mixtures were plated on LB agar plates containing kanamycin and grown for 14 to 16 h at 37°C. A total of 3,072 colonies were picked and grown in 96-well plates in LB containing kanamycin (60 mg/ml) for 22 to 24 h. After growth, sterile 50% glycerol was added to each well (final concentration, 15%) and plates were frozen at –80°C.
Two microliters of glycerol stock for each clone was used for TempliPhi (Amersham Biosciences) rolling circle amplification according to the manufacturer's instructions, with an extension step consisting of 16 h at 30°C. The completed TempliPhi reaction mixtures were diluted 1:1 with sterile H2O, and 6 µl of the dilutions was used in standard 20-µl sequencing reaction mixtures with Dynamic ET terminator chemistry (Amersham Biosciences). Each clone was sequenced bidirectionally, using a modified version of the forward primer (SL1; 5' CAGTCCAGTTACGCTGGAGTC 3') and reverse primer (SR1; 5' CTTTCTGCTATGGAGGTCAGGTATG 3') recommended for the pSMART-HCK vector (Lucigen Corp.). During protocol optimization, 768 clones were sequenced twice. Sequencing reaction mixtures were cleaned by ethanol precipitation and resuspended in the loading solution provided with a Dynamic ET chemistry kit (Amersham Biosciences). The products were separated with a MegaBACE 4000 capillary electrophoresis instrument (Amersham Biosciences) using low voltage (6 kV) and long run times (240 min) to obtain 550- to 650-base read lengths for over 85% of 6,912 total sequencing runs. Initial base calling and quality assessment were done using the Sequence Analyzer program (Amersham Biosciences).
Metagenome sequence analysis.
Sequences were screened for vector sequence, linker sequence, and minimum base quality using Phred and Crossmatch (28). After screening, all sequences smaller than 50 bases were removed and 6,478 sequences were carried forward for further analysis. The clones which were sequenced twice during protocol optimization were compared, and the shorter of each pair of corresponding sequences was removed, leaving 5,641 nonredundant sequences. These sequences were translated in six frames and compared (as amino acids) to six databases using tBLASTx version 2.2.8 (for nucleotide databases) or BLASTx version 2.2.9 (for protein databases) (3, 4). The GenBank databases used were updated on 1 July 2004 prior to all BLAST comparisons and included the nonredundant nucleotide (nt) and protein (nr) databases, as well as environmental nucleotide (env-nt) and environmental protein (env-nr) databases. Two additional viral metagenome sequence databases were used in tBLASTx homology searches. The first of these databases included viral sequences from a California near-shore water column and sediment, as well as viral sequences from human feces (13, 14, 16). The second database was composed of viral sequences generated from a Delaware agricultural soil sample (K. E. Wommack, S. R. Bench, and K. E. Williamson, unpublished data). At the time of analysis the GenBank environmental databases were composed of environmental microbial metagenome sequences from an acid mine drainage biofilm (64) and the Sargasso Sea (65). The databases were grouped into three categories according to the origin of their sequences: (i) traditional sequences, generally derived from cultivated organisms (GenBank nt and nr databases); (ii) microbial metagenome sequences (env-nt and env-nr databases); and (iii) viral metagenome sequences (vir-mg databases). 
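The length filter and the removal of duplicate reads described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the `(clone_id, direction, sequence)` tuple layout, and the assumption that vector, linker, and quality trimming have already been applied are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

MIN_LENGTH = 50  # reads shorter than this many bases are discarded


def filter_and_deduplicate(
    reads: List[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], str]:
    """Keep one read per (clone, direction) pair.

    `reads` holds hypothetical (clone_id, direction, sequence) tuples that
    are assumed to be already trimmed. Reads shorter than MIN_LENGTH are
    dropped; when a clone was sequenced more than once in the same
    direction, only the longest read is retained.
    """
    best: Dict[Tuple[str, str], str] = {}
    for clone_id, direction, seq in reads:
        if len(seq) < MIN_LENGTH:
            continue  # too short to carry forward
        key = (clone_id, direction)
        if key not in best or len(seq) > len(best[key]):
            best[key] = seq  # keep the longer of corresponding reads
    return best
```

Keying on both clone and direction keeps the forward and reverse reads of a clone separate, so only repeated runs of the same read are collapsed.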
Further comparisons of viral metagenome sequence data against single viral genomes were performed using tBLASTx with metagenome reads as queries and the nucleotide sequence of each viral genome as a single sequence subject database. Translated BLAST alignments to viral genome sequences with E values below 10⁻⁶ were considered significant.
The composition of the metagenome sequence library was determined based on BLAST sequence homology, using only alignments with E values less than 10⁻³. Each sequence was categorized based on the alignment quality, organism, and gene function of its most similar BLAST homolog. Taxonomic origins and functions were proposed for the subset of sequences with a significant BLAST homolog in one or both of the nt and nr databases. For bacterial species, categories were based on the NCBI taxonomy (8, 68) of the organism supplying the top homolog. For viruses, taxonomy was established as described by the International Committee on Taxonomy of Viruses (ICTV) (18) and also by using the phage proteomic tree (27, 52). Functional gene assignments were grouped according to the TIGR-CMR functional categories (46), which were originally derived from functional information for Escherichia coli genes (51). In the event of conflicts between databases for assigning taxonomy and function, priority was given to the alignment with the lowest E value.
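The assignment rule described above (keep only alignments below the E-value cutoff, and resolve conflicts between databases in favor of the lowest E value) can be sketched as follows. The `Hit` record and the function name are hypothetical; they stand in for whatever parsed BLAST output is available and are not taken from the original analysis.

```python
from typing import Dict, Iterable, NamedTuple

E_VALUE_CUTOFF = 1e-3  # alignments at or above this E value are ignored


class Hit(NamedTuple):
    """Hypothetical record for one parsed BLAST alignment."""
    query: str      # metagenome read identifier
    database: str   # e.g. "nt" or "nr"
    subject: str    # identifier of the matched sequence
    e_value: float


def best_hits(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Hit]:
    """Return, for each query, the significant hit with the lowest E value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.e_value >= cutoff:
            continue  # not significant
        current = best.get(hit.query)
        if current is None or hit.e_value < current.e_value:
            best[hit.query] = hit  # lowest E value wins across databases
    return best
```

Taxonomy and function would then be read from the annotation of each retained subject sequence; that lookup is outside the scope of this sketch.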
Construction of PsbA phylogenetic tree.
A collection of 99 nonredundant PsbA protein sequences was gathered from public sequence databases and used as a comparison set for the nine unique PsbA sequences of sufficient length (>187 amino acids) identified in the Chesapeake Bay metagenomic sequence data. Clones identified as psbA gene homologs were sequenced as described above, using internal primers to obtain coverage of a larger portion of the gene. Multiple-sequence alignment was performed by using the ClustalW algorithm in MEGA (36). The sequence data were also evaluated by ProtTest (1) to establish the most appropriate amino acid substitution matrix for reconstructing phylogenetic relationships among the sequences. The final tree was constructed by the neighbor-joining method in MEGA using the JTT matrix, allowing for rate variation between sites with a gamma distribution of 0.441 over four rate categories, as suggested by the ProtTest analysis, with pairwise deletion of gapped positions. Bootstrapping was performed for 500 trials, and the results were displayed as the percentage of trees containing the node specified. Alternative tree topologies, including collapse of the tree on the nodes, were examined to verify nodes with bootstrap values below 50. All of the alternative trees agreed with the grouping of the Chesapeake Bay PsbA sequences presented below.
Estimates of viral community diversity.
Metagenome library sequences were assembled using Sequencher (Gene Codes Corporation) according to parameters described by Breitbart et al. (16), and the number of resulting contiguous sequences (i.e., the contig spectrum) was used to predict possible virioplankton population structure at the time of sampling. Three assemblies were generated and analyzed: one with all library sequences, one with only forward sequence reads, and one with only reverse sequence reads. The online PHACCS tool was used for assessing viral community diversity with the power law model (5, 27). The contig spectra used as input contained values for the first 12 contig types (i.e., up to contigs containing 12 sequences) and were as follows: assembly of all 5,641 sequences = [5435 100 2 0 0 0 0 0 0 0 0 0]; assembly of 2,798 forward sequences = [2712 43 0 0 0 0 0 0 0 0 0 0]; and assembly of 2,843 reverse sequences = [2758 41 1 0 0 0 0 0 0 0 0 0].
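A contig spectrum can be checked for internal consistency, because the number of reads it accounts for must equal the number of reads assembled: entry k counts contigs built from k reads, so the total is the sum of k times entry k. The short Python sketch below applies that check to the three spectra listed above; the function and variable names are illustrative only.

```python
from typing import Sequence


def reads_in_spectrum(spectrum: Sequence[int]) -> int:
    """Total reads represented by a contig spectrum.

    spectrum[0] is the number of singletons, spectrum[1] the number of
    two-read contigs, and so on.
    """
    return sum(size * count for size, count in enumerate(spectrum, start=1))


def pad(spectrum: Sequence[int], length: int = 12) -> list:
    """Pad a spectrum with zeros out to `length` contig types."""
    return list(spectrum) + [0] * (length - len(spectrum))


# Nonzero leading entries of the spectra reported in the text.
ALL_READS = pad([5435, 100, 2])
FORWARD_READS = pad([2712, 43])
REVERSE_READS = pad([2758, 41, 1])

if __name__ == "__main__":
    for name, spectrum in [("all", ALL_READS),
                           ("forward", FORWARD_READS),
                           ("reverse", REVERSE_READS)]:
        print(name, reads_in_spectrum(spectrum))
```

The three totals correspond to the 5,641, 2,798, and 2,843 reads stated for the respective assemblies.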
Nucleotide sequence accession numbers.
The nonredundant set of 5,641 metagenome sequences has been deposited in the GenBank database (http://www.ncbi.nlm.nih.gov/) under genome project number 16522; the accession numbers are EI103240 to EI108880.
100-bp) virioplankton sequences from 68 sites in four ocean regions were categorized as known sequences (6). This lower frequency of known sequences is likely related to the sequence lengths of the libraries. Extensive comparisons of short-read (~100-bp) and long-read (>600-bp) sequence data sets generated in silico, starting with the Chesapeake Bay metagenome sequence collection, indicated that short-read sequences fail to detect more than 60% of the BLAST homologs detected by long-read data, even with 6- to 10-fold oversampling relative to the long-read data set (K. E. Wommack, J. Bhavsar, and J. Ravel, submitted for publication). These observations suggest that short-read sequences are less appropriate than long-read sequences for functional characterization of viral metagenomes based on BLAST homology searches.
TABLE 1. Metagenome sequence BLAST homology by database and domain (E value, <10⁻³)
Earlier reports of viral metagenome libraries focused on BLAST searches against only GenBank nt and nr databases. However, the release of data for 1,360 Mb of microbial metagenome sequences from the Sargasso Sea (65) and of data for 76 Mb from an acid mine drainage microbial community (64) enabled homology searches against strictly environmental sequences. Comparing quality scores of BLAST alignments from environmental databases to quality scores from the GenBank nt and nr databases revealed that the Chesapeake Bay metagenome was more similar to environmental sequences. For example, the average and median E values were approximately 3 logs lower for homologs to environmental sequences than for sequences with GenBank nt/nr database homologs (Table 1).
This type of comparison also revealed that Chesapeake Bay sequences were more similar to known viral sequences than to other known sequences. The vast majority (91%) of the 2,195 sequences with similarity to GenBank nt and nr database sequences were most similar to sequences from prokaryotes or viruses. Alignments to viral sequences were the highest in quality, with median and average E values more than 11 logs lower than the values for BLAST alignments to bacterial sequences (Table 1). Alignments to environmental sequences were the second highest quality, with E values 7 logs lower than the values for bacterial alignments. The 149 Chesapeake Bay viral sequences that were most similar to eukaryotic sequences had the lowest quality alignments (the E values were up to 22 logs higher than the values for virus sequence alignments), suggesting that eukaryotic viruses were rare in the Chesapeake Bay at the time of sampling (Table 1).
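Differences expressed in "logs" are differences between log10-transformed E values. The sketch below shows one way such a comparison of medians could be computed; it is an illustration rather than the procedure used for Table 1, and the floor applied to E values reported as zero is an assumption introduced here so that the logarithm is defined.

```python
import math
from statistics import median
from typing import Iterable

# Assumed stand-in for E values that BLAST reports as 0.0 (hypothetical).
E_VALUE_FLOOR = 1e-300


def median_log10_e(e_values: Iterable[float]) -> float:
    """Median of log10(E value) for a group of alignments."""
    return median(math.log10(max(e, E_VALUE_FLOOR)) for e in e_values)


def log_difference(group_a: Iterable[float], group_b: Iterable[float]) -> float:
    """Median log10 E value of group_a minus that of group_b.

    A negative result means group_a alignments are, by this measure,
    that many logs lower (better) than group_b alignments.
    """
    return median_log10_e(group_a) - median_log10_e(group_b)
```

Working with the median of the logarithms keeps a handful of extremely small E values from dominating the comparison.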
The overlap of homology between databases also suggested the nature of the source DNA used for library construction. Among viral metagenome sequences with BLAST homologs, the largest fraction (1,235 sequences or 31%) showed similarity to at least one sequence in each of the three database categories (Fig. 1, central region) (see Materials and Methods for descriptions of database categories). One-half (51%) of the Chesapeake Bay metagenome sequences had a homolog in the env-nt database or the env-nr database or both, and more than one-half of these (30% of the total) had homologs only in the environmental databases and no homology to any sequence in the GenBank nt or nr database. The majority of the matches were to sequences from the Sargasso Sea metagenome library (65), indicating that there were overall similarities between the microbial and viral communities in diverse marine environments. This type of signal was not detectable in short-read virioplankton libraries, where even marine-derived viral sequences had a low BLAST homolog rate (4%) with the env-nt and env-nr databases, similar to the rate observed when the GenBank nr and nt databases were queried (6). Sequences with matches to both the GenBank nt/nr and env-nt/env-nr databases occurred at a frequency (19%) similar to the frequency of matches with homologs in both the env-nt/env-nr and vir-mg databases (16%). Chesapeake Bay virioplankton sequences with homology solely to the GenBank nt/nr or vir-mg database or to both the GenBank nt/nr and vir-mg databases were rare, with 88 to 163 sequences per category (Fig. 1).
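The partition of sequences among database categories (each region of a three-way overlap diagram) can be tabulated directly from the sets of reads with a significant homolog in each category. The following Python sketch shows the idea; the function name and the category labels used as dictionary keys are hypothetical, and no real counts are reproduced.

```python
from collections import Counter
from typing import Dict, Set


def overlap_counts(hits_by_category: Dict[str, Set[str]]) -> Counter:
    """Count reads in each exclusive region of the overlap diagram.

    `hits_by_category` maps a database category label to the set of read
    identifiers with a significant homolog in that category. The result
    maps each combination of categories (a frozenset) to the number of
    reads matching exactly that combination.
    """
    membership: Dict[str, Set[str]] = {}
    for category, reads in hits_by_category.items():
        for read in reads:
            membership.setdefault(read, set()).add(category)
    return Counter(frozenset(categories) for categories in membership.values())
```

Because every read is assigned to exactly one combination, the region counts sum to the number of reads with at least one homolog.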
FIG. 1. Distribution of translated BLAST (tBLASTx against nucleotide databases and BLASTx against protein databases) matches between all database "types." The upper, largest circle represents matches to the env-nt and/or env-nr database, composed mostly of Sargasso Sea bacterial metagenomic data. The leftmost circle represents matches to either of the traditional GenBank (nt and nr) databases. The bottom right circle represents matches to any of a series of small viral metagenomes from terrestrial and marine environments (see Materials and Methods for details). Intersections of circles represent sequences that had BLAST homology to more than one database type, and the center area represents sequences with homology to all three types.
FIG. 2. Distribution of translated BLAST sequence matches across taxonomic domains sorted by match quality. Sequences were placed in nonredundant bins according to quality (i.e., E value), and relative domain percentages were calculated for each bin. The smallest E values represent the highest-quality matches on the left. The least confident matches are on the right, with a maximum E value of 10⁻³. The numbers of sequences in the bins are indicated above the bars.
The majority (nearly 60%) of the 1,056 Chesapeake Bay virioplankton sequences with best BLAST matches to prokaryote sequences were most similar to Proteobacteria (Fig. 3). The Gammaproteobacteria subphylum accounted for the largest portion (40%) of the proteobacterial homologs, while the Alpha-, Beta-, and Deltaproteobacteria were less common, accounting for only 15 to 20% of this group (Fig. 3, inset). BLAST homologs to Cyanobacteria and Firmicutes accounted for another 26% of the prokaryotic sequences (15 and 11%, respectively). One caveat of these data is the potential influence of the subject database on the taxonomic distribution of metagenome homologs. To estimate the amount of taxonomic bias introduced by database contents, we compared the taxonomic distribution of the metagenome sequences to the phylogenetic composition of prokaryotic genome sequences in GenBank. This comparison revealed that four of the eight most common phyla occurred at similar frequencies in the query (Chesapeake) and subject (GenBank) databases (Fig. 3). However, the frequency of BLAST homologs among the remaining four phyla differed from the GenBank distribution by more than 5%.
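The database-bias comparison described above amounts to converting counts per phylum into percentages for the query and subject sets and flagging phyla whose percentages differ by more than a threshold. A minimal Python sketch follows; the function names are hypothetical, and treating the 5% criterion as a difference in percentage points is an assumption made here.

```python
from typing import Dict, List


def frequency_differences(
    query_counts: Dict[str, int], subject_counts: Dict[str, int]
) -> Dict[str, float]:
    """Percentage-point difference (query minus subject) for each phylum."""
    query_total = sum(query_counts.values())
    subject_total = sum(subject_counts.values())
    phyla = set(query_counts) | set(subject_counts)
    return {
        phylum: 100.0 * query_counts.get(phylum, 0) / query_total
        - 100.0 * subject_counts.get(phylum, 0) / subject_total
        for phylum in phyla
    }


def divergent_phyla(differences: Dict[str, float], threshold: float = 5.0) -> List[str]:
    """Phyla whose frequencies differ by more than `threshold` points."""
    return sorted(p for p, d in differences.items() if abs(d) > threshold)
```

A positive difference marks a phylum that is overrepresented among metagenome homologs relative to the subject database, and a negative difference marks one that is underrepresented.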
FIG. 3. Distribution of translated BLAST prokaryotic homolog sequences. The data are organized according to prokaryotic phyla. Data for completed prokaryote genomes in GenBank (at the time of metagenome sequence comparison) are shown to illustrate groups that are overrepresented (e.g., Cyanobacteria and Proteobacteria) or underrepresented (e.g., Firmicutes) in the metagenome relative to the subject database. CB, Chesapeake Bay.
Archaea and Firmicutes were underrepresented in the Chesapeake Bay metagenome library, while Proteobacteria and Cyanobacteria were overrepresented with respect to the GenBank database (Fig. 3). The underrepresentation of archaeal homologs likely reflected the known low abundance of archaea previously observed in the Chesapeake Bay (11) and the lack of genomic sequence information for mesophilic marine archaea (35). Although members of the Firmicutes and Actinobacteria are known to occur in the Chesapeake Bay (34), the underrepresentation of these groups in the virioplankton metagenome may reflect a bias towards terrestrial strains in the GenBank databases. Overall, these discrepancies illustrate real taxonomic biases in the composition of the Chesapeake Bay host community and the fact that available sequence databases are not ideally suited for characterization of metagenomic libraries because many environmentally important groups of prokaryotes are underrepresented in these databases.
Similar to other viral metagenomes, when sequences were classified using the taxonomic scheme outlined by the ICTV (18), the vast majority of Chesapeake Bay metagenome sequences with viral homologs were categorized as bacteriophages. Over 90% of these sequences were most similar to the tailed bacteriophage order Caudovirales (Table 2), and 83% of these sequences were most similar to members of the Myoviridae and Podoviridae families (42 and 41%, respectively). In contrast, the third major family, Siphoviridae, accounted for only 6% of the viral homolog sequences, and another 3% were unclassified below the order level. Viruses that infect algae (Phycodnaviridae; e.g., viruses PbCV-1 and EsV-1) comprised only 1% of the virus BLAST homologs to Chesapeake Bay virioplankton sequences, further demonstrating that viruses with eukaryotic hosts were rare in this sample (Table 2). The relative percentages of the Caudovirales families observed in the Chesapeake Bay library contrast strikingly with the results of previous studies of a variety of sample types and locations, which estimated that between 28 and 76% of the sequences could be classified as Siphoviridae (13, 14, 16, 20). The uniquely low proportion of Siphoviridae found in the Chesapeake Bay library suggests that most bacteriophages in this estuary are virulent, because a large proportion of temperate phages belong to the Siphoviridae. By extension, it also suggests that there is a relatively low rate of lysogeny within Chesapeake Bay bacterioplankton host populations in late summer, which is consistent with results for Tampa Bay showing that the lowest proportion of lysogenic hosts within estuarine bacterioplankton occurred in warm productive months (40, 70).
TABLE 2. Distribution of top BLAST homologs to viral sequences (E value, <10⁻³) organized by ICTV taxonomy or phage proteomic tree cluster
Classification of metagenome sequences using the second version of the phage proteomic tree (27, 52) showed that cyanophage P60 was the phage most commonly detected (>50% of all proteomic tree homologs), followed by homologs to the closely related Pseudomonas aeruginosa phage PaP3 and Roseophage SIO1 (Table 2). Virioplankton sequence homologs were distributed throughout the P60, PaP3, and SIO1 genomes, indicating that intact phage genomes similar to these species, rather than particular genes or regions, were abundant in the metagenome library. These three phages are in the T7-like Podophage clade of the phage proteomic tree (27) and are listed within the "Cyanophage P60 group" in NCBI taxonomy (8, 68). Because P60 was isolated from an estuarine environment and is known to infect Synechococcus species (21) that were also abundant in the Chesapeake Bay at the time of sampling (see Fig. S1 in the supplemental material), the similarity to P60-like phage is not surprising. Recent analyses of the taxonomic structure of marine phage communities based on the phage proteomic tree indicated that members of the T7-like Podophage clade are also common in a broad range of oceanic environments (6). In contrast to the ICTV taxonomic distribution, the phage proteomic tree comparison identified a slightly higher proportion (11%) of virioplankton sequences as most similar to Siphophage families. However, at the time of this analysis the phage proteomic tree contained 167 phages whose genomes had been sequenced (27) and did not include the cyanomyophages P-SSM2 and P-SSM4 or the cyanopodophage P-SSP7. The percentage of siphophage homologs might be closer to the ICTV percentage if the metagenome were compared to a proteomic tree which included these cyanophages.
Chesapeake Bay metagenome sequences were divided into 17 functional categories according to the gene function of the highest-quality BLAST alignment. Each functional category was further divided based on the likely taxonomic position of the BLAST homolog (viral, bacterial [no evidence of prophage in the genome region of the best BLAST homolog], prophage [bacterial genome match to an annotated or suspected prophage], or mobile element [transposon or plasmid]) (Fig. 4). Of the 2,195 sequences with a BLAST homolog in the GenBank nt/nr databases, 86% (2,010 sequences) were annotated. Among the 39% "known" virioplankton sequences, functional categories for virion structure, replication/recombination, virion assembly, and nucleotide metabolism each represented between 6 and 15% of the known BLAST homologs. Only a small fraction of sequences were homologous to functional groups outside those directly related to viruses (assembly, structure, and lysogeny) or nucleotide metabolism (biosynthesis, DNA modification, replication, recombination, and transcription), and these sequences originated almost entirely from viruses or bacterial genomes with no evidence of prophage in the genomic neighborhood of the match. However, unknown or hypothetical proteins were the most dominant functional class (36% of the functionally classified sequences) in the Chesapeake Bay virioplankton library (Fig. 4). These findings are similar to those reported for other long-read viral metagenome libraries (13, 14, 16) and support the "unknown" nature of extant DNA virus diversity.
FIG. 4. Viral metagenome translated BLAST homologs sorted according to annotated functional gene category. Each sequence was assigned to a presumptive functional category based on the highest-quality sequence homolog. The most likely phylogenetic affiliations (virus, bacteria, prophage, and mobile element) for each sequence category are indicated. Asterisks indicate categories that could not contain prokaryote sequences because they are purely viral functions.
The relative abundance of prophage-like sequences contrasts with the low number (36 sequences) of lysogeny-related functional genes identified in the Chesapeake Bay metagenome library. Furthermore, the small number of Siphoviridae-like sequences identified also suggested that lysogeny was not prevalent in the Chesapeake Bay at the time of sampling, as discussed above. One possible explanation for this inconsistency may be the subject databases and methods used for functional assignments. Specifically, the vast majority of annotated genes in the GenBank nr database come from cellular organisms and not viral genome sequences. As a result, BLAST similarity searches of viral metagenome sequences would be more likely to find a homolog in the larger cellular sequence set. Thus, even virus-derived metagenome sequences may appear to be most similar to prokaryotic (i.e., host) sequences, because their exact homologs are not represented by a viral sequence in the subject database. Another explanation may be that bacteriophage groups other than Siphoviridae contribute more to bacterial lysogeny in this environment. If this is the case, then a clear signal of lysogenic frequency may be difficult to discern from viral metagenome sequence data. To distinguish between unrecognized, genuine lysogeny and overannotation of prophage sequences, future viral metagenomic investigations should be directly coupled with induction experiments to measure the level and ascertain the identity of inducible prophages in the community at the time of sampling.
Comparison of Chesapeake Bay metagenome sequences with specific phage genomes revealed that nearly 14% of the library had significant homology to cyanophages, while a much smaller fraction of the library was homologous to known noncyanophage viruses (Table 3). Mapping of metagenome sequences onto the genome of P-SSM2 revealed that the vast majority of identified open reading frames in this cyanophage had homologs in the Chesapeake Bay library (Fig. 5). The position map revealed particular functions that were found with increased frequency in the metagenome library, such as phage structural genes, nucleotide metabolism and replication genes, and a gene (psbA) encoding the core photosystem II D1 protein.
TABLE 3. Frequency of Chesapeake Bay metagenome BLAST homologs to virus genomes
FIG. 5. Positions of Chesapeake Bay virioplankton BLAST homologs on the Prochlorococcus phage P-SSM2 genome. Regions with high levels of coverage are indicated by brackets. Only translated BLAST homologs with E values below 10⁻⁶ are shown. psbA, core photosystem II reaction center protein; nrdA&B, alpha and beta subunits of ribonucleoside reductase.
FIG. 6. Phylogenetic tree of PsbA amino acid sequences deduced by a comparison of viral metagenome sequences with PsbA amino acid sequences derived from public databases. The tree is based on alignment of 187 homologous positions. Scale bar = 0.05 substitution per position.
0.5% (43, 58) and 2.4% (58) of the total genome length, with the highest percentage seen in the cyanopodophage P-SSP7. Comparing these percentages to the 2.8% psbA length fraction in the Chesapeake Bay metagenome, it appears that the vast majority of cyanophages carried psbA at the time of sampling and that the strategy for maintaining host photosystem functionality during infection may be nearly universal among cyanophages in the Chesapeake Bay. Recently, examination of over 30 cyanophages showed that 88% of them carried psbA, and the propensity for these phages to carry both psbA and psbD appeared to coincide with the host specificity and/or genome size of a given strain (59). Broad-host-range cyanomyophages with larger genomes carried both genes, while narrow-host-range cyanopodo- and cyanosiphophages carried only psbA. Combining the 2.8% psbA length fraction of cyanophage homologous sequence in the library with the fact that psbD was rarely observed in the library (10-fold-fewer significant BLAST homologs than psbA) indicated that the Chesapeake Bay cyanophage assemblage was dominated by small-genome, narrow-host-range cyanopodophages at the time of sampling.
In contrast to the high degree of genetic homology to known cyanophages, contig spectra community analysis suggested that there was a virioplankton assemblage that was evenly distributed among thousands of genotypes, with the most abundant genotype accounting for <0.1% of the community (Table 4). For the assembly of all reads, the power law model estimated that there were 4,110 and 1,650 total genotypes for average genome sizes of 50 and 125 kb, respectively. PFGE of virioplankton assemblages indicated that the sizes of the viral genomes in samples used for library construction ranged from 30 to 250 kb and that there were two major subpopulations: viruses with moderate-size genomes (30 to 60 kb) and viruses with larger genomes (125 to 250 kb) (Fig. 7). Fifty-kilobase genomes were most abundant, while 125 kb was the mean size of all genomes observed (unweighted for abundance). In all assemblies, the estimates for the most abundant genotype ranged from 0.04 to 0.6% of the total community (Table 4), resulting in evenness estimates very close to 1 and Shannon diversity indices near the maximum value allowed for the estimated number of genotypes (55). Recent assessments of virioplankton richness based on short-read sequences resulted in similarly divergent conclusions, i.e., a high level of BLAST homology to cyanophage genomes and an extremely diverse global ocean community containing between 57,600 and 129,000 viral genotypes estimated by contig spectral analysis (6). These discrepancies likely reflect the relative sensitivities of the two types of analyses. While tBLASTx tolerates a greater range of sequence diversity because it relies on translated amino acid sequences, contig spectra rely on the outcome of a high-stringency nucleotide sequence assembly.
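The link between a rare most-abundant genotype, evenness near 1, and a Shannon index near its maximum can be checked numerically. The sketch below assumes a simple power law rank-abundance form in which the relative abundance of the genotype at rank i is proportional to i^(-b); the exponent value and the function names are illustrative assumptions, and the code does not reproduce the contig spectrum fitting itself. The genotype count of 4,110 is the estimate quoted above for a 50-kb average genome size.

```python
import math


def power_law_abundances(n_genotypes, exponent):
    """Relative abundances proportional to rank**(-exponent), normalized to 1."""
    weights = [rank ** -exponent for rank in range(1, n_genotypes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over nonzero abundances."""
    return -sum(p * math.log(p) for p in abundances if p > 0)


def evenness(abundances):
    """Ratio of H to its maximum, ln(S), for S genotypes."""
    return shannon_index(abundances) / math.log(len(abundances))


N_GENOTYPES = 4110   # estimate quoted in the text for 50-kb genomes
EXPONENT = 0.1       # hypothetical shallow power law slope (assumption)

community = power_law_abundances(N_GENOTYPES, EXPONENT)
print(f"most abundant genotype: {community[0]:.3%} of community")
print(f"Shannon index: {shannon_index(community):.3f} "
      f"(maximum {math.log(N_GENOTYPES):.3f})")
print(f"evenness: {evenness(community):.4f}")
```

With a shallow slope, the top-ranked genotype makes up only a small fraction of a percent of the community and the Shannon index sits just below ln(S), so the evenness ratio is close to 1. A steeper assumed exponent concentrates abundance in the top ranks and lowers both values.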
TABLE 4. Community analysis of Chesapeake Bay viral metagenome based on contig spectra
FIG. 7. PFGE gel of virioplankton concentrates used to construct the Chesapeake Bay metagenome library. The numbers above the lanes indicate the stations in the bay (see Materials and Methods for the location of each station). For station 858, surface and bottom samples are indicated by the suffixes "s" and "b," respectively. The numbers on the left indicate marker band sizes (in kilobases). Marker lanes M contained concatemers of phage genomes (with resolvable bands at positions ranging from 291 to 48.5 kb) mixed with a HindIII digest of genomic DNA (23.1 and 9.4 kb). The viral concentrate from the bottom water sample at station 858 was not used in construction of the metagenome library in this study.
This report describes the first detailed examination of an estuarine viral metagenome. An extensive effort was made to place observed genetic diversity in a functional and taxonomic context. Overall, the results demonstrate the unique capabilities of long-read metagenomic sequence data for characterization of natural viral communities. The large amount of unknown and novel DNA sequence observed in this study dramatically underscores the fact that extant gene diversity among dsDNA viruses is poorly constrained and illustrates the need to develop additional approaches that move beyond cataloging and on to determining the ecological significance and evolutionary advantages conferred by the genes to the viruses that carry them.
We thank Mya Breitbart and Forest Rohwer for useful suggestions on the contig spectrum analyses. We are also indebted to Larry Tindell, Leo Genyuk, Jaysheel Bhavsar, and Sowmya Vijayaraghavan for computational assistance and to the captains and crew of the R/V Cape Henlopen for assistance during research cruises.
Published ahead of print on 5 October 2007.
Supplemental material for this article may be found at http://aem.asm.org/.
Present address: Ocean Sciences Department, University of California Santa Cruz, Santa Cruz, CA 95064.
Present address: J. Craig Venter Institute, Rockville, MD 20850.