ABSTRACT
Salmonella enterica is represented by >2,600 serovars that can differ in routes of transmission, host colonization, and in resistance to antimicrobials. S. enterica is the leading bacterial cause of foodborne illness in the United States, with well-established detection methodology. Current surveillance protocols rely on the characterization of a few colonies to represent an entire sample; thus, minority serovars remain undetected. Salmonella contains two CRISPR loci, CRISPR1 and CRISPR2, and the spacer contents of these can be considered serovar specific. We exploited this property to develop an amplicon-based and multiplexed sequencing approach, CRISPR-SeroSeq (serotyping by sequencing of the CRISPR loci), to identify multiple serovars present in a single sample. Using mixed genomic DNA from two Salmonella serovars, we were able to confidently detect a serovar that constituted 0.01% of the sample. Poultry is a major reservoir of Salmonella spp., including serovars that are frequently associated with human illness, as well as those that are not. Numerous studies have examined the prevalence and diversity of Salmonella spp. in poultry, though these studies were limited to culture-based approaches and therefore only identified abundant serovars. CRISPR-SeroSeq was used to investigate samples from broiler houses and a processing facility. Ninety-one percent of samples harbored multiple serovars, and there was one sample in which four different serovars were detected. In another sample, reads for the minority serovar comprised 0.003% of the total number of Salmonella spacer reads. The most abundant serovars identified were Salmonella enterica serovars Montevideo, Kentucky, Enteritidis, and Typhimurium. CRISPR-SeroSeq also differentiated between multiple strains of some serovars. This high resolution of serovar populations has the potential to be utilized as a powerful tool in the surveillance of Salmonella species.
IMPORTANCE Salmonella enterica is the leading bacterial cause of foodborne illness in the United States and is represented by over 2,600 distinct serovars. Some of these serovars are pathogenic in humans, while others are not. Current surveillance for this pathogen is limited by the detection of only the most abundant serovars, due to the culture-based approaches that are used. Thus, pathogenic serovars that are present in a minority remain undetected. By exploiting serovar-specific differences in the CRISPR arrays of Salmonella spp., we have developed a high-throughput sequencing tool to be able to identify multiple serovars in a single sample and tested this in multiple poultry samples. This novel approach allows differences in the dynamics of individual Salmonella serovars to be measured and can have a significant impact on understanding the ecology of this pathogen with respect to zoonotic risk and public health.
INTRODUCTION
Salmonella enterica is the leading bacterial cause of foodborne illnesses in the United States, with symptoms including diarrhea, fever, and abdominal cramps that occur 12 to 72 h after infection. There are over one million cases of salmonellosis each year that result in 20,000 hospitalizations, 400 deaths, and an economic burden of $3.3 to $4.4 billion (1–3). While diverse food products, such as sprouts, cucumbers, dried pepper, tuna, and tomatoes, have been increasingly associated with Salmonella outbreaks in the last decade, poultry meat products, eggs, and live poultry remain responsible for many Salmonella infections in humans (4).
Salmonella enterica is an extremely diverse species that has over 2,600 serovars, each distinguished by its O (somatic) and H (flagellar) antigens (5). Within S. enterica subsp. enterica, there are over 1,500 serovars, and many exhibit different properties with respect to causing animal and human illnesses. For example, the avian-restricted Salmonella serovar Gallinarum poses little risk to public health, unlike the unrestricted Salmonella serovars Enteritidis and Typhimurium (reviewed in reference 6). Some serovars have distinct virulence mechanisms, such as the ability of Salmonella serovars Montevideo and Javiana to induce host DNA damage (7), and many studies have demonstrated differences in antimicrobial resistances between serovars (8–13). There are also serovars that are more prevalent in certain geographical regions; for example, Salmonella Virchow is the third most frequent serovar associated with human illness in Australia, yet it is not commonly found in the United States (14).
Salmonella serovars Kentucky and Enteritidis are both commonly found in retail chicken in the United States (comprising 38.3% and 23.3% of Salmonella spp., respectively) (15), yet they differ markedly in their association with human illness. While Salmonella Enteritidis is the serovar most frequently associated with human salmonellosis, Salmonella serovar Kentucky is rarely associated with human illness in the United States (16). Further, with regard to antimicrobial resistance, from 2008 to 2015, 61% (141/230 isolates) of Salmonella serovar Kentucky isolates from retail chickens were resistant to three or more classes of antibiotics, whereas the equivalent was true of only 2% (4/230) of Salmonella serovar Enteritidis isolates (17). Collectively, these examples highlight different risks associated with different serovars and demonstrate a need to be able to resolve the precise depth of serovar diversity during routine Salmonella surveillance.
Current surveillance protocols for Salmonella include preenrichment, followed by an enrichment process in a selective medium, such as tetrathionate broth or Rappaport-Vassiliadis broth. Subsequent cultures are then streaked onto selective agar, such as xylose-lysine-deoxycholate (XLD), and 1 to 2 presumptive H2S-positive Salmonella colonies are picked (18, 19). After confirmation that these colonies are Salmonella, they are then serotyped and subtyped to provide strain identification. Thus, the entire identity of the Salmonella population within a single sample is defined by the few colonies initially selected for serotyping. Statistically, this will usually represent only the dominant strain or serovar in that sample. Importantly, this means that minority Salmonella serovars are masked by serovars that are more abundant and go undetected. The cost of increasing the number of colonies to be able to routinely identify minority serovars is prohibitive.
Clustered regularly interspaced short palindromic repeat (CRISPR)-Cas systems are found in ∼45% of sequenced bacterial genomes, including those of Salmonella spp. (20, 21). These systems have been characterized as a prokaryotic adaptive immune system that provides protection from foreign nucleic acids, such as during phage infection (reviewed in reference 22). A key component of CRISPR-Cas systems is the highly variable spacer sequences that reside within the CRISPR array and which are separated from each other by invariant direct repeat sequences. Spacers are derived from foreign nucleic acids in a polar manner by the addition of new spacers to one end of the array. Dynamic differences in spacer composition within a CRISPR array can therefore represent a historical record of the exogenous nucleic acids with which a bacterium has come into contact (23, 24).
Salmonella has two CRISPR arrays, CRISPR1 and CRISPR2, located <10 kb apart (21, 25). The direct repeats and spacers for both arrays are 29 and 32 nucleotides in length, respectively. While the CRISPR-Cas system in this pathogen is no longer adapting to phage via the acquisition of new spacers, the CRISPR arrays are intact and have been maintained within the Salmonella genome (26). CRISPR typing approaches in Salmonella spp. have been shown to have increased discrimination compared to other molecular typing approaches, such as pulsed-field gel electrophoresis, and multilocus variable-number tandem-repeat analysis (25, 27–29). Importantly, the wealth of CRISPR information derived from these studies has revealed distinct serovar-specific spacer compositions. At the isolate level, this specificity has been harnessed for rapid serotyping, using either a bead-based “CRISPOL” assay or using directed quantitative PCR (qPCR) probes (25, 30). To date, CRISPRs have not been used to examine mixed Salmonella populations within a single sample.
Here, we report the development of CRISPR-SeroSeq (serotyping by sequencing of the CRISPR loci), a novel and high-throughput application that exploits serovar spacer differences in Salmonella CRISPRs to reveal the diversity of Salmonella serovars in single samples. We applied this method to investigate the depth of serovar diversity from poultry samples and demonstrate that CRISPR-SeroSeq is a cost-effective approach to identify multiple serovars, including minority serovars present at levels that are orders of magnitude less than the dominant serovar.
RESULTS
CRISPR-SeroSeq can detect a low-abundance S. enterica serovar in a mixed population. To determine the sensitivity of CRISPR-SeroSeq, we mixed together genomic DNA from two Salmonella isolates that represent two different serovars, Salmonella Enteritidis and Kentucky. The spacer compositions of these two isolates are shown in Fig. S3, and there is no spacer overlap between them. We prepared DNA samples such that Salmonella serovar Kentucky genomic DNA was present at different proportions relative to Salmonella serovar Enteritidis DNA. We were able to detect reads even in the 1:10,000 sample (where Salmonella serovar Kentucky accounted for 0.01% of the sample), and the proportion of reads was within the same order of magnitude as expected (see Fig. 1B and S3). The read counts remaining after 100% barcode selection are listed in Table S1.
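The comparison between expected and observed read proportions can be illustrated with a short calculation. The sketch below is not part of the published analysis: it assumes that a 1:10,000 mixture means one part minority DNA to 10,000 parts majority DNA, it interprets "within the same order of magnitude" as a difference of less than a factor of 10, and the observed value is a hypothetical placeholder rather than a measurement from the study.

```python
import math


def expected_minor_fraction(minor_parts: float, major_parts: float) -> float:
    """Fraction of the DNA mixture contributed by the minority serovar."""
    return minor_parts / (minor_parts + major_parts)


def within_one_order_of_magnitude(observed: float, expected: float) -> bool:
    """True when observed and expected differ by less than a factor of 10.

    This threshold is an assumed reading of "same order of magnitude".
    """
    return abs(math.log10(observed / expected)) < 1.0


expected = expected_minor_fraction(1, 10_000)  # close to 0.01% of the sample
observed = 0.0003  # hypothetical observed read fraction, NOT a value from the study
print(f"expected fraction: {expected:.4%}")
print("within one order of magnitude:",
      within_one_order_of_magnitude(observed, expected))
```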
Development of CRISPR-SeroSeq to detect background serovars in a mixed population. (A) CRISPR-SeroSeq is an amplicon-based sequencing approach to detect Salmonella serovars. CRISPR arrays consist of invariant direct repeats (diamonds) and highly variable spacer sequences (squares). The CRISPR-SeroSeq primers (black arrows) target the direct repeat sequences, such that the PCR products produce a laddering effect, as seen in the gel image. (B) CRISPR-SeroSeq was tested on mixed populations of genomic DNA from two Salmonella serovars, Enteritidis (blue) and Kentucky (red). Expected reads (Exp.) were calculated based on the proportions of DNA used in each experiment and plotted on the graph next to the observed reads (Obs.).
Generation of a Salmonella CRISPR spacer database. We generated a database of Salmonella CRISPR spacers as outlined in Fig. S2b. Briefly, whole-genome sequences from GenomeTrakr were accessed from NCBI Pathogen Detection and assembled using SPAdes (31). CRISPR spacers were identified using CRISPRFinder (20), extracted, and then added to a database in fasta format. To date, our database represents spacers from 102 different serovars and includes many that are commonly associated with human illness (Table S2). To account for possible differences between strains, CRISPR arrays from 5 to 10 different isolates or genomes were used to compile a comprehensive spacer list for each serovar. The SRR number for a representative genome from which the CRISPR sequences were derived for each serovar is listed in Table S2.
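To make the structure of such a database concrete, the following minimal sketch reads a fasta-format spacer list into a per-serovar set of spacers and then separates serovar-unique spacers from shared ones. The header layout (`>Serovar|spacer_name`), the function names, and the short toy sequences are all assumptions for illustration; the actual database format and real 32-nucleotide spacer sequences are not given in the text.

```python
from collections import defaultdict
from io import StringIO


def read_spacer_fasta(handle):
    """Parse FASTA records with headers like '>Serovar|spacer_name' (assumed layout).

    Returns a dict mapping each serovar to the set of its spacer sequences.
    """
    serovar_spacers = defaultdict(set)
    serovar, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if serovar is not None and chunks:
                serovar_spacers[serovar].add("".join(chunks).upper())
            serovar, chunks = line[1:].split("|", 1)[0], []
        else:
            chunks.append(line)
    if serovar is not None and chunks:
        serovar_spacers[serovar].add("".join(chunks).upper())
    return dict(serovar_spacers)


def unique_spacers(serovar_spacers):
    """Keep, for each serovar, only the spacers found in no other serovar."""
    owners = defaultdict(set)
    for serovar, spacers in serovar_spacers.items():
        for spacer in spacers:
            owners[spacer].add(serovar)
    return {serovar: {s for s in spacers if len(owners[s]) == 1}
            for serovar, spacers in serovar_spacers.items()}


# Toy records: sequences are placeholders, not real Salmonella spacers.
TOY_FASTA = """>Typhimurium|spacerA
ACGTACGT
>Typhimurium|shared1
GGGGCCCC
>Heidelberg|spacerB
TTTTAAAA
>Heidelberg|shared1
GGGGCCCC
"""

database = read_spacer_fasta(StringIO(TOY_FASTA))
print(unique_spacers(database))
```

Keeping the unique and shared spacers distinguishable at this stage matters because only serovar-unique spacers can support a call that a serovar is present.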
Prevalence of Salmonella spp. in poultry samples. To determine the functionality of CRISPR-SeroSeq from environmental samples where the diversity and distribution of Salmonella serovars were unknown, we collected and analyzed 48 samples from poultry facilities (Table 1). The presence of Salmonella spp. was highest in the processing plant samples collected at slaughter (100% [8/8 samples were positive for Salmonella]) and low in samples collected at rehang, directly after plucking (25% [2/8]) (Table 2). The rehang samples were not further considered in our experiments. We found more boot sock samples were Salmonella positive (56%) than were direct fecal samples (38%). The lowest most probable number (MPN) measurable by this scheme was 7.5/ml for the boot sock samples and rinsates and 69/g for the fecal samples. Of the four sample types, the highest average MPN was found in boot socks (Table 2).
Sample collection information
Enumeration of Salmonella bacteria in poultry samples
Detection of multiple serovars in a single sample. We performed CRISPR-SeroSeq on a selection of these poultry samples. The read counts remaining after 100% barcode selection are listed in Table S1 and range from 14,089 to 975,892 reads per sample. Sequences yielded an average Q score of 34.07 after barcode filtration, and at the average sequence length of 145 bases, the average expected error was 0.40. CRISPR-SeroSeq analysis of 22 samples showed that 91% of these contained at least two serovars (Fig. 2). On a per-sample basis, sample 57 demonstrated the most diversity, as we identified four distinct serovars, Salmonella Montevideo, Kentucky, Enteritidis, and Typhimurium. Two samples (63 and 66) had three different serovars. The least diverse samples were 37 and 62, where we only detected Salmonella serovar Montevideo.
Identification of multiple Salmonella serovars using CRISPR-SeroSeq in poultry samples from farms and a processing plant. Relative serovar diversity is plotted for each sample, and each serovar, lineage, or group is represented by the colored bars as indicated. The top graph is an enlarged view of the region within the dashed line of the bottom graph. Hse., house; Proc., processing plant.
The most frequently detected serovars were Salmonella Montevideo and Kentucky (Fig. 2). Salmonella serovar Montevideo was found in all samples and Salmonella serovar Kentucky in 86% (19/22) of samples. In all but three of the samples, Salmonella Montevideo was the most prevalent serovar identified. In the remaining three samples, Salmonella serovar Kentucky was the most prevalent. Salmonella serovars Enteritidis and Typhimurium were also identified in 14% and 9% of the samples, respectively, though neither was ever identified as the major serovar in any one sample.
Using CRISPRs to differentiate between different Salmonella strains in a mixed sample. Our data show that we are also able to identify different strains of a serovar in a single sample. Phylogenetic analysis of whole-genome sequences showed that Salmonella serovar Montevideo is diverse and has multiple monophyletic lineages (32). In our study, we identified two of these lineages that can be distinguished based on their CRISPR content. We have termed these Montevideo I (CRISPRs are related to Salmonella serovar Montevideo strain 4441H; GenBank accession no. AESY00000000) and Montevideo II (related to Salmonella serovar Montevideo strain 29N; GenBank accession no. AESW00000000). Two-thirds of our samples harbored both Salmonella Montevideo lineages (Fig. 2). Montevideo I was found in 91% (20/22) of samples and was the prevalent Salmonella serovar in 50% of all samples. Montevideo II was identified less frequently (73% of samples), although it was the major Salmonella serovar in 36% (8/22) of samples.
Salmonella serovar Kentucky is polyphyletic (32, 33), and this is clearly represented in the CRISPR profiles, which have been used to term two groups, I and II (34). Group I was identified in every sample that was positive for Salmonella Kentucky. Salmonella group II Kentucky was only detected in a single sample (sample 80; Fig. 2) that also contained group I Kentucky, with group II being proportionally higher. The findings from Salmonella serovars Montevideo and Kentucky demonstrate that in addition to identifying a serovar, CRISPR-SeroSeq can, in some cases, distinguish between strains of a single serovar within one sample.
Serotyping of single isolates confirms CRISPR-SeroSeq output. To confirm that the major serovar that was detected via CRISPR-SeroSeq (i.e., with the largest total number of reads) represented the dominant serovar in each sample population, we further analyzed the samples from one farm and their cognate processing samples. A single colony was picked from XLD plates, restreaked, and serotyped. In each case, the serotyping results confirmed the CRISPR-SeroSeq data (Table 3). Since serotyping cannot differentiate between different evolutionary lineages, we performed CRISPR typing on these isolates to determine which groups were present. Again, in each case, the dominant group identified by CRISPR-SeroSeq was confirmed (Fig. 3 and Table 3).
Confirmation of dominant Salmonella serovar by serotyping and CRISPR typing
The major serovar or group is confirmed by CRISPR typing of individual isolates. Sequence analysis of both CRISPR arrays was performed on individual isolates that were picked from XLD plates (PCR). Spacer content from CRISPR-SeroSeq is shown below the sequenced array (CSS). The colored boxes each represent a spacer sequence, and uniquely colored boxes represent a unique spacer sequence.
DISCUSSION
Salmonella serovars are remarkably diverse; some, such as Salmonella serovars Enteritidis and Typhimurium, are commonly responsible for human salmonellosis, while others, such as Salmonella serovar Kentucky, often exhibit resistance to multiple classes of antibiotics but infrequently cause human illness (13, 17, 34). With current Salmonella surveillance protocols that rely on one or a few colonies, there is an inability to effectively detect minority serovars within a given sample. To fully comprehend the risks posed by Salmonella in different agricultural systems or environments, serovar diversity should also be considered.
In addition to annual surveillance by the U.S. Department of Agriculture Food Safety and Inspection Service (USDA-FSIS), numerous studies have addressed the prevalences of Salmonella serovars in poultry (13, 35–39). As an example, an impressive study of 55 different broiler flocks by the Singer group identified over 10 different serovars via traditional culture and serotyping techniques (35). In that study, 6% of samples were identified as containing more than one serovar. In these cases, it is likely that the two serovars were present in similar proportions. Previous observations and probabilities have suggested that six colonies need to be picked to ensure a 95% probability of finding two serovars that are present in equal numbers in a single sample (40). Further, if one serovar is outnumbered 10:1 by another, 32 colonies must be picked to be able to find this minority serovar (40). It has been pointed out that a need exists to be able to more fully characterize Salmonella serovars in poultry (41). However, thus far, no cost-effective approaches have been developed and implemented to examine serovar diversity in poultry at a resolution that allows the identification of multiple serovars in a single sample, and in particular, detection of serovars present at low levels compared to others.
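The colony numbers quoted above can be reproduced with a simple binomial argument. The sketch below assumes a sample containing exactly two serovars and colonies picked independently and at random, so that each colony belongs to the minority serovar with a fixed probability; this model is an assumption made here for illustration and may differ in detail from the derivation in reference 40.

```python
def prob_detect_both(n_colonies: int, minority_fraction: float) -> float:
    """Probability that both serovars appear among n randomly picked colonies.

    Assumes exactly two serovars and independent picks: the only failures are
    picking all-majority or all-minority colonies.
    """
    p = minority_fraction
    q = 1.0 - p
    return 1.0 - p ** n_colonies - q ** n_colonies


def colonies_needed(minority_fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of colonies giving at least the requested detection probability."""
    n = 2  # at least two colonies are needed to see two serovars
    while prob_detect_both(n, minority_fraction) < confidence:
        n += 1
    return n


print(colonies_needed(0.5))       # equal proportions -> 6
print(colonies_needed(1 / 11))    # outnumbered 10:1  -> 32
```

Under these assumptions, the required number of colonies grows roughly in inverse proportion to the minority fraction, which is why routine colony picking becomes impractical for serovars present at 1% or less.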
Here, we have developed a novel molecular tool termed CRISPR-SeroSeq, which uses CRISPR loci as a target for amplicon-based next-generation sequencing to reveal the population of Salmonella serovars in a single sample. We tested CRISPR-SeroSeq by investigating the diversity of Salmonella serovars on two poultry farms and in a poultry processing facility. Ninety-one percent of the samples contained more than one serovar, and we found as many as four serovars in a single sample. In 82% of the samples, the major serovar that was detected by traditional colony isolation and serotyping was represented by over 75% of the total Salmonella CRISPR reads for that sample. In one sample (sample 36), spacer hits corresponding to Salmonella serovar Kentucky were only 0.003% of the total spacer sequences for that sample. In the study using genomic DNA from Salmonella serovars Enteritidis and Kentucky, we were able to detect Salmonella Kentucky spacers when it was prevalent at rates as low as 0.01%, compared to Salmonella Enteritidis. Collectively, our data demonstrate that CRISPR-SeroSeq can detect minority serovars in samples where they are at extremely low levels compared to the dominant serovar. Furthermore, using CRISPR-SeroSeq to separate polyphyletic serovars into different lineages increases the resolution of this tool to allow discrimination between two strains of a single serovar in one sample. In our data set, this distinction occurred 14 times, mostly with Salmonella serovar Montevideo.
A recently published study also harnesses the power of next-generation sequencing technologies for mixed serovar identification in individual samples (42). In that work, massive parallel whole-genome sequencing and subsequent serovar identification based on diagnostic single-nucleotide polymorphism (SNP) analysis of two genes, rpoB and ileS, were used, and each sample was run on a single MiSeq lane or multiplexed in a HiSeq run. The authors were able to use this approach in vitro, with different serovars being present in different proportions, to detect serovars prevalent at rates as low as 0.34%. This approach was also successful in vivo and revealed important differences as to serovar colonization in different tissues, including an expansion of Salmonella serovar Typhimurium in the majority of tissues tested. There are two major benefits of CRISPR-SeroSeq compared to massively parallel whole-genome sequencing. First, as shown here, is the ability to multiplex 30 samples on a single MiSeq lane and thereby maintain realistic sequencing costs (∼$50/sample). In other unpublished studies, we increased our multiplexing capacity to 36 samples. Second, the use of a PCR-based approach has increased the sensitivity of this assay. In our pilot study, we identified a serovar that constituted just 0.01% of the sample, and in the poultry work, one serovar accounted for just 0.003% of Salmonella spacer reads. The depth of sequencing coverage correlates with the sensitivity of CRISPR-SeroSeq. In the samples with the lowest numbers of sequencing reads (Table S1), serovars were detected at 1.8% and 0.16% of total spacer reads (samples 33 and 44, respectively), whereas samples with higher read numbers had a greater detection capacity (e.g., in sample 36, with 405,246 reads, a serovar was called from 0.003% of spacer reads).
Future analyses will address the depth of sequencing coverage that is optimal for CRISPR-SeroSeq to be maximally effective and therefore the extent to which samples can be multiplexed.
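The relationship between read depth and the smallest detectable serovar fraction can be sketched with simple arithmetic. In the example below, the read total for sample 36 and the 0.003% figure come from the text, and 14,089 is the lowest per-sample read count reported for the poultry samples; the minimum number of supporting reads (`min_reads=10`) is a hypothetical threshold chosen only for illustration, because the calling threshold is not stated here.

```python
def expected_reads(total_reads: int, fraction: float) -> float:
    """Number of reads expected for a serovar making up `fraction` of all reads."""
    return total_reads * fraction


def min_detectable_fraction(total_reads: int, min_reads: int) -> float:
    """Smallest read fraction that still yields `min_reads` supporting reads.

    `min_reads` is a hypothetical calling threshold, not a published parameter.
    """
    return min_reads / total_reads


# Sample 36: 405,246 reads, minority serovar at 0.003% of reads.
print(round(expected_reads(405_246, 0.00003)))            # about 12 reads

# Lowest reported read count among the poultry samples, with an assumed threshold.
print(f"{min_detectable_fraction(14_089, min_reads=10):.3%}")
```

Because the detectable fraction scales inversely with the number of reads per sample, multiplexing more samples per run lowers cost but raises the detection floor for each sample.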
There are some limitations with CRISPR-SeroSeq that preclude its use for precisely quantifying individual serovars. First, as an amplicon-based sequencing approach, it is susceptible to inherent PCR biases that may occur between different spacer sequences and that are more pronounced than in SNP-based analysis of a single-locus target, such as 16S rRNA. Second, as determined by our group and others, there are many spacers that are shared between serovars. For example, Salmonella serovars Typhimurium and Heidelberg share multiple spacers, though each has unique spacers which differentiate the two serovars (26). As part of our pipeline, only unique spacers were used to determine the presence of a serovar. The serovar was then quantified by counting reads of all spacers that belong to that serovar, including ones that were shared with another serovar, so long as that other serovar did not also have unique spacers present in the sample. Thus, if two serovars are present, we cannot attribute shared spacers to one serovar or another. When examining the CRISPR spacer composition of the top 15 serovars that cause salmonellosis, CRISPR-SeroSeq would be able to differentiate between each serovar but not between Salmonella serovar Typhimurium and its monophasic variant, I 4,[5],12:i:-. One alternative approach to address this would be to amplify the entire array sequence. Although this would provide strain information, the dramatic differences in array sizes among different serovars and isolates of a single serovar (26) would result in PCR bias that would preclude abundance analyses. These limitations aside, CRISPR-SeroSeq is able to detect multiple Salmonella serovars in individual samples at a much greater depth than by the traditional culture-based approach followed by serotyping.
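The calling and quantification rule described in this paragraph can be expressed as a short sketch. This is an interpretation of the prose, not the authors' pipeline code: the function name, the spacer labels, and the read counts are hypothetical, and reads from spacers shared by two detected serovars are simply left unassigned, which is one reasonable reading of the statement that such reads cannot be attributed.

```python
def call_serovars(read_counts, serovar_spacers):
    """Call serovars from per-spacer read counts.

    A serovar is detected only if a spacer unique to it has reads. Reads from a
    shared spacer are credited to a serovar only when it is the sole detected
    serovar carrying that spacer; otherwise they are left unassigned.
    """
    owners = {}
    for serovar, spacers in serovar_spacers.items():
        for spacer in spacers:
            owners.setdefault(spacer, set()).add(serovar)

    detected = {
        serovar
        for serovar, spacers in serovar_spacers.items()
        if any(len(owners[s]) == 1 and read_counts.get(s, 0) > 0 for s in spacers)
    }

    totals = {serovar: 0 for serovar in detected}
    unassigned = 0
    for spacer, count in read_counts.items():
        present_owners = owners.get(spacer, set()) & detected
        if len(present_owners) == 1:
            totals[next(iter(present_owners))] += count
        else:
            unassigned += count  # ambiguous or unknown spacer
    return totals, unassigned


# Hypothetical spacer labels; "S1" is shared by both serovars.
SPACERS = {"Typhimurium": {"T1", "S1"}, "Heidelberg": {"H1", "S1"}}

print(call_serovars({"T1": 90, "S1": 10}, SPACERS))
print(call_serovars({"T1": 90, "H1": 5, "S1": 10}, SPACERS))
```

In the first call only Typhimurium has a unique spacer with reads, so the shared reads are counted toward it; in the second call both serovars are detected and the shared reads stay unassigned.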
There are many potential applications of this technique, including the detection and monitoring of minority serovars, as they potentially increase in abundance within a given ecosystem. The identity of major serovars in commercial poultry has fluctuated over the last several decades, mostly in response to eradication or targeting of one or a few serovars (reviewed in reference 43). As an example, it is hypothesized that the eradication of the poultry-restricted Salmonella serovars Gallinarum and Pullorum allowed Salmonella serovar Enteritidis to inhabit the resulting ecological vacuum, and it subsequently became prominent in poultry (44). This serovar, unlike its predecessors, is a major foodborne pathogen and is most frequently associated with salmonellosis in humans (16). Thus, being able to more accurately predict which minority serovars may successfully establish themselves following mitigation strategies targeted at a dominant serovar has clear implications in public health.
The most common serovars identified in this study were Salmonella Montevideo and Kentucky. Salmonella Montevideo ranks in the top 10 serovars that cause human illness (16), though it is most often associated with beef products rather than poultry (37). We expect that our results are due to the limited number of flocks that we investigated as part of this study and are not necessarily representative of nationwide surveillance data. We identified Salmonella serovar Kentucky in 86% of samples, and in one house (samples 41, 42, and 44), this was the major serovar. Salmonella serovar Kentucky is currently the serovar most frequently isolated from poultry (retail meats and broiler carcasses) (37, 45), so for this serovar, our data are consistent with nationwide trends.
Salmonella serovar Kentucky is polyphyletic (32, 33) and represented by two groups, group I (which includes strains of multilocus sequence typing sequence type 152 [ST152]) and group II (ST198), each with distinct CRISPR arrays (that do not share any spacers) and vastly different antibiotic resistance profiles (34). A recent study showed that Salmonella group I serovar Kentucky isolates, but not those of group II, are frequently associated with domestic food animals, including poultry (46). Outside the United States, the ST198 type (group II) of Salmonella serovar Kentucky is commonly linked to human illness and has been found to be ciprofloxacin resistant (47, 48). A study of multiple poultry flocks in Nigeria showed that all of the identified Salmonella serovar Kentucky isolates matched ST198 (49). Collectively, these data suggest that Salmonella group II Kentucky isolates pose a greater risk to human health than do group I isolates. If this is the case, there is a rationale for specifically monitoring group II in domestic poultry populations. In the current study, we identified a robust signal for Salmonella group II Kentucky from a single sample from the processing facility, demonstrating that while Salmonella group I Kentucky is the dominant Salmonella Kentucky group identified, group II isolates are also present in poultry, albeit at a much lower abundance.
Amplicon-based next-generation sequencing (NGS) approaches, such as 16S rRNA gene sequencing and cognate microbial community profiling, have unequivocally transformed the field of microbial ecology. However, 16S rRNA gene analysis is unable to distinguish between Salmonella serovars, and CRISPR-SeroSeq will allow future studies to monitor how serovar populations change relative to each other in a single system or environment. CRISPR-SeroSeq serves as a valuable tool to the Salmonella community by being able to perform such population analyses. Furthermore, such experiments may provide an opportunity to address whether certain serovars cooperate with each other in different ecosystems, which has recently been explored using next-generation sequencing approaches (50).
Salmonella enrichment in different media, such as tetrathionate broth (TTB) and Rappaport-Vassiliadis (RV), or at different temperatures, can bias Salmonella detection (51). Additionally, a recent study examined the differences in serovar survival in response to changes in the pH of preenrichment medium (52). The identification of Salmonella-positive colonies based on the development of black H2S-producing colonies on selective medium, such as XLD or its derivatives, risks the overlooking of Salmonella spp. that form atypical colonies, or the introduction of user biases when selecting colonies for serotyping. To circumvent these issues, CRISPR-SeroSeq could be used to reveal a more accurate picture of how different serovars respond to various enrichment and selective media. Approaches, such as CRISPR-SeroSeq and massive parallel whole-genome sequencing, have the capacity to lend themselves well to longitudinal studies that investigate fluctuations in the relative abundances of serovars. As a PCR-based approach, CRISPR-SeroSeq might be able to detect Salmonella serovar diversity in preenriched samples, and this may eliminate possible enrichment biases, though this remains to be tested.
Another useful application of CRISPR-SeroSeq could be the rapid identification of multiple serovars associated with a single outbreak. Since 2012, there have been 12 multistate outbreaks where more than one serovar was reported, according to the Centers for Disease Control and Prevention's National Outbreak Reporting System (https://wwwn.cdc.gov/norsdashboard/). Most notably, six serovars have been implicated in an outbreak associated with kratom in 2018, and four serovars were implicated in a Salmonella outbreak in papayas in 2017.
The CRISPR-Cas system in Salmonella spp. is no longer adapting to foreign DNA; rather, the remnants of the system demonstrate differential spacer composition that is ideal for use in typing studies (26). In this work, we have demonstrated that CRISPR-SeroSeq is a valuable tool for identifying low-abundance Salmonella serovars in a single sample. CRISPR loci have been categorized in many bacteria, including those that are pathogenic. Similar CRISPR interrogation has been successful in Escherichia coli (57). CRISPR-SeroSeq could be adapted for use in other species with high CRISPR diversity and clear correlations within strains or serotypes. In such cases, spacer databases would need to be generated to reflect potential differences in spacer content between geographically distinct strains.
MATERIALS AND METHODS
Sample collection. Samples were collected in July 2016 from four poultry houses across two farms and subsequently from a processing plant that was processing the same flocks sampled in the poultry houses (Table 1). Each poultry house held between 5,900 and 10,300 broilers, and samples were collected using boot socks and also by collecting chicken feces. For the boot sock sample, sanitized boots were covered by a boot sock, which was then moistened with sterile water. Four pairs of boot socks were collected per house, with each pair corresponding to walking the length of the building on each side, and two along the center on either side of the water lines. Each pair of boot socks was stored in a plastic bag on ice until processing. Fecal samples were collected along these same four lines, with each sample comprising five individual fecal samples, and these were also stored in plastic bags on ice.
The processing plant samples were collected at two stages: at slaughter, before the birds were plucked, and at rehang, directly after plucking. Each bird was placed into a large plastic bag and rinsed with buffered peptone water (BPW) for 1 min. For the slaughtered preplucked birds, 400 ml of BPW was used, and for the plucked birds, 100 ml was used. The rinsates were stored on ice until the samples were processed.
Sample enrichment and Salmonella identification and enumeration. A total of 50 ml of BPW was added to the boot sock samples, which were then massaged for 2 min. For the fecal samples, 81 ml of BPW was added to 9 g of feces, and this was also massaged for 2 min. For enrichment, we followed a procedure similar to that described by Berghaus et al. (35). For all four sample types (boot sock, fecal, slaughter rinsate, and plucked rinsate), 0.2 ml of the sample in BPW was added to 4.5 ml of tetrathionate (TT) broth and cultured in a shaking incubator overnight at 42°C. The next day, 0.2 ml of the TT broth culture was used to inoculate 4.5 ml of Rappaport-Vassiliadis (RV) broth, and samples were again cultured overnight at 42°C. Ten-microliter loops were used to streak enriched cultures onto xylose-lysine-deoxycholate (XLD) agar plates that were then incubated at 37°C. Plates were observed for the development of black colonies. A presumptive Salmonella colony was picked from each plate and reconfirmed on another XLD plate. Isolates were grown overnight in LB broth and stored at −80°C in glycerol. The isolates were serotyped by the National Veterinary Services Laboratories.
Most probable number (MPN) analysis was used to enumerate Salmonella bacteria. For each sample, the enrichment cultures were performed in a 3-fold dilution series in triplicate. The MPNs were calculated as described in reference 53. For clarity, an overview of sample enrichment procedures is shown in Fig. S1.
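The idea behind an MPN estimate can be illustrated with a short sketch. This is a generic maximum-likelihood calculation for a dilution series, not necessarily the procedure of reference 53; the function name `mpn_estimate`, the tube volumes, and the positive-tube counts are all hypothetical and chosen only for illustration.

```python
import math

def mpn_estimate(volumes, n_tubes, n_positive, lo=1e-9, hi=1e9, iters=200):
    """Maximum-likelihood MPN (organisms per unit of sample).

    volumes[i]    : amount of original sample in each tube at dilution i
    n_tubes[i]    : number of replicate tubes at dilution i
    n_positive[i] : number of those tubes scored positive
    Assumes organisms are Poisson distributed among tubes.
    """
    if sum(n_positive) == 0:
        return 0.0            # no growth anywhere: estimate is zero
    if all(p == n for p, n in zip(n_positive, n_tubes)):
        return math.inf       # every tube positive: no finite estimate
    total = sum(n * v for n, v in zip(n_tubes, volumes))

    def score(lam):
        # Derivative of the log-likelihood, rearranged; decreasing in lam.
        return sum(p * v / -math.expm1(-lam * v)
                   for v, p in zip(volumes, n_positive) if p) - total

    # Bisection on a logarithmic scale for the root of score(lam) = 0.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical 3-fold series in triplicate: 1, 1/3, and 1/9 units per tube.
print(mpn_estimate([1.0, 1 / 3, 1 / 9], [3, 3, 3], [3, 2, 0]))
```

With a single dilution level the estimate reduces to the closed form −ln(1 − p/n)/v, which is a convenient sanity check for the solver.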
DNA isolation, amplification, and sequencing. DNA was isolated from 1.5 ml of enriched cultures in enrichment broth using the Wizard genomic DNA kit (Promega, WI), according to the manufacturer's instructions. Genomic DNA was resuspended in 200 μl of water and stored at −20°C. The two-step PCR scheme for CRISPR-SeroSeq is shown in Fig. 1A. Two microliters of DNA template was used in the first PCR with 1 unit Taq polymerase (New England BioLabs, MA) and 2 nmol deoxynucleoside triphosphates (dNTPs; New England BioLabs, MA) in 25 μl, using the following primers: 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCGCGCCAGCGGGGATAAACC-3′ (CSS-F) and 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCTGGCGCGGGGAACAC-3′ (CSS-R). These primers contain sequences to allow for the addition of Illumina adaptors and dual barcodes in a second PCR step, according to the Illumina Nextera protocol (Illumina, CA). After each PCR, 5 μl of the reaction mixture was examined by gel electrophoresis to confirm the presence of “laddering” that forms when amplifying CRISPR arrays. The remaining PCR sample was purified either with a PCR-cleanup column (Omega Bio-Tek, GA) or using Agencourt AMPure beads (Beckman Coulter, CA), according to the manufacturer's instructions. Up to 30 samples were multiplexed, and sequencing was performed on an Illumina MiSeq platform in a 1 × 300-bp configuration. A 10% PhiX spike was included in the sequencing run.
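The structure of the first-step primers can be made explicit with a small sketch that separates each primer into its adaptor overhang and its locus-specific 3′ portion. The two overhang strings below are the standard Nextera-style overhang sequences as commonly published by Illumina; treating them as the overhangs used here is an assumption based on the shared prefix, and the helper name `split_primer` is hypothetical.

```python
# Assumed Nextera-style overhangs (not stated explicitly in the text).
NEXTERA_FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
NEXTERA_REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Primer sequences as given in the text (5' to 3').
CSS_F = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCGCGCCAGCGGGGATAAACC"
CSS_R = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGCTGGCGCGGGGAACAC"

def split_primer(primer, overhang):
    """Return (overhang, locus_specific) for a primer that starts with overhang."""
    if not primer.startswith(overhang):
        raise ValueError("primer does not begin with the expected overhang")
    return overhang, primer[len(overhang):]

for name, primer, overhang in [("CSS-F", CSS_F, NEXTERA_FWD_OVERHANG),
                               ("CSS-R", CSS_R, NEXTERA_REV_OVERHANG)]:
    _, locus = split_primer(primer, overhang)
    print(name, "locus-specific portion:", locus, f"({len(locus)} nt)")
```

Under this assumption, the 3′ portions are the parts that anneal within the CRISPR array, and the 5′ overhangs are the handles that the second PCR uses to attach adaptors and barcodes.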
For the pilot study, genomic DNA was extracted, as described above, from strains of Salmonella serovars Enteritidis (08E0786 [54]) and Kentucky (P1427264 [46]). The following amounts of DNA were used as the template for the CRISPR-SeroSeq PCR: 1:100 (5 ng Salmonella Enteritidis, 0.05 ng Salmonella Kentucky), 1:1,000 (5 ng Salmonella Enteritidis, 0.005 ng Salmonella Kentucky), and 1:10,000 (5 ng Salmonella Enteritidis, 0.0005 ng Salmonella Kentucky).
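The relationship between these template ratios and the percentage of the sample contributed by the minority serovar can be checked with a few lines of arithmetic. This sketch assumes that DNA mass is an adequate proxy for genome copy number, which holds only if the two genomes are of similar size; the function name `minority_percent` is hypothetical.

```python
MAJOR_NG = 5.0  # Salmonella Enteritidis template, ng (from the text)

# Salmonella Kentucky template amounts, ng (from the text)
MIXES = {"1:100": 0.05, "1:1,000": 0.005, "1:10,000": 0.0005}

def minority_percent(minor_ng, major_ng=MAJOR_NG):
    """Percentage of total template DNA contributed by the minority serovar."""
    return 100.0 * minor_ng / (minor_ng + major_ng)

for label, minor_ng in MIXES.items():
    print(f"{label}: {minority_percent(minor_ng):.4f}% Kentucky")
```

The 1:10,000 mixture corresponds to a minority serovar making up approximately 0.01% of the template.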
For analysis of CRISPR arrays from individual isolates, genomic DNA was isolated from 600-μl cultures of single isolates in LB, as described above. PCR amplification of CRISPR1 and CRISPR2 was performed as described previously (34, 54). Sanger sequencing reads were assembled using Lasergene SeqMan (DNAStar, WI), and the spacer content was visualized using an Excel-based macro (55). This macro was also used to visualize the spacers derived from CRISPR-SeroSeq.
Bioinformatic analysis. Demultiplexed sequences from the Illumina MiSeq system were further purified by removing reads with less than a 100% dual-indexed barcode sequence match. This purification and selection were carried out using the BBMap short-read aligner (56).
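The filtering criterion itself is simple to express. The following is a minimal sketch of an exact dual-index match filter; it does not reproduce the BBMap interface, and the `Read` record, the function name, and the barcode strings are hypothetical.

```python
from typing import Iterable, List, NamedTuple

class Read(NamedTuple):
    name: str
    index1: str    # first index read
    index2: str    # second index read
    sequence: str

def keep_exact_barcode_matches(reads: Iterable[Read],
                               expected_index1: str,
                               expected_index2: str) -> List[Read]:
    """Keep only reads whose two index sequences both match exactly."""
    return [r for r in reads
            if r.index1 == expected_index1 and r.index2 == expected_index2]

# Hypothetical reads: the second has one mismatch in index1 and is dropped.
reads = [
    Read("r1", "ACGTACGT", "TTGCATGC", "GATTACA"),
    Read("r2", "ACGTACGA", "TTGCATGC", "GATTACA"),
]
kept = keep_exact_barcode_matches(reads, "ACGTACGT", "TTGCATGC")
print([r.name for r in kept])
```

Requiring a perfect match on both indexes discards more reads than a tolerant filter would, but it reduces the chance that a read from one multiplexed sample is attributed to another, which matters when the goal is to detect very rare serovars.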
The CRISPR-SeroSeq pipeline is shown in Fig. S2a. Briefly, quality-filtered reads were compared to the Salmonella CRISPR database locally using BLAST, with the spacer database being used as the query against the sample reads. Salmonella spacers are typically 32 bp in length, so a perfect match was scored as 60.2. Matches with a score of ≥60.2 were grouped according to serovar and according to whether the reads were unique to that serovar. Serovars that were predicted to be present in the sample were ranked in order from highest to lowest confidence, and the results were written to Excel. To increase stringency, spacer hits of <0.0015% of the corrected read number listed in Table S1 were excluded from analysis.
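The two numerical thresholds in this step can be illustrated as follows. The score of 60.2 is consistent with a BLAST bit score for a 32-bp perfect match, S′ = (λS − ln K)/ln 2, when the raw score S is 32 (+1 per matching base) and the statistical parameters are λ = 1.28 and K = 0.46. Those parameter values are an assumption, since the text does not state the scoring scheme, and the function names are hypothetical.

```python
import math

# Assumed Karlin-Altschul parameters for a +1/-2 nucleotide scoring scheme.
LAMBDA = 1.28
K = 0.46

def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score to a bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def passes_abundance_filter(spacer_reads, corrected_total_reads,
                            min_percent=0.0015):
    """True if a spacer's hits reach the minimum percentage of corrected reads."""
    return 100.0 * spacer_reads / corrected_total_reads >= min_percent

# A perfect 32-bp spacer match under the assumed parameters.
print(round(bit_score(32), 1))

# Hypothetical sample with 100,000 corrected reads.
print(passes_abundance_filter(1, 100_000), passes_abundance_filter(2, 100_000))
```

Because the cutoff equals the score of a full-length perfect match, any mismatch or truncated alignment against a 32-bp spacer falls below it under these assumptions.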
Redundant reads, where a spacer was shared between two or more serovars, were grouped and analyzed separately. Here, a shared spacer was counted toward a particular serovar only if there were also unique reads from other spacers of that serovar in the same sample. If a spacer was shared between two or more serovars that each also had unique reads, the redundant spacer counts were excluded from further analysis.
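The rule for handling shared spacers can be sketched in code. This is an interpretation of the description above rather than the pipeline's actual implementation; the function name, the data layout, and the read counts are hypothetical.

```python
def assign_shared_spacers(unique_reads, shared_reads):
    """Credit shared-spacer reads to serovars following the rule in the text.

    unique_reads : dict of serovar -> reads from spacers unique to that serovar
    shared_reads : dict of spacer id -> (set of serovars carrying it, read count)
    Returns a dict of serovar -> unique reads plus any shared reads credited.
    """
    # Only serovars with unique-spacer support are considered present.
    credited = {s: n for s, n in unique_reads.items() if n > 0}
    for spacer, (serovars, count) in shared_reads.items():
        supported = [s for s in serovars if unique_reads.get(s, 0) > 0]
        if len(supported) == 1:
            # Exactly one candidate serovar has unique support: credit it.
            credited[supported[0]] += count
        # Zero supported serovars, or two or more: the reads are ambiguous
        # and are left out of the per-serovar totals.
    return credited

# Hypothetical counts for one sample.
unique = {"Kentucky": 500, "Enteritidis": 20, "Typhimurium": 0}
shared = {
    "spacerA": ({"Kentucky", "Typhimurium"}, 100),   # credited to Kentucky
    "spacerB": ({"Kentucky", "Enteritidis"}, 50),    # ambiguous, excluded
    "spacerC": ({"Typhimurium", "Montevideo"}, 10),  # no unique support
}
print(assign_shared_spacers(unique, shared))
```

Excluding ambiguous shared reads is conservative: it may undercount abundant serovars slightly, but it avoids calling a serovar present on the basis of spacers that another serovar in the sample could explain.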
ACKNOWLEDGMENTS
We thank Elizabeth Garcia for her help in performing some of the CRISPR typing experiments; Jacob Marogi, Hallie Rauch, and Claire Woodward for their contributions in generating the Salmonella CRISPR spacer database used in this study; and John Mooney for his help with initial pipeline development. We are also grateful to Phillipe Horvath (Danisco) for sharing the macro that was used to visualize CRISPR spacers.
This work was funded by a USDA-NIFA grant to Nikki W. Shariat (award 2016-69003-24615). This research was also supported in part by a grant to Juniata College from the Howard Hughes Medical Institute through the Precollege and Undergraduate Science Education Program.
FOOTNOTES
- Received 29 July 2018.
- Accepted 21 August 2018.
- Accepted manuscript posted online 31 August 2018.
Supplemental material for this article may be found at https://doi.org/10.1128/AEM.01859-18.
REFERENCES
- Copyright © 2018 American Society for Microbiology.