Table 3

Representative clustering times for mock-community samples with various algorithms

Clustering method                           Total run time (h:min:s)^a
                                            Mock community^b  Environmental sample^c
Distribution-based clustering (complete)    1:09:40           NA
Distribution-based clustering (parallel)^d  0:21:31           7:58:57
De novo (avg neighbor)                      0:06:36           NA
De novo (USEARCH)                           0:00:23           0:00:26
Closed reference                            0:06:09           1:26:23
Open reference                              0:06:05           1:23:25
  • a Times are approximated by the difference between the start time and end time in the shell script examples in the supplemental material. NA indicates that the method was not performed.

  • b The mock community contains 565,498 total reads and 5,489 unique sequences.

  • c The environmental sample contains 7,539,779 total reads and 120,601 unique sequences.

  • d The distribution-based clustering algorithm was the only one that was parallelized; 60 to 100 different processes were run at one time. The other methods would have had improved speeds if run in parallel.
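
Footnote a describes each run time as the difference between a recorded start time and end time. A minimal sketch of that calculation follows, assuming the timestamps are logged in a `YYYY-MM-DD HH:MM:SS` format; the function name `elapsed_hms`, the timestamp format, and the example timestamps are hypothetical and are not taken from the supplemental shell scripts.

```python
from datetime import datetime


def elapsed_hms(start: str, end: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Return the elapsed time between two timestamps as h:min:s.

    The timestamp format is an assumption for illustration only.
    """
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    total = int(delta.total_seconds())
    if total < 0:
        raise ValueError("end time precedes start time")
    # Split whole seconds into hours, minutes, and seconds.
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Hypothetical start and end timestamps, chosen only to show the format.
    print(elapsed_hms("2013-01-01 10:00:00", "2013-01-01 10:21:31"))
```

Because the difference is taken on full date-time values, a run that crosses midnight is still reported correctly. Times measured this way are wall-clock times, so they include any waiting or I/O overhead in addition to computation.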