In a recent edition of the Society for General Microbiology magazine, Microbiology Today, Professor W. Ford Doolittle discussed how genomics data are giving microbiologists cause to think very carefully about how to define bacterial species. This text is derived from his thought-provoking article.
Biologists have long struggled with the problem of defining and recognizing species. Even animal and plant species are sometimes difficult to define, and philosophers disagree about the precise meaning of the word species. But microbes seem to pose special problems. Their small size and frequent uncultivability (only a very small percentage can grow in the lab, see previous posts on Soil Bacteria and How many bugs?) confound efforts to describe and archive type specimens. Worse still, prokaryotes reproduce asexually and so are unable to conform to Ernst Mayr’s Biological Species Concept, that is, groups of actually or potentially interbreeding natural populations which are reproductively isolated from other such groups.
So microbiologists have in general been willing to admit that the practical need for identification and naming might best be met by some widely agreed-upon but provisional (and arbitrary) species definition, while the more theoretically driven search for a unifying species concept that would explain patterns of microbial diversity in ecological and population genetic terms could wait for the accumulation of more data, especially gene and genome sequence data. Operationally, molecular biology has won the day, and species are usually expected to share at least 70% binding in standardized DNA:DNA hybridization and/or over 97% gene-sequence identity for 16S ribosomal RNA (rRNA).
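To make the 16S rule of thumb concrete, here is a minimal Python sketch of how a percent-identity cut-off might be applied to two sequences. It assumes the sequences have already been aligned to the same length (with "-" marking gaps) and that gapped columns are simply skipped; real pipelines make their own choices on both points. The function names are invented for illustration and do not come from the article.

```python
SPECIES_CUTOFF = 97.0  # the 16S identity threshold quoted in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the inputs are pre-aligned, so position i in one sequence
    corresponds to position i in the other.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an illustrative choice)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_operational_species(aligned_a: str, aligned_b: str,
                             cutoff: float = SPECIES_CUTOFF) -> bool:
    """True if identity is over the cut-off (strictly greater, as in 'over 97%')."""
    return percent_identity(aligned_a, aligned_b) > cutoff
```

The cut-off is a parameter rather than a constant baked into the logic, which reflects the point made above: the threshold is an agreed convention, not a biological law.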
As a bonus, molecular methods allow microbiologists to identify and count microbial species in the environment without isolating and cultivating any of the organisms – for example, using specific PCR primers to amplify and sequence 16S genes from unfractionated environmental DNA preparations. What we often find is an astonishing number and diversity of apparent species, with few 16S sequences assignable at the 97% identity level to any cultivated isolates – including other isolates from the same sampling site. Moreover, many species, as defined by the 97% 16S identity cut-off, will themselves be represented by multiple different but similar individual sequences in any sample.
Such gene sequence microdiversity is the rule rather than the exception with environmental sampling of many microbial genes (in addition to 16S), and may be matched by another kind of variation at the level of genome composition. Martin Polz and collaborators have used pulsed-field gel electrophoresis to show that Vibrio splendidus isolates (with >99% 16S sequence identity) from a sample site on the Massachusetts coast can differ by as much as a megabase in genome size, comprising “at least a thousand distinct genotypes, each occurring at extremely low environmental concentrations (on average less than one per millilitre)”. Gene content diversity has also emerged as a principal message of more traditional complete genome sequence studies, based on cultivated isolates. When such activities began, a decade ago, the thought was that one sequence (that of strain K12) would surely be enough to define Escherichia coli, one would do for Bacillus subtilis, and so on. Completion of the second E. coli genome (that of the O157:H7 strain) gave us a shock. This sometimes lethal food contaminant proved to have 1,387 genes not present in K12, scattered in hundreds of small or large clusters around its genome. Reciprocally, E. coli K12 had 528 genes not present in the O157:H7 strain. The two genomes were otherwise (aside from one inversion) colinear, exhibiting 98% average nucleotide identity between shared genes.
Now that there are more than three dozen bacterial species with more than one strain sequenced, results like this seem to be almost the norm. Strains which are very close on the basis of the sequences of the genes they do share may nevertheless differ by up to 30% in gene content, the differences being attributable to scores or hundreds of events of gene gain (mostly by lateral gene transfer) and gene loss after divergence from a common species ancestor. Although many variable sequences are phages or transposable elements, others are genes vitally important in defining a strain's specific niche. Sometimes such genes are transferred together: pathogenicity islands exemplify this, but the phenomenon is not limited to pathogens. Endosymbiotic nitrogen-fixing strains of Mesorhizobium, for instance, possess an approximately 500 kb symbiosis island which encodes not only the dozen and more genes needed to form root nodules, but most or all genes needed for nitrogen fixation and the island's own strain-to-strain transfer.
It has recently become popular to think in terms of species genomes or pangenomes, consisting of a core or backbone of genes shared by all strains and an auxiliary or flexible gene pool found only in one or some strains. For some groups, the number of such auxiliary genes already exceeds the number of genes in the core, and just keeps on climbing with each new genome sequenced. Core genes, in contrast, get fewer with each new genome. But still there will be hundreds to thousands of core genes for any group we might want to call a species, and they will usually comprise a colinear backbone, within which laterally transferred genes and islands can be seen to be embedded. We might use concatenated (strung-together) core gene sequences to define species with greater precision and nuance than either DNA:DNA hybridization or 16S allow. An average nucleotide identity of greater than 94% for core genes characterizes most species defined by other means. We might also use phylogenetic trees of core gene sequences to establish lineage relationships of strains within a species, although for this, homologous recombination poses a complication. Increasingly, many bacteria (and some archaea) turn out to indulge in promiscuous inter-strain homologous recombination. Recombination means that different genes will have different evolutionary histories, and there will be no unique phylogeny relating the genomes of different strains of a species. The rate of recombination in bacteria will of course never approach that of animals, which must recombine every time they reproduce, but still it can exceed mutation as a generator of evolutionary novelty. This raises the possibility that the Biological Species Concept might appropriately be applied to some bacteria after all, at least in so far as it entails sharing of core genes by recombination in a common gene pool.
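The core/auxiliary split behind the pangenome idea above can be illustrated with a short Python sketch. It assumes each strain has already been reduced to a set of gene-family identifiers, which in practice requires clustering homologous genes first; the strain and gene names below are made up, and `split_pangenome` is a hypothetical helper rather than part of any published tool.

```python
from typing import Dict, Set, Tuple


def split_pangenome(strains: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, auxiliary) gene sets for a group of strains.

    core      = genes present in every strain
    auxiliary = genes present in at least one strain but not in all
    """
    if not strains:
        raise ValueError("need at least one strain")
    gene_sets = list(strains.values())
    core = set(gene_sets[0]).intersection(*gene_sets[1:])
    pangenome = set().union(*gene_sets)
    return core, pangenome - core


# Toy data: hypothetical gene-family labels, not real annotations.
toy_strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneE"},
}

# Add strains one at a time to watch the two pools change.
seen: Dict[str, Set[str]] = {}
for name, genes in toy_strains.items():
    seen[name] = genes
    core, auxiliary = split_pangenome(seen)
    print(f"{len(seen)} strain(s): core={len(core)}, auxiliary={len(auxiliary)}")
```

Adding strains one at a time shows the pattern described in the text: the core can only stay the same or shrink, while the auxiliary pool can only stay the same or grow.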