ABSTRACT
The late embryogenesis abundant (LEA) family is composed of a diverse collection of multidomain and multifunctional proteins found in all three domains of the tree of life, but they are particularly common in plants. Most members of the family are known to play an important role in abiotic stress response and stress tolerance in plants but are also part of the plant hypersensitive response to pathogen infection. The mechanistic basis for LEA protein functionality is still poorly understood. The group of LEA 2 proteins harbors one or more copies of a unique domain, the water stress and hypersensitive response (WHy) domain. This domain sequence has recently been identified as a unique open reading frame (ORF) in some bacterial genomes (mostly in the phylum Firmicutes), and the recombinant bacterial WHy protein has been shown to exhibit a stress tolerance phenotype in Escherichia coli and an in vitro protein denaturation protective function. Multidomain phylogenetic analyses suggest that the WHy protein gene sequence may have ancestral origins in the domain Archaea, with subsequent acquisition in Bacteria and eukaryotes via endosymbiont or horizontal gene transfer mechanisms. Here, we review the structure, function, and nomenclature of LEA proteins, with a focus on the WHy domain as an integral component of the LEA constructs and as an independent protein.
INTRODUCTION
Liquid water loss is one of the most life-threatening abiotic stress conditions, as it negatively affects all biological functions. It may lead to double-stranded DNA (dsDNA) breaks and oxidative lesions, damage to RNA, protein aggregation, cell shrinkage, and various other deleterious molecular and metabolic changes (1–3). However, organisms in all kingdoms of life have developed mechanisms for resisting or compensating for the effects of desiccation, by either prevention of intracellular water loss or repair of desiccation-linked damage (3, 4). Interestingly, both freezing and hypersaline stresses are related to dehydration stress, in that each leads to a state of low intracellular water potential (aw) (5), despite the fact that the physical cause of the low intracellular aw status differs for each imposed condition, i.e., external low %RH atmosphere (aridity-induced desiccation), osmotic imbalance (hypersalinity), or an intracellular water phase transition (freezing). In the case of the phase transition, not only is the remaining intracellular water incapable of supporting normal physiological processes (6), but the formation of ice crystals in intracellular spaces can physically damage the integrity of the cell vessel (7).
The synthesis of the osmotically active late embryogenesis abundant (LEA) proteins is one of the better-known mechanisms involved in organismal protection against abiotic stresses, including cold and desiccation stresses (4, 5, 8–10). LEA proteins were first identified in cotton (Gossypium hirsutum) as proteins that accumulated during the late maturation stages of seed development (11). Extensive research over more than three decades has demonstrated that the accumulation of the hydrophilic LEA proteins is not restricted to embryonic tissues but is also prevalent in vegetative plant tissues under water deficit conditions (10, 12–18). Although the functions of many LEA proteins are not fully understood, the consensus is that members of this protein family play an important role in organismal stress tolerance, particularly in dehydration and cold stress, by acting as chaperones to protect other cell proteins and membrane structures (19, 20). LEA proteins have been found in numerous organisms across a wide taxonomic and evolutionary spectrum, including bacteria, nonfilamentous fungi, plants, and vertebrates (4, 15, 21, 22), while LEA-like proteins have been detected in nematodes and filamentous fungi (21).
CLASSIFICATION OF LEA PROTEINS
LEA proteins are present in many organisms, but neither their structures nor their functional mechanisms are fully understood, leading to (rather unhelpful) references to this protein family as a “continuing conundrum” (22) or as being “enigmatic” (21). The classification of LEA proteins is equally controversial. The original grouping of LEA proteins is based on common structural features, which were first identified in the prototypical cotton plant (G. hirsutum) (11), but subsequent alternative classifications have led to large inconsistencies with respect to the original taxonomy (23). LEA proteins have been variously assigned to three major groups associated with their taxonomic origins, i.e., plants, bacteria, and vertebrates (22), while other classifications yield five (23) or seven (21) major groups, with nine to 14 LEA subgroups (16, 24–26). The different classification structures have been based on the analysis of transcripts (27, 28), amino acid sequences and conserved motifs (26, 29, 30), three-dimensional protein structures, or chemical characteristics, including in silico analyses of protein or oligonucleotide probability profiles (POPP) (16, 22, 23, 25, 26, 29–33). Despite the different classification strategies for LEA proteins, most primary structures share similar biophysical features, most prominently the high levels of hydrophilicity (11, 34). Using database searching, it has been shown that the criteria of a Gly content greater than 6% and a hydrophilicity index of greater than 1 include most LEA proteins in the more widespread group of hydrophilin proteins (21).
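The two hydrophilin criteria mentioned above lend themselves to a short computational check. The following Python sketch is illustrative only: it assumes that the hydrophilicity index is the negated mean Kyte-Doolittle hydropathy of the sequence, which is one common convention and may differ from the definition used in the cited database search. The function names and the toy sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def glycine_percent(seq: str) -> float:
    """Return the percentage of residues that are glycine."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * seq.count("G") / len(seq)


def hydrophilicity_index(seq: str) -> float:
    """Negated mean Kyte-Doolittle hydropathy (ASSUMED definition).

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return -sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def is_hydrophilin_like(seq: str, min_gly: float = 6.0,
                        min_index: float = 1.0) -> bool:
    """Apply both thresholds from the text: Gly > 6% and index > 1."""
    return (glycine_percent(seq) > min_gly
            and hydrophilicity_index(seq) > min_index)


if __name__ == "__main__":
    # Toy peptides, not real LEA sequences.
    print(is_hydrophilin_like("GKDEGKSEGT"))
    print(is_hydrophilin_like("LLVVIIAAFF"))
```

Keeping the two thresholds as parameters makes it easy to explore how sensitive the hydrophilin assignment is to the exact cutoffs.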
The best-characterized group of these highly hydrophilic proteins is LEA group 2, also commonly referred to as LEA 14 proteins (21). A review by Battaglia et al. (21) presents a tabular summary of the differing nomenclature in the historical literature. We note that the EMBL-EBI database uses the term LEA_2 (PFAM PF03168).
The LEA group 2 proteins include the functionally important dehydrin proteins (35). A range of abiotic stress conditions, including drought, cold, and salinity stresses, are known to upregulate dehydrin gene expression and dehydrin protein levels (35). For example, the expression of LEA 5 and LEA 14 (termed cDNA D95 by Galau et al. [36]) was highly induced in mature leaves of water-stressed plants (36).
STRUCTURE, BIOCHEMISTRY, AND FUNCTION OF THE LEA 14 PROTEINS
LEA 14 proteins all contain a conserved N-terminal sequence and form amphipathic α-helical structures (37). Nuclear magnetic resonance (NMR) spectroscopy of LEA 14 showed the presence of an αβ-fold consisting of one α-helix and seven β-strands that form two antiparallel β-sheets (16). This structure was later confirmed for the LEA 14 protein from the rubber tree (Hevea brasiliensis), which also showed a single-α-helix and seven-β-strand configuration (38).
In plant tissues, LEA proteins are expressed constitutively, at low but varied levels, through all developmental stages but with no obvious tissue specificity (37). However, these levels may be greatly upregulated in response to imposed stresses. For example, LEA 14 expression was found to be strongly induced by dehydration and NaCl and abscisic acid treatments in sweet potato (Ipomoea batatas) plants (39). Quantitative real-time PCR (RT-PCR) also revealed a variety of different I. batatas LEA 14 expression patterns under various abiotic stress conditions. Stress-induced upregulated expression of LEA 14 also induced secondary phenotypic changes in fibrous sweet potato roots, particularly by enhanced lignification (39). LEA protein expression is also upregulated as part of the plant hypersensitive response, activated by microbial infections (25). After infection by Aspergillus flavus and Aspergillus parasiticus, maize (Zea mays L.) showed upregulated expression of LEA 3 and LEA 14 proteins, among others (40).
THE WHy DOMAIN, A LEA 14 FAMILY MEMBER
Structure and function of the WHy domain. A number of protein families, particularly the HinI, LEA 8, and LEA 14 proteins, contain a unique domain, the water stress and hypersensitive response (WHy) domain. The WHy domain was so named simply because it was detectable in proteins expressed during the response to desiccation (25). Public databases (NCBI, EMBL, etc.) show long protein sequences (300 to 615 amino acids [aa]) with multiple WHy domains (each of 92 to 140 aa) for many plants (e.g., Arabidopsis thaliana [accession no. NP_181934.1] and Malus domestica [accession no. XP_008394249.1]) and archaea (e.g., Methanotorris igneus [accession no. WP_013799711.1] and Archaeoglobus veneficus [accession no. WP_013683559.1]).
The WHy domain is typically 100 to 165 aa long and approximately 18.6 kDa (24–26). The domain sequence is composed of alternating hydrophobic and hydrophilic residues with an invariant NPN motif near the N terminus and with a secondary structure typical for members of the LEA 14 family, which mostly consists of β-strands with a C-terminal α-helix (16, 38).
It has been shown that the hydrophilins, LEA proteins, dehydrins, and the WHy domain all confer protection against dehydration, possibly through similar mechanisms. In all cases, these proteins appear to bind to cellular structures (such as proteins) and to reduce denaturation and inactivation by acting as “molecular shields,” either by direct binding to protein surfaces and replacement of coordinated water (41–43) or by ordering water molecules around the associated macromolecules (41). The protein binding and water release process represents a well-established mechanism of water-driven entropic stabilization (44) where the entropy of the protein-protein system [S(protein) + S(H2O)] is greater than that of the free protein and/or the denatured protein. The direct binding of hydrophilins to target proteins has been demonstrated by protein-protein cross-linking studies (41).
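The entropic argument above can be written schematically. The following is only a sketch, under the assumption that binding of a shield protein L to a target protein P releases n coordinated water molecules into the bulk solvent; the symbols P, L, and n, and the use of the standard Gibbs relation, are introduced here for illustration and are not taken from the cited studies.

```latex
\begin{align}
P\cdot(\mathrm{H_2O})_n + L &\rightleftharpoons P\cdot L + n\,\mathrm{H_2O}_{\text{bulk}} \\
\Delta S_{\text{total}} &= \Delta S_{\text{protein}} + \Delta S_{\mathrm{H_2O}} > 0 \\
\Delta G &= \Delta H - T\,\Delta S_{\text{total}}
\end{align}
```

In this picture, the entropy gained by the released water can outweigh any conformational entropy lost by the proteins on binding, so that the term $-T\,\Delta S_{\text{total}}$ favors the bound, shielded state.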
Interestingly, LEA class 2 proteins and hydrophilins, which exhibit intrinsic structural disorder in solution, also show a cryoprotective effect in freeze-thaw cycles in vitro. The degree of protection appears to rely on both the flexible protein structure and the hydrophilic characteristics of the conserved domains (41, 45, 46).
It has also been shown that dehydrins are able to bind strongly to negatively charged membranes (35). This is thought to be due to the α-helical structure of dehydrins and the exposure of ionic side chains, which interact electrostatically with the negatively charged membrane lipids (47, 48). However, dehydrins also bind both water and ions, acting as buffers during desiccation (49).
The upregulated expression of the WHy protein, and proteins containing this domain, during both abiotic stress and pathogen infection argues for a shared mechanism for these two different stress conditions (25). A recent in vitro study demonstrated that the recombinant WHy protein conferred protection to E. coli against freeze-thaw cycle damage (24), suggesting that this domain has a very broad stress response function.
WHy protein in prokaryotes. Genes encoding a WHy domain protein homologue have recently been identified in both bacteria (e.g., Pseudomonas and Burkholderia spp.) and archaea (e.g., Haladaptatus and Halosimplex spp.) (Fig. 1) (24, 25). In bacteria, these genes usually encode a single WHy domain homologous to the LEA 2 superfamily sequence, typically of around 100 aa.
FIG 1 Phylogenetic relationships among 138 WHy domain-containing protein sequences from the three domains of life. The blue, gold, and green circles represent the WHy sequences of Bacteria, Archaea, and Eukarya (plants), respectively. The maximum likelihood tree was generated using RAxML (54) based on the LG substitution model predicted using PhyML-SMS (58). The tree was visualized using Evolview version 2 (59).
While the primary structure of the bacterial WHy protein typically includes nonhomologous N-terminal (20 to 38 aa) and C-terminal (26 to 44 aa) sequences, the multidomain constructs in plants and archaea show inconsistent numbers of amino acids at the flanking termini (Fig. 2). An analysis of up- and downstream sequences (data not shown) for multiple WHy gene homologues suggests that WHy protein genes do not appear to be part of any obvious functional island but are randomly located within the bacterial genomes. A similar random location of the WHy-containing protein gene is evident in both plant and archaeal genomes.
FIG 2 Schematic structure of the WHy domain-containing protein. (A) Bacterial protein containing one WHy domain and an N terminus (N-term) and C terminus (C-term). (B) Multi-WHy-domain-containing proteins in plants and archaea with variable terminus sizes and numbers. (C) Sequence alignment of bacterial, archaeal, and plant WHy proteins. Asterisks (*) indicate positions which have a single fully conserved residue, colons (:) indicate conservation between groups with strongly similar properties, and periods (.) indicate conservation between groups with weakly similar properties.
REVISITING THE EVOLUTIONARY HISTORY OF THE WHy DOMAIN
Proteins containing the WHy domain have been reported to be widespread in the genomes of archaea, bacteria, and plants but are apparently absent from fungal and animal genomes. The current hypothesis for the evolutionary origin of this domain postulates that WHy domain-containing proteins originated in plants and that the prokaryotes acquired the WHy-encoding gene via horizontal gene transfer in two separate events (i.e., for archaea and bacteria [25]). This hypothesis was based on the premise that proteins containing WHy domains are a part of the hypersensitive response system activated in plants after microbial infection, and that the prokaryotic distribution of the WHy domain is dominated by plant-pathogenic or symbiotic species, such as those of Pseudomonas and Burkholderia, which may have acquired the protein as a mechanism to allow the prokaryotic symbiont to evade the plant hypersensitive response system (25). It was also suggested that the presence of the WHy-containing hinI gene in the green alga Chlamydomonas reinhardtii (accession no. AV395132) represents further support for the plant origin of the domain. However, both manual screening and a protein domain search using the SMART protein database (http://smart.embl-heidelberg.de/) (50, 51) showed that neither the translated HinI protein (accession no. AV395132) nor the putative_Hin1_116192 protein (GenBank accession no. AT1G32340.1) of Chlamydomonas reinhardtii contains homologues of any known WHy domain (data not shown).
To determine the possible origin of the WHy domain within the three domains of life, all 709 WHy domain-containing protein sequences included in the SMART protein database were obtained. Of these, 138 nonredundant proteins (see Table S1 in the supplemental material), selected with a similarity threshold of 98% using Jalview 2 (52), were used to reconstruct the possible ancestry of the WHy domain using the FASTML web server (53).
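The redundancy-filtering step can be illustrated with a minimal Python sketch. This is not the algorithm implemented in the software cited above; it is a simple greedy filter that assumes the sequences are already aligned to equal length, that identity is computed over columns in which neither sequence has a gap, and that a pair at or above the threshold counts as redundant. The function names and the toy alignment are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def remove_redundancy(aligned: Dict[str, str],
                      threshold: float = 98.0) -> List[str]:
    """Greedy filter: keep a sequence only if it is below `threshold`
    percent identity to every sequence kept so far."""
    # Visit longer (ungapped) sequences first so that the longer member
    # of a redundant pair is the one retained.
    order = sorted(aligned,
                   key=lambda name: -len(aligned[name].replace("-", "")))
    kept: List[str] = []
    for name in order:
        if all(percent_identity(aligned[name], aligned[k]) < threshold
               for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy alignment, not real WHy sequences.
    toy = {
        "a": "MKTNPNAAGG",
        "b": "MKTNPNAAGG",
        "c": "MKTNPN-AGG",
        "d": "MRSNPNLLVV",
    }
    print(remove_redundancy(toy))
```

A greedy pass of this kind depends on the order in which sequences are visited, so a different ordering rule could retain a different representative from each redundant cluster.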
Contrary to earlier hypotheses (25), our phylogeny and the ancestral sequence reconstruction using FASTML suggested that the WHy domain most likely originated among the archaea. The archaeal protein sequences M0CGC5 and E7QNG4 from Haladaptatus paucihalophilus and Halosimplex carlsbadense (Fig. 1 and Table S1) were predicted to be the most ancient of the WHy domain-containing sequences included in the analyses. The phylogeny of the WHy domain-containing proteins reconstructed using RAxML (54) suggests that plants may have initially acquired the domain from archaea and subsequently via bacterial lineages. Although the first lateral gene transfer events most likely occurred between archaea and plants, the possibility of subsequent horizontal gene transfers between the three domains is evident from the tree topology (Fig. 1).
The horizontal gene transfer (HGT) process that potentially underlies the distribution of the WHy domain within plant and bacterial taxa might be explained by endosymbiotic theory (55). We note that endosymbiotic theory suggests that the earliest Eukarya, anaerobic mastigotes, may have originated from permanent whole-cell fusions between archaea (e.g., Thermoplasma-like organisms) and eubacteria (e.g., spirochaete-like organisms) (56, 57). Such a mechanism provides a lateral gene transfer (LGT) pathway which is compatible with our suggestion of an ancestral origin of the WHy gene in archaea, and with the subsequent proliferation of the gene product (as a domain in larger protein constructs) in plant proteins.
ACKNOWLEDGMENTS
We thank the following organizations for financial support: the University of Pretoria, the South African Technology Innovation Agency, and the National Research Foundation.
FOOTNOTES
- Accepted manuscript posted online 25 May 2018.
Supplemental material for this article may be found at https://doi.org/10.1128/AEM.00539-18.
- Copyright © 2018 American Society for Microbiology.
AUTHOR BIOS

Jasmin Mertens was born and educated in Germany. She received her M.Sc. in biology at the Georg August University in Göttingen (Germany) with a major in microbiology. Under the supervision of Professor Rolf Daniel, she completed her thesis focusing on the identification of bacterial antibiotic-resistance-conferring genes from different metagenomic libraries. During her Ph.D. in molecular biology at the University Hospital in Göttingen, she investigated the functional relevance of single nucleotide polymorphisms (SNPs) in the human genome of hepatitis C virus (HCV)-infected patients. Since March 2015, she has worked as a Postdoctoral Fellow in the Centre for Microbial Ecology and Genomics (CMEG) under the leadership of Professor Don A. Cowan at the University of Pretoria. Her research experience has been in the areas of microbiology and molecular genetics, with her main research interest focused on applied metagenomics and biotechnology.

Habibu Aliyu was born and raised in Kaduna, Nigeria. He received a B.Sc. in biological sciences and an M.Sc. in plant breeding from Ahmadu Bello University, Zaria. He received a Ph.D. in genetics from the University of Pretoria (UP) in 2016, where he studied the genomics of the polyextremophilic Nesterenkonia spp. isolated from Antarctica. He was a UP Postdoctoral Researcher at the Centre for Microbial Ecology and Genomics, UP (2016–2017), and presently is a Georg Forster Postdoctoral Research Fellow at the Institut für Bio- und Lebensmitteltechnik (Bereich II: Technische Biologie), Karlsruhe Institute of Technology (KIT), Germany, working on the genomics of hydrogen-producing bacteria. He has spent the past two years studying the phylogenomics of the thermophilic geobacilli. His major research interest is the application of multiomics to elucidate the determinants of adaptation strategies in prokaryotes.

Don A. Cowan was educated in New Zealand at the University of Waikato, moving to University College London in 1985. In 2001, he was appointed Professor of Microbiology at UWC, Cape Town, South Africa, where he was a Senior Professor and Director of the Institute for Microbial Biotechnology and Metagenomics. He joined the University of Pretoria in 2012 as Director of the Institutional Research Theme (Genomics) leading the Centre for Microbial Ecology and Genomics (CMEG). He has published over 210 research papers, review articles, and book chapters and sits on the editorial boards of 8 international journals. He is currently the President of the Royal Society of South Africa. His projects include the general fields of environmental microbiology, genomics, functional metagenomics, and applied enzymology, currently focusing on the molecular ecology of hot (Namibian) and cold (Antarctic Dry Valley) desert soils, metagenomic gene discovery, and the genomics and genome analysis of extremophilic bacteria.