1 Embryogenesis: a complex developmental process
Embryogenesis is a critical step in the life cycle of the plant during which a multicellular organism is formed from a single cell by division, growth and differentiation [1,2]. Most of the essential developmental processes are initiated within the embryo and will be reiterated throughout the life of the plant. This complex process of seed development can be conceptually divided into three main steps. The first step will establish the pattern of the embryo by rapid cell division. The second step is characterized by cell expansion and massive synthesis and accumulation of reserves [3,4]. Ultimately, the seed will desiccate, arrest its development and enter into dormancy [5]. In addition, the development of the three components of the seed, namely the maternal integuments, the triploid endosperm and the diploid embryo, has to be coordinated to produce a viable seed. It is not surprising therefore that the expression of a large number of genes is required for seed development, as revealed by genetic analyses. Following mutagenesis, a substantial proportion of plants do not produce a normal seed set and as a consequence, homozygous plants cannot be obtained. Typically, the presence of 1/4 of aborted seeds in the siliques represents the principal characteristic of most emb mutations. The fertilized ovule is the first homozygous diploid stage of a mutation and the disruption of any gene associated with essential cellular functions will lead to seed abortion. Hence, in addition to the expected (and initially desired) seed-specific functions associated with the three phases of embryogenesis, seed mutants also represent the main source for the identification of essential genes in Arabidopsis (Arabidopsis thaliana). The group of David Meinke [6] has collectively named the seed mutations as embryo-defective (emb) and the corresponding genes as EMB, whose functions are essential for seed viability. 
Based on the number of emb mutations resulting from mutagenesis in relation to the size of the Arabidopsis genome, the average size of a gene and genetic mapping of these mutations, the number of essential genes has been estimated to be in the range of 500 to 1000 [7].
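The one-quarter proportion of aborted seeds mentioned above follows from Mendelian segregation of a recessive lethal allele in a selfed heterozygote. The short Python sketch below enumerates the gamete combinations and computes a chi-square statistic for a 3:1 normal:aborted ratio. The function names and the seed counts are hypothetical and serve only as an illustration; they are not data from the studies cited here.

```python
from fractions import Fraction
from itertools import product

# Tabulated 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRITICAL_DF1_P05 = 3.841


def selfing_genotype_fractions():
    """Genotype fractions among the seeds of a selfed EMB/emb heterozygote."""
    gametes = ("EMB", "emb")  # each gamete carries either allele with probability 1/2
    fractions = {}
    for female, male in product(gametes, repeat=2):
        genotype = "/".join(sorted((female, male)))
        fractions[genotype] = fractions.get(genotype, Fraction(0)) + Fraction(1, 4)
    return fractions


def chi_square_3_to_1(normal, aborted):
    """Chi-square statistic for a 3:1 normal:aborted seed ratio."""
    total = normal + aborted
    expected_normal = 0.75 * total
    expected_aborted = 0.25 * total
    return ((normal - expected_normal) ** 2 / expected_normal
            + (aborted - expected_aborted) ** 2 / expected_aborted)


if __name__ == "__main__":
    print(selfing_genotype_fractions())
    # Hypothetical silique counts, invented for illustration only.
    statistic = chi_square_3_to_1(normal=310, aborted=90)
    print(statistic, statistic < CHI2_CRITICAL_DF1_P05)
```

A statistic below the critical value means that the observed counts are compatible with a single recessive embryo-lethal mutation; it does not by itself prove that only one locus is involved.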
2 Essentiality and embryogenesis: identification of genes essential for seed development
To obtain information concerning the genes governing this complex developmental process, several approaches have been used to identify the defective genes in embryonic mutants. The first EMB genes were identified by chromosome walking in mutant lines created by chemical/physical mutagenesis [8,9]. With the production of large-scale insertional mutant collections and the optimization of the methodologies to determine the sites of T-DNA insertion [10–13], the number of EMB genes that have been characterized by forward and reverse genetics has increased dramatically in recent years. Large-scale screens for seed mutants have been performed by several laboratories [12,14–16]. The results of this international effort have been compiled in a common database, the SeedGenes database (http://www.seedgenes.org), of which Dr. David Meinke is the curator. The SeedGenes database contains molecular and phenotypic information on essential Arabidopsis genes that give a seed phenotype when disrupted by mutation [17]. The work of my group has focused on the genetic analysis of the emb lines from the Arabidopsis T-DNA insertion lines of the INRA-Versailles collection. The emb lines identified among 16,000 T1 plants fell into five major classes named according to the morphology of the embryo: preglobular (14%), globular (34%), heart-transition (23%), torpedo-cotyledon (27%) and disorganized embryo morphology (2%) (Fig. 1). The two critical steps in early development (preglobular) are the resumption of cell division following fertilization [18,19] and the formation of the protoderm (embryonic epidermis) by periclinal division [20]. Arrest of the embryo at the globular stage represents the most frequent phenotype of emb mutants, and the screening of independent mutant collections has also identified this stage as being the most sensitive step of embryo development [12,21]. The globular-to-heart transition is rich in developmental and metabolic events. 
Here, the embryo will acquire a bilateral symmetry by emergence of the cotyledons, a relative metabolic autonomy and a photosynthetic capacity associated with the differentiation of proplastids into chloroplasts. The first step of differentiation of the proplastids occurs during embryogenesis at the globular-to-heart transition phase [22]. The emb phenotypes in the heart and torpedo classes may be more related to seed-specific functions involved in seed maturation and desiccation. At the time of writing this review (July 2008), there were 219 confirmed EMB genes in the SeedGenes database corresponding to 27.5% of preglobular, 26.1% of globular, 11.4% of heart-transition and 35.1% of torpedo-cotyledon mutants. The enrichment in the molecular characterization of EMB genes in the torpedo-cotyledon class reflects a greater interest of plant biologists in the later stages of seed development, which are associated with potentially important agronomic traits. This enrichment is mainly to the detriment of the globular mutants, which belong to the class characterized by the least informative phenotype. Furthermore, the more recent use of reverse genetics has allowed the rapid characterization of multiple alleles of an EMB gene, showing that in some cases the phenotypic class is correlated with the severity of the allele.
3 Can the minimal gene set concept be applied to eukaryotic unicellular and multicellular organisms?
A previous release of the SeedGenes database allowed the classification of the function of genes required for seed development and a preliminary comparison with essential genes in other organisms [23]. Essential genes are defined as genes being indispensable to support cellular life. They constitute a minimal gene set required for a living cell and can be considered as the foundation of life itself. In addition, proteins encoded by such genes constitute excellent targets for antibacterial agents, for the development of novel drugs against pathogens and of herbicides together with the development of an “antidote” provided to the crop plants but not to the weeds. Therefore, an identification of the essential gene set in plants is of considerable interest not only to answer fundamental questions but also for therapeutic and agronomic uses.
Curiously, the description of a minimal gene set does not necessarily require systematic functional genomics (for a review, see [24]). Instead, it has benefited mainly from large-scale comparative genomic studies. Several theoretical studies have endeavored to derive the minimal set of genes that are necessary and sufficient to sustain a functional cell under ideal conditions. A comparison of the first two complete bacterial genomes, Haemophilus influenzae and Mycoplasma genitalium, resulted in a catalogue of the minimal gene set consisting of 256 genes. Similar estimates have been obtained by analyzing viable gene knockouts in Haemophilus influenzae, Bacillus subtilis and M. genitalium [25–28]. However, the gene knockout approach revealed that absolute evolutionary conservation does not necessarily equate with gene essentiality. To rigorously assess the gene essentiality in M. genitalium, both individually and in network, the complete chemical synthesis and assembly of the M. genitalium genome has been performed [29]. Although this work represents a considerable technical achievement, the authors have yet to demonstrate that their synthesized genomes will support the life of a bacterium. Information on the basic functions of these genes can be found in the Database of Essential Genes (DEG [30]: http://tubic.tju.edu.cn/deg/). This minimal synthetic organism will possess the complete system for translation, transcription and replication but with other systems such as the repair machinery, the metabolic pathways, the signal transduction apparatus, and the molecular chaperones reduced to a bare minimum. Only 35% of the bacterial minimal gene set could be identified in yeast, as assessed by COGs (Clusters of Orthologous Groups). The COG approach is based on the concept that any group of at least three proteins from distant genomes that are more similar to each other than to any other protein within the same genomes most probably belongs to a family of orthologs. 
This indicates that even among unicellular organisms, the composition of the COGs in the minimal gene set will be divergent between prokaryotes and eukaryotes. This may be due to selective gene loss from clades and horizontal gene transfer (HGT) occurring in prokaryotic evolution whereas HGT is insignificant/implausible in the evolution of eukaryotes. Although the essential house-keeping functions for a prokaryotic or a eukaryotic cell are probably similar, an identification of the genes encoding these functions requires different comparative and functional genomics. Furthermore, defining the minimal gene set without specifying the conditions under which the minimal organism is supposed to survive makes no sense.
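The COG criterion described above (at least three proteins from distant genomes, each more similar to the others than to any other protein of the same genomes) can be illustrated with a small Python sketch that searches for triangles of mutually consistent best hits. This is a deliberate simplification and not the actual COG construction procedure, which works on genome-wide sequence comparisons and merges overlapping triangles. The genome labels, protein names and similarity scores below are hypothetical.

```python
from itertools import combinations


def make_score(pairs):
    """Build a symmetric similarity lookup; unlisted pairs score 0."""
    table = {frozenset(pair): value for pair, value in pairs.items()}

    def score(a, b):
        return table.get(frozenset((a, b)), 0)

    return score


def best_hit(protein, genome, proteins, score):
    """Most similar protein to `protein` among those of `genome`."""
    candidates = [q for q in proteins if q[0] == genome]
    return max(candidates, key=lambda q: score(protein, q), default=None)


def find_triangles(proteins, score):
    """Trios from three different genomes in which every member is the
    best hit of every other member within its genome."""
    triangles = []
    for trio in combinations(proteins, 3):
        if len({p[0] for p in trio}) < 3:
            continue  # the criterion requires three distinct genomes
        if all(best_hit(a, b[0], proteins, score) == b
               for a in trio for b in trio if a != b):
            triangles.append(trio)
    return triangles


# Hypothetical proteins, written as (genome, name), and invented scores.
PROTEINS = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("C", "c1")]
SCORE = make_score({
    (("A", "a1"), ("B", "b1")): 90,
    (("A", "a1"), ("C", "c1")): 80,
    (("B", "b1"), ("C", "c1")): 85,
    (("A", "a2"), ("B", "b1")): 30,
    (("A", "a2"), ("C", "c1")): 20,
})

if __name__ == "__main__":
    print(find_triangles(PROTEINS, SCORE))
```

In this toy data set only a1, b1 and c1 form a triangle; a2 is excluded because b1 and c1 are each more similar to a1 than to a2.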
The precepts of the minimum gene set in prokaryotes and those of COGs, together with the comparison of eukaryotic genomes, have been applied to the analysis of the Arabidopsis EMB genes. The list of the 219 confirmed EMB genes present in the SeedGenes database has been used in the present analysis. Although the minimal gene set concept may be intuitive for simple and unicellular organisms, the paradigm requires a reappraisal for multicellular organisms. Similarly to the bacterial functional niches, different plant cell types may have different sets of minimal genes. Due to the low complexity of the Arabidopsis embryo composed of few tissues and cell types, most of these genes will not be identified in screens of seeds arrested in development. The first step of seed development involves cell division probably using some products stored in the egg cell and some that are newly synthesised [2]. The second phase requires the de novo production of proteins by transcription and translation. Therefore, we would expect to find among the functions controlled by the Arabidopsis essential genes many common to the essential functions of a prokaryotic cell (described above) with the important caveat that gene duplications have occurred during eukaryotic genome evolution. This implies that the description of the minimal Arabidopsis gene set cannot be solely based on the identification of the EMB genes but requires a combined approach of functional and comparative genomics. However, the molecular characterization of the EMB genes provides the possibility to address the important question of the essentiality of a gene and to establish the criteria on which it is based.
The Arabidopsis genes have been classified in KOGs (euKaryotic Orthologous Groups, http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi/; [31]), as established by a comparison between seven complete eukaryotic genomes (namely, the genomes of three animals: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, of the green plant Arabidopsis thaliana, of two fungi: Saccharomyces cerevisiae and Schizosaccharomyces pombe and of the microsporidian Encephalitozoon cuniculi). We interrogated this KOG database with the list of the 219 Arabidopsis EMB genes. A significant proportion of these genes could not be assigned to any KOGs, a probable consequence of Arabidopsis being the only representative of the photosynthetic eukaryotes. However, there remained 860 KOGs common to the seven eukaryotes. Among these ubiquitous KOGs, 131 are represented by a single orthologous gene. Using this KOG nomenclature, the data presented in Fig. 2 and Table 1 show that the majority of KOGs are involved in the translation and transcription machineries and in cell division processes, similar to the essential prokaryotic functions. Furthermore, genes involved in metabolism account for a substantial proportion of EMB genes. The same data set was classified according to phenotype, showing strong biases such as a predominance of KOGs in translation/transcription processes (J and A) for globular arrested seeds and of KOGs in cellular processes (O) in heart-transition mutants. The torpedo/cotyledon mutants exhibited the highest proportion of EMB genes that do not fall in KOGs (Fig. 2, no KOGs). This may indicate that these genes are involved in seed-specific functions, which are not represented in other eukaryotes. The existence of a correlation between ubiquitous KOGs and essentiality was tested and compared to the results obtained in yeast and C. elegans (Fig. 3). Disruption of genes in ubiquitous KOGs has a greater chance of leading to lethality than disruption of other genes. 
Upon analysis of the 131 ubiquitous KOGs containing only a single orthologous gene, no enrichment in essential genes was observed (Table 1). This observation is somewhat similar to that documented in prokaryotes showing that gene conservation does not imply essentiality. In the case of Arabidopsis, it suggests that most genes belong to ubiquitous KOGs containing several homologous genes within the same genome but with non-redundant functions.
Table 1. EMB genes and functional classification by KOGs
Information storage and processing | Number of EMB genes ⁎ |
J: Translation, ribosome structure and biogenesis | 14 |
A: RNA processing and modification | 15 |
K: Transcription | 6 |
L: Replication, recombination and repair | |
B: Chromatin structure and dynamics | |
Cellular processes and signaling | |
D: Cell cycle control, cell division, chromosome partitioning | |
Y: Nuclear structure | 1 (YU) |
V: Defense mechanisms | 0 |
T: Signal transduction mechanisms | 3 |
M: Cell wall/membrane/envelope biogenesis | 1 |
N: Cell mobility | 0 |
Z: Cytoskeleton | 2 |
W: Extracellular structures | 0 |
U: Intracellular trafficking, secretion, vesicle transport | |
O: Posttranslational modification, protein turnover, chaperones | |
Metabolism | |
C: Energy production and conversion | 3 |
G: Carbohydrate transport and metabolism | 5 |
E: Amino acid transport and metabolism | |
F: Nucleotide transport and metabolism | 3 |
H: Coenzyme transport and metabolism | 4 |
I: Lipid transport and metabolism | |
P: Inorganic transport and metabolism | 1 |
Q: Secondary metabolite biosynthesis, transport and metabolism | 3 |
Poorly characterized | |
R: General function prediction only | 18 |
S: Unknown function | 2 |
No KOGs | 71 |
Total number of Arabidopsis KOGs | 3285 |
KOGs represented by one ortholog in seven eukaryotic genomes | 4/131 (expected 8–9 KOG) |
⁎ Total number of confirmed EMB genes: 219 as of May 2008.
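The table does not state how the expected value of 8–9 KOGs was derived. One simple calculation that lands in this range, shown in the Python sketch below, assumes that the 219 EMB genes are spread uniformly over the 3285 Arabidopsis KOGs, so that the 131 single-ortholog KOGs receive their proportional share. This uniform model is an assumption made for illustration and may not be the calculation used for the table; the function and variable names are hypothetical.

```python
def expected_subset_hits(n_genes, n_kogs_total, n_kogs_subset):
    """Expected number of genes falling in a subset of KOGs, assuming
    genes are distributed uniformly over all KOGs (an assumption)."""
    return n_kogs_subset * n_genes / n_kogs_total


N_EMB = 219             # confirmed EMB genes (table footnote)
N_KOGS = 3285           # total number of Arabidopsis KOGs (table)
N_SINGLE_ORTHOLOG = 131 # KOGs with one ortholog in the seven genomes (table)
OBSERVED = 4            # EMB genes observed in those KOGs (table)

expected = expected_subset_hits(N_EMB, N_KOGS, N_SINGLE_ORTHOLOG)

if __name__ == "__main__":
    print(f"expected {expected:.1f}, observed {OBSERVED}")
```

Under this assumption the expectation falls between 8 and 9, so the 4 observed genes indicate no enrichment, in line with the statement in the text.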
4 How has “essentiality” evolved in the context of whole genome duplication in eukaryotes?
The eukaryotic genomes have evolved by whole genome duplication and subsequent divergence of genes leading to new genes (neofunctionalization) and clade-specific gene losses [32–34]. Orthologs are defined as homologous genes that evolved via vertical descent from a single ancestral gene in the last common ancestor of the compared species. Paralogs are homologous genes that evolved by duplication of an ancestral gene. In addition, the eukaryotic genomes evolved by lineage-specific expansion (LSE) [35]. There are 1458 LSEs in the Arabidopsis genome, a number similar to that of the human genome (1373), but much higher than that of the yeast genome (134). Among the major clusters of Arabidopsis LSEs are the plant-specific kinases, the plant-specific F-box containing proteins and the PPR proteins [36]. LSEs constitute one of the principal sources of organizational and regulatory diversity. The presence of several PPR genes among the EMB genes is in agreement with specialization occurring after gene multiplication rather than redundancy [37].
Paleoploidy or whole genome duplication is widespread in flowering plants and is believed to play an important role in the evolution and diversification of species [38,39]. The Arabidopsis genome has also been subjected to several rounds of polyploidy [34,40–42]. Therefore, the question arises as to what extent gene divergence and gene loss have been imposed on the duplicates of the ancestral essential genes. Maintaining duplicates of essential genes may be an advantage, and we now have to distinguish the concepts of “essential function” and “essential gene”, since knock-out of one of the duplicated genes will no longer lead to lethality while the function still remains essential. Alternatively, duplication may affect gene dosage and gene expression, with consequences for gene function. Blanc and Wolfe [43] studied the evolutionary effects of polyploidy on plant gene function in Arabidopsis and suggested that gene loss is the most likely outcome. However, these authors found evidence for a non-random pattern of elimination of duplicated genes. It appears that the duplicates of genes associated with transcriptional function have been preferentially retained, whereas duplicates of genes associated with DNA repair have been preferentially lost. In addition, more than one half of the gene pairs showed significant variation in the pattern of gene expression, indicating the possibility of a divergence of function. This observation has been confirmed by the recent work of Ganko et al. [44]. Therefore, one consequence of polyploidy is neo-subfunctionalization. The list of the confirmed Arabidopsis EMB genes has been analyzed here using the web site at http://wolfe.gen.tcd.ie/athal/dup to identify those genes retaining a paralog in the Arabidopsis genome and those that are now unique. This information was correlated with the function encoded by the genes. Of the 219 EMB genes, 21 (9.6%) have retained a paralog in the Arabidopsis genome. 
On the whole Arabidopsis genome, less than 27% of the genes are duplicated between the sister genomic regions. Such low values suggest that duplicated EMB genes have been more susceptible to gene loss than any other genes. Although there are some striking examples of duplicated EMB genes such as the two genes encoding the catalytic subunit of DNA polymerase epsilon [19,45] and the three genes encoding the ribosomal protein L8/L2 (EMB2296, Meinke/SeedGenes), there seems to be an advantage for the functionality of an EMB protein to be encoded by a single gene.
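The comparison between the retention rate of EMB genes (21 of 219) and the genome-wide figure can be made explicit with a short calculation. The Python sketch below recomputes the 9.6% value and evaluates a one-sided binomial tail probability, taking 27% as the genome-wide retention rate. Since the text gives this figure only as an upper bound, and since treating genes as independent draws is a simplification, the result should be read as an illustration rather than as a formal test. The function name is hypothetical.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


RETAINED = 21       # EMB genes that kept a paralog (from the text)
TOTAL_EMB = 219     # confirmed EMB genes (from the text)
GENOME_RATE = 0.27  # assumed rate; the text states "less than 27%"

fraction = RETAINED / TOTAL_EMB
tail = binomial_lower_tail(RETAINED, TOTAL_EMB, GENOME_RATE)

if __name__ == "__main__":
    print(f"retained fraction: {fraction:.1%}")
    print(f"P(X <= {RETAINED}) under the assumed genome-wide rate: {tail:.3g}")
```

A very small tail probability under these assumptions is consistent with the statement that duplicated EMB genes have been lost more often than duplicated genes in general.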
Studies in yeast revealed that highly “connected” proteins are encoded by few duplicated genes and that deletion of these genes tends to result in lethality. In agreement with this, the proportion of essential genes is significantly higher among singletons [46]. One aspect of “connectivity” of proteins with essential functions has already been hypothesized in eukaryotes sharing the 131 KOGs [31]. Indeed, nearly all of these functionally characterized KOGs consist of proteins that are subunits of multiprotein complexes. The most prominent of these complexes are those involved in rRNA processing and ribosome assembly together with the pre-40S subunit, as well as the spliceosome and various complexes involved in transcription [31]. The preponderance of multiprotein complex formation among the single gene KOGs is fully compatible with the balance hypothesis [47]. Accordingly, among the list of confirmed Arabidopsis EMB genes, several encode proteins participating in the formation of large macromolecular complexes as subunits of the RNA processing complex (SIG5 [48], AtPARN [49], PRP8 splicing factor, PRP4 [50], RNA helicase [51]), of the proteasomes (CUL1 [52], CUL3a and b [53], RPN1 [54]), of ribosomes/translational apparatus (RPL3 and RPL8A, Meinke, SeedGenes, AML1 = RPS5 [55], aminoacyl-tRNA synthetase [56]) and of molecular chaperones (SLP [57], HSP90, Meinke, SeedGenes). In yeast, a correlation has been established between protein connectivity, essentiality and gene uniqueness. Surprisingly, gene essentiality and gene duplication are not correlated in mammals [58]. The difference seems to be due to the fact that in yeast, singleton-encoded proteins tend to have more interacting partners than in mammals, suggesting that they are more intrinsically essential for the organism [59]. In mammals, duplicated genes have a higher connectivity than singletons, suggesting that they are intrinsically more essential. 
In Arabidopsis, some of the EMB genes participating in large functional complexes still possess their paralogs. They are particular examples of gene diversification of expression or of function following duplication with high connectivity.
5 What about plant-specific essential genes?
The characterization of genes essential for seed development in Arabidopsis constitutes a foundation for further studies such as genome evolution in relation to functional genomics in photosynthetic organisms. The completion of the list of the Arabidopsis essential genes would considerably benefit from a comparison of appropriate genomes [60]. At present, our understanding is limited by the number of species whose genome has been completely sequenced and for which systematic knockouts have been characterized. As a consequence, a large proportion of the Arabidopsis proteins are not found in KOGs and thus comparisons are limited to prokaryotes and eukaryotes in the non-plant kingdoms. Although we can learn important features dealing with the basic cellular house-keeping functions common to all eukaryotes, we presently have great difficulties in determining the minimal gene set for a plant cell. However, with the increasing number of available genome sequences of photosynthetic organisms (e.g., prokaryotes, algae [61,62], and lower [63] and higher plants [64]) combined with functional genomics (such as in rice [65,66]), the description of the minimal gene set for a photosynthetic cell will become a realistic goal.
Acknowledgements
I would like to acknowledge the work of the team of Early embryogenesis in our laboratory: J. Guilleminot, S. Albert, B. Depres, S. Lahmy, V. Delorme, J. Gadea, A. Ronceret, C. Garcion, S. Got and A. Martinez. I thank Michel Delseny for his support and for sharing his knowledge on seed development. I am grateful to Georges Pelletier for the access to the Arabidopsis T-DNA lines from INRA Versailles and to Roger Voisin and Nicole Bechtold for the management of the INRA collection. The screening and characterization of embryo-defective mutants was in part supported by the AF020 and AF015 GENOPLANTE projects in collaboration with Frédéric Berger and Loïc Lepiniec. I am indebted to Thomas Roscoe for his constant support and his “ubiquitous” contribution in the correction of the English of my manuscripts. Finally, I would like to thank David Meinke for my initial two-week training on emb mutants in his laboratory and for his effort in maintaining and updating the SeedGenes collection.