The housekeeping proteome

A large number of proteins are essential for all cells throughout the human body. These proteins are sometimes called housekeeping proteins, suggesting that their expression is crucial for the maintenance of basic cellular function.

Defining the housekeeping proteome

Housekeeping proteins have been defined in several ways, with varying stringency, typically using four main biological criteria: stable expression across samples, essentiality, involvement in cellular maintenance and evolutionary conservation. However, the key assumption of any definition of housekeeping genes is that they are expressed in every cell type in the organism. A transcriptomics analysis of samples from 40 tissues, 81 single cell types and 1132 cancer cell lines grouped into 28 cancer types was used to identify the number of protein-coding genes detected in all analyzed tissues, cell types or cancer cell line groups, respectively. The results are shown in Table 1, which also gives the numbers that remain when genes classified as enriched or elevated in the RNA expression categorisation are excluded, leaving genes with less variance in expression across samples. The overlaps between genes classified as Detected in all in the different datasets are shown in the Venn diagram in Figure 1.

Table 1. The genes with expression in all tissues, single cell types or cell lines including or excluding enriched and elevated genes.

Category                             Tissues  Single cells  Cell lines  Overlap
Detected in all                         8899          5000        9571     4779
Detected in all excluding enriched      8584          4737        9509     4450
Detected in all excluding elevated      6719          2086        9158     1867

Figure 1. Venn diagram showing the overlaps between genes classified as Detected in all in the three different data sets. Corresponding gene lists can be obtained by clicking the numbers in the plot.
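The Detected in all classification and the overlap between data sets can be illustrated with a minimal Python sketch. The gene names, sample names, expression values and the detection cutoff of 1.0 are all hypothetical and chosen only for illustration; the text above does not state which cutoff was used.

```python
# Assumed detection threshold (hypothetical; not given in the text).
DETECTION_CUTOFF = 1.0


def detected_in_all(expression, cutoff=DETECTION_CUTOFF):
    """Return the genes whose expression reaches the cutoff in every sample."""
    return {
        gene
        for gene, values in expression.items()
        if values and all(v >= cutoff for v in values.values())
    }


# Toy expression tables: gene -> {sample: expression value}.
tissues = {
    "GENE_A": {"liver": 12.0, "brain": 8.5, "testis": 20.1},
    "GENE_B": {"liver": 0.0, "brain": 0.2, "testis": 35.0},
    "GENE_C": {"liver": 3.3, "brain": 4.1, "testis": 2.9},
}
single_cells = {
    "GENE_A": {"hepatocytes": 9.0, "neurons": 7.2},
    "GENE_B": {"hepatocytes": 0.0, "neurons": 0.0},
    "GENE_C": {"hepatocytes": 0.4, "neurons": 5.0},
}
cell_lines = {
    "GENE_A": {"line_1": 15.0, "line_2": 11.0},
    "GENE_B": {"line_1": 2.0, "line_2": 1.5},
    "GENE_C": {"line_1": 6.0, "line_2": 4.4},
}

per_dataset = {
    "tissues": detected_in_all(tissues),
    "single_cells": detected_in_all(single_cells),
    "cell_lines": detected_in_all(cell_lines),
}

# Genes detected in all samples of all three data sets (the centre of the Venn diagram).
overlap = set.intersection(*per_dataset.values())

for name, genes in per_dataset.items():
    print(name, sorted(genes))
print("overlap", sorted(overlap))
```

The Overlap column of Table 1 corresponds to the intersection computed in the last step. Excluding enriched or elevated genes would amount to subtracting those gene sets before intersecting.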

The Tau score is another measure of tissue specificity. It does not depend on expression cut-offs and yields a specificity value between 0 and 1, where 0 means broadly expressed and 1 means specific expression. The bar charts in Figure 2 show the Tau scores, overlaid, for genes belonging to the three different Detected in all categories in Table 1, for each of the three datasets.

Figure 2. The Tau scores for genes corresponding to the three categories of Detected in all in Table 1, overlaid in bar plots for the three different data sets. The content of the different overlays is shown on mouse-over, and corresponding gene lists can be obtained by clicking in the bar plots.
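As an illustration of how a Tau score behaves, the sketch below uses the commonly cited definition tau = sum(1 - x_i / x_max) / (n - 1), where x_i is the expression of a gene in sample i, x_max is its maximum across samples and n is the number of samples. The text above does not specify the exact formula or any normalization (such as log transformation) applied in the analysis, so this is an assumption, and the function name tau_score is hypothetical.

```python
def tau_score(expression):
    """Tau specificity index for one gene across n samples.

    Returns 0.0 for perfectly uniform expression and 1.0 when the gene
    is expressed in a single sample only.
    """
    values = [float(v) for v in expression]
    n = len(values)
    if n < 2:
        raise ValueError("Tau needs at least two samples")
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    x_max = max(values)
    if x_max == 0:
        raise ValueError("Tau is undefined for a gene with no expression")
    # Each term is 0 for the maximally expressing sample and 1 for a silent one.
    return sum(1.0 - v / x_max for v in values) / (n - 1)


print(tau_score([5, 5, 5, 5]))    # uniform expression
print(tau_score([0, 0, 0, 10]))   # expression in one sample only
print(tau_score([10, 5, 0]))      # intermediate case
```

Under this definition a housekeeping gene is expected to score close to 0, which is why the Detected in all categories can be compared by their Tau distributions.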

Functions of housekeeping proteins

Housekeeping proteins exist in all classes of proteins but are clearly overrepresented among those involved in basic cellular functions such as gene expression and regulation, metabolism and cell structure. Below are treemap plots showing overrepresented functions for housekeeping genes in the three different data sets, followed by examples of classes of housekeeping proteins.

[Treemap plots with three panels: Tissues, Single cells and Cell lines. GO biological process terms shown in the plots include mitochondrial electron transport (NADH to ubiquinone), mitochondrial respiratory chain complex I assembly, aerobic respiration, cytoplasmic translation, tRNA aminoacylation, RNA splicing via transesterification reactions, protein targeting to ER, vesicle coating, and vesicle targeting from rough ER to cis-Golgi.]

Figure 3. Treemap plots of GO biological processes based on gene set enrichment analysis for the three different data sets consisting of Detected in all genes

Transcription and translation

An easily understood class of housekeeping proteins are those involved in the genetic machinery of gene expression, e.g. RNA polymerases and ribosomal proteins, essential for transcribing and translating the DNA into proteins. It is intuitive that without these genes the cell and organism cannot function at all.

RNA Polymerases

The RNA polymerases are enzymes responsible for synthesizing RNA copies from a DNA template by the process of transcription. In eukaryotic cells, transcription takes place in the cell nucleus, as illustrated in the images below, which show distinct staining of RNA polymerase II subunit E (POLR2E) in the nucleus of every cell. Some of these RNA transcripts are further processed into messenger RNAs (mRNA), the direct templates for proteins, which are exported to the cytoplasm where translation takes place. Out of the 34 polymerase proteins (KEGG PATHWAY: hsa03020), 31 are found to be expressed in all tissues.

Figure 4. Immunohistochemical staining showing the nuclear localization of the polymerase protein POLR2E.

Ribosomal proteins

The ribosomal proteins form the ribosome complex together with ribosomal RNA (rRNA). The role of the ribosome complex is to translate the genetic code of the mRNA molecules into proteins. During translation, the mRNA is read as a series of three-base codons, each codon coding for an amino acid, and the amino acids are joined into a peptide chain. Once complete, the chain is further processed into a functional protein. Translation occurs in the cytosol, separated from transcription. Out of all 180 ribosomal proteins, 176 are found to be expressed in all studied tissues.

Figure 5. Immunohistochemical staining of ribosomal protein RPL17 in liver, showing the cytosolic localization of the protein.
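The codon-by-codon reading of mRNA described above can be sketched in a few lines of Python. The codon table below is a small subset of the standard genetic code, included only to keep the example short, and the function name translate and the example sequence are hypothetical.

```python
# Partial codon table (subset of the standard genetic code); "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",
    "UUU": "F", "UUC": "F",
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def translate(mrna):
    """Translate an mRNA string from the first AUG until a stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    peptide = []
    # Step through the sequence three bases at a time.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break  # stop codon ends the peptide chain
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGUUUGGCAAAGCCUAAGG"))
```

The sketch covers only the decoding step; post-translational processing of the peptide chain is not modelled.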

Metabolism

Apart from being able to translate DNA into functional proteins, a cell also needs to extract energy from organic matter and to utilize the energy to construct necessary components. These diverse and essential processes are together referred to as metabolism.

Citric acid cycle

The citric acid cycle is a central part of the metabolic pathway that converts organic matter from carbohydrates, proteins and fats into chemical energy through a series of chemical reactions. The enzymes that catalyze these reactions are apt examples of housekeeping proteins, since all cells require energy to survive and function. Out of the 30 genes involved in the citric acid cycle (KEGG PATHWAY: hsa00020), 27 are expressed in all tissues. The genes that are exceptions have variants that are expressed in all tissues, as exemplified by the pyruvate dehydrogenase complex subunits PDHA2 (expressed exclusively in testis) and PDHA1 (ubiquitously expressed).

Figure 6. The citric acid cycle takes place in the matrix of the mitochondria, illustrated here by the immunohistochemical staining of SDHB.

Mitochondrial proteins

The main location for energy production in the cell is the mitochondrion, where, among other pathways, the citric acid cycle takes place. The mitochondrion is an unusual organelle in that it is semi-autonomous: it contains its own genome and has a separate machinery for protein synthesis, although the majority of its genes have been transferred to the nuclear genome. Since the mitochondrion, with its central role in energy production, is crucial for cell survival, most proteins involved in its function and structure are considered to be housekeeping proteins.

Structural proteins

Many proteins involved in the basic structure of the cell are expressed ubiquitously in all cell types, since all cells need certain structures and scaffolds to function. Structural proteins can have numerous functions, but one crucial and obvious housekeeping function is providing rigidity to the cell and maintaining its shape.

Cytoskeleton

The cytoskeleton is a scaffold present in the cytoplasm of all cells, consisting of different types of filaments. The cytoskeleton is also highly involved in the movement of cellular components. Since many specialized uses of the cytoskeleton are present in various cells, far from all genes associated with the cytoskeleton are expressed everywhere. For instance, the myosin heavy chains involved in muscle contraction are expressed exclusively in muscle tissues. However, many of the components are necessary for basic cell functionality and are expressed everywhere.

Location of housekeeping proteins

The location of the housekeeping proteins in the three different data sets was analysed using membrane and signal peptide prediction methods and antibody-based immunofluorescence. The predicted location was classified as membrane, secreted or intracellular based on the results of majority decision methods for membrane region predictions (MDM) and signal peptide predictions (MDSEC). Antibody-based assays were used to determine the subcellular location experimentally.
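The idea of a majority decision over several predictors can be sketched as follows. This is a simplified illustration and not the actual MDM or MDSEC implementation: the number of predictors, the strict-majority rule, the precedence of membrane over secreted, and the function names are all assumptions made for the example.

```python
def majority_vote(calls):
    """Return True when more than half of the individual predictors say yes."""
    calls = list(calls)
    return sum(bool(c) for c in calls) > len(calls) / 2


def classify_location(membrane_calls, signal_peptide_calls):
    """Assign one predicted location from two sets of predictor calls.

    Assumed rule: a majority call for a membrane region gives "membrane";
    otherwise a majority call for a signal peptide gives "secreted";
    everything else is "intracellular".
    """
    if majority_vote(membrane_calls):
        return "membrane"
    if majority_vote(signal_peptide_calls):
        return "secreted"
    return "intracellular"


# Hypothetical predictor outputs for three proteins.
print(classify_location([True, True, False], [False, False, False]))
print(classify_location([False, False, False], [True, True, False]))
print(classify_location([False, True, False], [False, False, True]))
```

In practice a protein can carry both a signal peptide and a membrane region, so a single label per protein is a simplification.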

Predicted location

According to the predictions, the majority of the housekeeping proteins are, not surprisingly, intracellular proteins. The pie charts in Figure 7 show the results of the analyses for the three different data sets; by clicking on the numbers, the gene sets corresponding to the predicted locations can be investigated.

Figure 7. Predicted location of the genes belonging to the Detected in all category in the three data sets Tissues, Single cells and Cell lines.

Subcellular location

Immunofluorescence (ICC-IF) and confocal microscopy were used to determine the subcellular location of housekeeping proteins in Tissues, Single cells and Cell lines. The pie charts in Figure 8 show that the majority of the analysed proteins reside in the cytoplasm or nucleus.

Figure 8. Subcellular location of the genes belonging to the Detected in all category in the three data sets Tissues, Single cells and Cell lines. Genes without experimental data are classified as N/A.

Relevant links and publications

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419