Subcellular resource - Methods summary

Summary: The subcellular resource provides information about the subcellular localization of proteins in human cell lines. It also includes observations of cell-to-cell variability in protein expression, as well as a detailed analysis of cell cycle correlations in protein and RNA expression for a subset of the variable genes.

Key publication: Thul PJ et al. (2017) “A subcellular map of the human proteome” Science 356 (6340): aal3321

Learn about

  • The subcellular distribution of proteins in human cell lines.
  • The proteomes of different organelles and subcellular structures.
  • Single-cell variability in the expression levels and/or localizations of proteins.

Data overview

Data type | Count | Data | Coverage (nr genes)
Protein location | 49 | Protein location data across 13534 genes | 13534

How has the data been generated?

The subcellular distribution of each protein is assayed in up to three human cell lines selected from a subset of the cell lines found in the cell line resource of the Human Protein Atlas. The cells are grown in 96-well glass bottom plates (Figure 1). The location of proteins is assayed by indirect immunofluorescence staining (ICC-IF) followed by high-resolution confocal microscopy imaging. The cells are fixed in 4% formaldehyde and permeabilized with Triton X-100. The target protein is detected with an in-house polyclonal antibody generated within the HPA project or, in some cases, with an antibody from a commercial source. The antibody for the target protein is combined with marker antibodies targeting alpha tubulin, to show microtubules, and calreticulin, to show the endoplasmic reticulum (ER). The nucleus is counterstained with 4',6-diamidino-2-phenylindole (DAPI). The primary antibodies are detected with species-specific secondary antibodies labelled with different fluorophores (Alexa Fluor 488 for the protein of interest, Alexa Fluor 555 for microtubules and Alexa Fluor 647 for ER). The cells are imaged using a laser scanning confocal microscope with a 63X objective. The different fluorophores are displayed as different channels in multicolor images, with the protein of interest shown in green, the nucleus in blue, microtubules in red and ER in yellow.


Figure 1. Schematic overview of the immunofluorescence workflow used in the subcellular resource.
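
The channel-to-color assignment described above can be illustrated with a short sketch. This is not the software used by the Human Protein Atlas; it is a minimal example, assuming each channel is available as a 2D grid of 8-bit intensities and that channels are blended additively with clipping. The names CHANNEL_COLORS and merge_channels are hypothetical.

```python
# Display colors per channel, as stated in the text (RGB weights in 0..1).
# The dictionary keys are illustrative names, not identifiers from the resource.
CHANNEL_COLORS = {
    "protein": (0.0, 1.0, 0.0),       # Alexa Fluor 488, shown in green
    "nucleus": (0.0, 0.0, 1.0),       # DAPI, shown in blue
    "microtubules": (1.0, 0.0, 0.0),  # Alexa Fluor 555, shown in red
    "er": (1.0, 1.0, 0.0),            # Alexa Fluor 647, shown in yellow
}


def merge_channels(channels):
    """Additively blend grayscale channels (values 0..255) into one RGB image.

    `channels` maps a channel name to a 2D list of intensities. All channels
    are assumed to have the same shape. Additive blending with clipping at
    255 is an assumption made for this sketch.
    """
    first = next(iter(channels.values()))
    height, width = len(first), len(first[0])
    merged = [[(0, 0, 0)] * width for _ in range(height)]
    for name, image in channels.items():
        weights = CHANNEL_COLORS[name]
        for y in range(height):
            for x in range(width):
                merged[y][x] = tuple(
                    min(255, round(old + w * image[y][x]))
                    for old, w in zip(merged[y][x], weights)
                )
    return merged
```

With this mapping, a pixel where the protein of interest and the ER marker overlap picks up signal in both the red and green components, which is why co-localization with the ER tends to look yellow-green in the merged image.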

How has the data been analyzed?

All images are analyzed manually. This involves describing various aspects of the staining characteristics and classifying the localization of the target protein into one or more of 35 different organelles and subcellular structures. These structures can be recognized by trained experts based on the staining pattern of the target antibody together with the markers. The analysis also involves comparisons of the staining patterns for one antibody in different cell lines, and for different antibodies targeting the same protein, as well as comparisons to external experimental evidence for protein localization found in the UniProtKB/Swiss-Prot database. Main and additional localizations are given a reliability score (Enhanced/Supported/Approved/Uncertain) that reflects the agreement with external data and the potential existence of internal data that can be used for enhanced antibody validation. The individual localization reliability scores finally converge into the overall antibody and gene reliability scores.

  • Enhanced - One or more antibodies have passed enhanced validation and there is no contradicting data, such as literature describing experimental evidence for a different location.
  • Supported - There is no enhanced validation of the used antibody, but the annotated localization is reported in external literature.
  • Approved - The localization of the protein is partially in agreement with external data, or has not been previously described.
  • Uncertain - The antibody staining pattern contradicts experimental data, or expression is not detected at the RNA level.
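
The four categories above can be read as a simple decision rule. The sketch below is one possible encoding, not the annotation procedure itself (which is performed manually by trained experts). The function name, the argument names, the labels used for external agreement, and the order in which the conditions are checked are all assumptions made for illustration. How individual scores converge into the overall antibody and gene scores is not specified here, so it is left out.

```python
def localization_reliability(enhanced_validation, external_agreement,
                             rna_detected=True):
    """Return a reliability label for one annotated localization.

    enhanced_validation: True if at least one antibody has passed
        enhanced validation.
    external_agreement: hypothetical labels for the comparison with
        external data: "full", "partial", "none" (not previously
        described) or "contradicts".
    rna_detected: False if expression is not detected at the RNA level.
    """
    allowed = {"full", "partial", "none", "contradicts"}
    if external_agreement not in allowed:
        raise ValueError(f"unknown external_agreement: {external_agreement!r}")

    # Assumption: contradicting data or missing RNA expression takes
    # precedence over everything else.
    if external_agreement == "contradicts" or not rna_detected:
        return "Uncertain"
    # Enhanced requires enhanced validation and no contradicting data.
    if enhanced_validation:
        return "Enhanced"
    # Without enhanced validation, agreement with external literature
    # gives Supported.
    if external_agreement == "full":
        return "Supported"
    # Partial agreement, or a localization not previously described.
    return "Approved"
```

For example, localization_reliability(False, "partial") returns "Approved", while the same call with external_agreement="contradicts" returns "Uncertain" regardless of the validation status.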

What data is presented?

For each gene, information about the subcellular localization of the encoded protein is presented under the tab for the subcellular resource. This section includes an overview of RNA expression in different human cell lines and representative confocal microscopy images of the protein stained in up to three of these (Figure 2). Additional assays may include protein localization in ciliated cells, human sperm cells and mouse cells, RNA and protein expression relative to cell cycle progression, and assay-specific antibody validation by co-localization with a GFP-tagged version of the protein and/or by siRNA-mediated knockdown of gene expression. In addition to the gene-centric presentation of data, the subcellular resource provides descriptive chapters for the proteome of each organelle and subcellular structure, as well as for the multi-localizing proteome and for the cell-to-cell variable proteome.


Figure 2. Examples of data presented in the subcellular resource.

Data validation

For the subcellular resource, the observed subcellular localization of the protein is compared to independent experimental data from external sources in the UniProtKB/Swiss-Prot database. Main and additional localizations are given individual reliability scores (Supported, Approved and Uncertain), where Supported reflects agreement with external data, Approved reflects partial agreement or lack of external data, and Uncertain reflects disagreement with external data. The individual localization reliability scores finally converge into the overall antibody and gene reliability score. The highest reliability score, Enhanced, is given to antibodies and genes that have been validated by the use of an independent antibody targeting a different part of the protein, by co-localization with a GFP-tagged version of the target protein, and/or by reduced signal upon siRNA-mediated knockdown of RNA expression.

Read more about the assays and annotations used in the subcellular resource here.