The Human Protein Atlas
The Human Protein Atlas is a Swedish-based program initiated in 2003 with the aim of mapping all human proteins in cells, tissues, and organs by integrating various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology. All data in the knowledge resource are open access, so that scientists in both academia and industry can freely explore the human proteome.
The Human Protein Atlas consists of eight separate resources, each focusing on a particular aspect of the genome-wide analysis of the human proteins:
- The Tissue resource, showing the distribution of the proteins across all major tissues and organs in the human body
- The Brain resource, exploring the distribution of proteins in various regions of the mammalian brain
- The Single Cell resource, showing expression of protein-coding genes in immune cells and human single cell types based on bulk and single cell RNA-seq
- The Subcellular resource, showing the subcellular localization of proteins in single cells
- The Cancer resource, showing the impact of protein levels for the survival of patients with cancer
- The Blood resource, describing proteins detected in blood and showing protein levels in blood in patients with different diseases
- The Cell line resource, showing expression of protein-coding genes in human cancer cell lines
- The Structure & Interaction resource, showing predicted 3D structures and exploring protein-coding genes in the context of protein-protein and metabolic interaction networks.
The Human Protein Atlas program has already contributed to several thousand publications in the field of human biology and disease and has been selected by the organization ELIXIR as a European core resource due to its fundamental importance for the wider life science community. In addition, the Human Protein Atlas has been appointed a Global Core Biodata Resource (GCBR) by the Global Biodata Coalition. The Human Protein Atlas consortium is mainly funded by the Knut and Alice Wallenberg Foundation.
The full publication list is available here.
Tissue
This resource of the Human Protein Atlas focuses on the expression profiles of genes in human tissues at both the mRNA and protein level. The protein expression data from 44 normal human tissue types is derived from antibody-based protein profiling using conventional and multiplex immunohistochemistry. All underlying images of immunohistochemistry-stained normal tissues are available together with knowledge-based annotation of protein expression levels. The protein data covers the 15302 genes (76%) for which antibodies are available. The mRNA expression data is derived from deep sequencing of RNA (RNA-seq) from 40 different normal tissue types.
More information about the specific content and the generation and analysis of the data in the resource can be found in the Methods Summary.
Learn about:
- protein localization in tissues at a single-cell level
- if a gene is enriched in a particular tissue (specificity)
- which genes have a similar expression profile across tissues (expression cluster)
Example:
FCAMR
Fc fragment of IgA and IgM receptor.
Selective microvilli expression in proximal renal tubules, group enriched in kidney and lymphoid tissue at the mRNA level.
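The specificity categories mentioned in this section (for instance "group enriched") can be illustrated with a small sketch. The Python below is a simplified illustration only: the four-fold threshold, the maximum group size of five, the detection cutoff, the function name `classify_specificity` and the toy expression values are all assumptions made for this example, not the Atlas pipeline. The actual criteria are described in the Methods Summary.

```python
from statistics import mean


def classify_specificity(expression, fold=4.0, max_group=5, detect=1.0):
    """Classify one gene from a mapping of tissue name -> expression level.

    Simplified, assumed rules (not the Atlas implementation):
    - "tissue enriched": the top tissue is >= `fold` times every other tissue
    - "group enriched": the mean of the top 2..max_group tissues is
      >= `fold` times the next-highest tissue
    - otherwise "low specificity"
    """
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]

    if not values or values[0] < detect:
        return "not detected", []

    if len(values) == 1 or values[0] >= fold * values[1]:
        return "tissue enriched", names[:1]

    for k in range(2, max_group + 1):
        if k < len(values) and mean(values[:k]) >= fold * values[k]:
            return "group enriched", names[:k]

    return "low specificity", []


# Toy values chosen for illustration; they are not measurements from the Atlas.
toy_gene = {"kidney": 40.0, "lymphoid tissue": 30.0, "liver": 2.0,
            "lung": 1.5, "skin": 0.5}
print(classify_specificity(toy_gene))
```

With these toy values the two highest tissues together stand out from the rest, so the sketch reports a group of two tissues rather than a single enriched tissue.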
Brain
The Brain resource gives an overview of protein expression and distribution in the mammalian brain. Externally and in-house generated data are integrated to explore regional protein expression in the human, pig and mouse brain. Protein expression data are based on quantification of messenger RNA using RNA sequencing techniques and in situ hybridization. Protein distribution data are generated using antibody-based immunohistochemistry and immunofluorescence techniques. The Brain resource can be used to create an overview of regional and cross-species expression of proteins of interest, or to identify regionally or functionally clustered genes based on expression levels across regions of the brain. More information about the specific content and the generation and analysis of the data in this resource can be found in the Methods Summary.
Learn about:
- Expression levels for all human proteins in regions and subregions of the human brain
- Expression levels for all proteins with human orthologs in regions and subregions of the pig and mouse brain
- Brain enriched genes with higher expression in any of the regions of the brain compared to peripheral organs
- Regional enriched genes with higher expression in a single or few regions of the brain
- Cell-type and cell-compartment distribution of selected proteins in the human and mouse brain
- Differences in gene expression between mammalian species
Example:
NECAB1
N-terminal EF-hand calcium binding protein 1.
Subsets of neurons show distinct somato-dendritic immunoreactivity throughout the brain. The image shows protein location in subsets of neurons in the hippocampus of mouse brain.
Single Cell
The Single cell resource contains four different transcriptomics datasets with single cell focus. Expression profiles are provided across human tissues and cell types, utilizing single cell RNA sequencing, cell sorting, single nuclei RNA sequencing, and bulk RNAseq deconvolution analyses.
Single cell type
This part of the Single cell resource contains Single cell type information based on single cell RNA sequencing (scRNAseq) data from 31 human tissues including peripheral blood mononuclear cells (PBMCs). The data is linked to in-house generated immunohistochemically stained tissue sections presented in the Tissue resource in order to visualize the corresponding spatial protein expression patterns. The scRNAseq analysis was based on publicly available genome-wide expression data and comprises all protein-coding genes in 557 individual cell type clusters corresponding to 15 different cell type groups, and 81 different cell types. A specificity classification was performed to determine the number of genes elevated in these single cell types. The genes expressed in each of the cell types can be explored in interactive UMAP plots and bar charts, with links to corresponding immunohistochemical stainings in human tissues.
More information about the specific content and the generation and analysis of the data can be found in the Methods Summary. A cluster comparison between the HPA pipeline and Tabula Sapiens can be found here.
Learn about:
- mRNA and protein expression in single cell types
- if a gene is enriched in a particular cell type (specificity)
- which genes have a similar expression profile across cell types (expression cluster)
Example:
TSPY2
Testis specific protein, Y-linked 2.
Selective nuclear expression in spermatogonia at the protein level, enriched in spermatogonia at the mRNA level.
Tissue Cell Type
The Tissue Cell Type section contains cell type expression specificity predictions for all human protein coding genes, generated using integrated network analysis of publicly available bulk RNAseq data. A specificity classification is used to predict which genes are enriched in each constituent cell type within an individual tissue. The data can be explored on a tissue-by-tissue basis, together with in-house generated immunohistochemically stained tissue sections. In addition, a core cell type analysis focuses on the cell types found in all, or the majority, of the profiled tissues, e.g., endothelial cells or macrophages. Here, genes with predicted specificity in these core cell types in multiple tissues are detailed. More information about the specific content and data analysis in the section can be found in the Methods Summary.
Key Publications:
Norreen-Thorsen M et al. (2022), "A human adipose tissue cell-type transcriptome atlas." Cell Rep 40
Öling S et al. (2024), "A human stomach cell type transcriptome atlas." BMC Biol 22
Dusart P et al. (2023), "A tissue centric atlas of cell type transcriptome enrichment signatures." bioRxiv, preprint
Learn about:
- if a gene is predicted to have cell type specificity within a given tissue
- which genes have a common cell type specificity prediction within each tissue
- the catalogue of genes with predicted specificity in core cell types across tissues
Example:
KRTAP2-1
Keratin associated protein 2-1.
Selective expression in hair follicle cortex cells at the protein level, mRNA specificity prediction in skin: hair follicle cortex cells.
Immune Cell
The Immune Cell section contains single cell information on genome-wide RNA expression profiles of human protein-coding genes covering various B- and T-cells, monocytes, granulocytes and dendritic cells. The transcriptomics analysis covers 18 cell types isolated with cell sorting and includes classification based on specificity, distribution and expression cluster across all immune cells. More information about the specific content and the generation and analysis of the data in the section can be found in the Methods Summary.
Learn about:
- if a gene is enriched in a particular immune cell type (specificity)
- which genes have a similar expression profile across the immune cells (expression cluster)
- the catalogue of genes elevated in each of the immune cell types
Example:
CD82
The expression of the tumor metastasis suppressor CD82 in 18 different types of immune cells and PBMC.
Single nuclei brain
This resource contains brain cell type expression profiles based on single-nuclei RNA sequencing (snRNAseq) data covering 11 brain regions, including 2.5 million cells. The snRNAseq analysis is based on genome-wide expression data published by Siletti K et al. (2023), representing the human brain with over 3 million cells and 461 clusters. In the Human Protein Atlas, the data is represented by 34 superclusters.
A specificity classification was conducted to determine the number of genes with an elevated expression in these 34 main clusters across the brain regions, as displayed in the pie chart and table on this page. Gene expression and clustering within these different clusters can be further explored at the Human Brain Cell Atlas v1.0, where individual cell types, regions and clusters are provided. The integration of this extensive brain snRNAseq data facilitates easy comparison to the cell type expression profiles across different datasets. The Single cell type resource enables comparison to “whole body” cell types, which includes integrated peripheral tissues and cerebral cortex snRNAseq data from Allen Brain Map. The Single nuclei brain data adds further detail on brain cells, enabling an in-depth comparison of clusters and regional variation across the human brain.
More information about the specific content, data generation, and analysis methods can be found in the Methods Summary, and in data details.
Example:
SLC17A7
SLC17A7, a marker for excitatory neurons highly expressed in 12 out of the 21 neuronal clusters.
Subcellular
The subcellular resource of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13534 genes (67% of the human protein-coding genes), as well as predictions for an additional 3491 secreted or membrane proteins, covering a total of 17025 genes (84% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different standard cell lines, selected from a panel of 41 cell lines used in the subcellular resource. For some genes, the protein has also been stained in up to three ciliated cell lines and/or in human sperm cells. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 49 different organelles and subcellular structures. In addition, the resource includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations.
The subcellular resource offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the resource, as well as the generation and analysis of the data, can be found in the Methods Summary.
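The coverage figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the gene counts and percentages stated in the text; the helper name `implied_total` and the assumption that the percentages are rounded to whole numbers are introduced here for illustration.

```python
def implied_total(count, percent):
    """Range of totals for which count/total rounds to the given whole percent."""
    low = count / ((percent + 0.5) / 100)
    high = count / ((percent - 0.5) / 100)
    return low, high


stained = 13534    # genes with experimental localization data (from the text)
predicted = 3491   # additional secreted or membrane proteins (from the text)
covered = stained + predicted

# Each percentage implies a range for the total number of protein-coding genes.
low_a, high_a = implied_total(stained, 67)
low_b, high_b = implied_total(covered, 84)

# If the two figures are mutually consistent, the ranges overlap.
overlap = (max(low_a, low_b), min(high_a, high_b))

print(covered)
print(overlap)
```

The two counts sum to the stated total of 17025, and both percentages are compatible with the same underlying number of protein-coding genes, which falls somewhere in the printed overlap range.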
Learn about:
- The subcellular distribution of proteins in human cell lines.
- The subcellular distribution of proteins in human sperm.
- The proteomes of different organelles and subcellular structures.
- Single-cell variability in the expression levels and/or localizations of proteins.
Example:
CCNB1
Cyclin B1.
The protein localizes to the cytosol in human and mouse cells, and is expressed in a cell cycle-dependent manner. The location has been validated by siRNA mediated gene silencing, analysis of GFP-tagged protein and independent antibodies.
Cancer
This resource contains Cancer information based on mRNA and protein expression data from 31 different forms of human cancer, together with millions of in-house generated images of immunohistochemically stained tissue sections and Kaplan-Meier plots showing the correlation between mRNA expression of each human protein-coding gene and cancer patient survival. More information about the specific content and the generation and analysis of the data in the resource can be found in the Methods Summary.
Learn about:
- if the mRNA expression of a gene is prognostic for patient survival in each of the cancer types
- if a gene is enriched in a particular cancer type (specificity)
- the catalogue of genes elevated in each of the cancer types
Example:
MKI67
Marker of proliferation Ki-67.
Nuclear expression in varying fractions of tumor cells in all cancer types at protein level and expressed in all cancers at mRNA level. High expression of this gene is associated with unfavorable prognosis in renal, liver and pancreatic cancer.
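To make the Kaplan-Meier plots mentioned in this section more concrete, the following is a minimal sketch of the Kaplan-Meier survival estimator in plain Python. The follow-up times, the censoring flags and the split into a "high" and a "low" expression group are toy values invented for this example; how the Atlas groups patients and computes its plots is described in the Methods Summary, not here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed death.

    times:  follow-up time for each patient
    events: True if death was observed at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


# Toy cohorts (years of follow-up); not data from the Atlas.
high_curve = kaplan_meier([2, 3, 5, 8], [True, True, False, True])
low_curve = kaplan_meier([4, 6, 9, 10], [False, True, False, False])

print(high_curve)
print(low_curve)
```

A gene would be described as associated with unfavorable prognosis when the curve for the high-expression group drops faster than the curve for the low-expression group, as in these toy cohorts. A statistical test, omitted here, is needed to judge whether such a difference is significant.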
Blood
The Blood resource covers different aspects of blood proteins and presents blood protein levels in patients with different diseases as well as plasma concentrations of the proteins detected in human blood.
Disease
The Human Disease Blood Atlas contains information on blood protein levels in patients with different diseases, and highlights proteins associated with these diseases using differential expression analysis. This version covers a pan-disease study consisting of 1162 proteins quantified by Proximity Extension Assay (PEA) and 146 proteins quantified by isotope dilution strategies based on the addition of recombinant protein fragment standards – the gold standard of quantitative mass spectrometry. Protein profiles have been quantified across 59 diseases with PEA and 12 diseases with targeted mass spectrometry. More information about the specific content and the generation and analysis of the data in the section can be found in the Methods Summary.
Learn about:
- comprehensive and precise protein levels in blood covering 59 diseases
- proteins associated with each of the analyzed diseases
Example:
The proteins predicted by the model to be associated with prostate cancer in the pan-cancer study.
Blood Protein
The proteins in blood, specifically the plasma proteome, exhibit an extraordinary dynamic range. This range spans over 10 orders of magnitude between the concentration of the most abundant protein, albumin (ALB), which acts as a transporter and helps maintain colloid osmotic pressure, and the rarest proteins detectable today, which include interleukins and tissue leakage proteins. Notably, the ten most abundant proteins make up over 90% of the plasma proteome. Along with albumin, these include fibrinogen, involved in blood clotting, and immunoglobulins, mainly involved in immune processes.
Here we present estimated plasma concentrations of the proteins detected in human blood from mass spectrometry-based proteomics studies, published immunoassay data, and a longitudinal study based on proximity extension assay (PEA). More information about the specific content, and the generation and analysis of the data in this section can be found in the Methods Summary.
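As a rough illustration of what a dynamic range of 10 orders of magnitude means, the sketch below compares two concentrations on a log10 scale. The albumin value is an approximate typical plasma level (about 40 g/L) and the low-abundance value is a hypothetical cytokine-scale concentration chosen for this example; neither is taken from the Atlas data.

```python
import math


def orders_of_magnitude(high, low):
    """Number of powers of ten separating two concentrations in the same unit."""
    return math.log10(high / low)


# Both values in pg/mL. 40 g/L = 40 mg/mL = 4e10 pg/mL (approximate, assumed).
albumin_pg_per_ml = 4e10
# Hypothetical low-abundance protein at a few pg/mL (assumed for illustration).
rare_protein_pg_per_ml = 4.0

dynamic_range = orders_of_magnitude(albumin_pg_per_ml, rare_protein_pg_per_ml)
print(round(dynamic_range, 1))
```

Working on a log scale is what makes it possible to show such abundant and such rare proteins in a single plot.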
Learn about:
- the plasma levels of blood proteins in a longitudinal study of healthy individuals
- the levels of plasma proteins using immunoassays and mass spectrometry-based proteomics
Example:
CP
Ceruloplasmin.
The violin plot shows the concentration in blood for proteins with different types of function based on immunoassays. The red square in the turquoise Transport category indicates the concentration of the glycoprotein Ceruloplasmin, which is involved in iron transport across the cell membrane.
Cell Line
The Cell line resource contains information on genome-wide RNA expression profiles of human protein-coding genes in 1206 human cell lines, including 1132 cancer cell lines. The transcriptomics analysis includes classification based on specificity analysis across 28 cancer types, distribution and expression cluster analysis across all cell lines and, for selected cancer types, an analysis of the similarity of the cell lines to their corresponding cancer type. More information about the specific content and the generation and analysis of the data in the resource can be found in the Methods Summary.
Learn about:
- if a gene is enriched in cell lines from a particular cancer type (specificity)
- which genes have a similar expression profile across the cell lines (expression cluster)
- the catalogue of genes elevated in each of the cell lines
- which cell line has the most consistent expression profile to its corresponding TCGA disease cohort (i.e., the best cell lines for cancer study)
- cancer-related pathway and cytokine activity of each cell line
Example:
A4GALT
The RNA expression of the gene A4GALT in 1206 cell lines grouped according to origin into 28 cancers, a non-cancerous group including other diseases and an uncategorised group including cell lines resulting from immortalization of normal cells, primary cell lines and induced pluripotent stem cells.
Structure & Interaction
This resource contains information about the structure and interactions of proteins and presents predicted 3D structures as well as protein-protein interaction and metabolic networks.
Structure
The Structure section contains information about the predicted three-dimensional structure of 19904 human proteins and their related isoforms. Interactive 3D protein structures based on predictions generated using the AlphaFold source code are shown with the possibility to highlight selected regions and positions in the structure. The Protein Browser tool displays a variety of features for the different isoforms and can be used to select splice variants and highlight protein related features such as known antigen sequences, transmembrane regions and InterPro domains directly on the structures. The amino acid positions of population variants and variants with known clinical relevance in the Ensembl Variation database as well as predicted benign and pathological variants from AlphaMissense can also be displayed. More information about the specific content and the generation and analysis of the data in the section can be found in the Methods Summary.
Learn about:
- the predicted 3D structure of proteins and their related isoforms
- the antigen structure for the majority of the antibodies
- the predicted structure of selected protein features
- the known and predicted missense variants with clinical significance
- the known population variants and predicted benign missense variants
Example:
EGFR
The predicted structure from AlphaFold of the membrane-protein receptor EGFR.
Interaction
The Interaction section presents interaction networks for 15038 genes based on protein-protein interaction data from IntAct, BioGRID, BioPlex and OpenCell that has been integrated with data related to protein expression, location and classification. More information about the specific content and the generation of the data in this section can be found in the Methods summary.
Learn about:
- the interaction partners of proteins
- the predicted and subcellular location of the proteins in the network
- the expression specificity of the proteins in the network
- the interaction partners expressed in the same cell type for genes with specific expression
Example:
HK3
First-level interactions for the gene HK3 with nodes highlighted according to subcellular location.
Metabolic
The Metabolic section enables exploration of protein function and tissue-specific gene expression in the context of the most curated human metabolic network. For proteins involved in metabolism, a metabolic summary is provided that describes the metabolic subsystems/pathways, cellular compartments, and number of reactions associated with the protein. Over 120 manually curated metabolic pathway maps facilitate the visualization of each protein's participation in different metabolic processes. Each pathway map is accompanied by a heatmap detailing the mRNA levels across 40 different tissue types for all proteins involved in the metabolic pathway. More information about the human metabolic network, including how it was generated and what information it provides, can be found in the Methods summary.
Learn about:
- what pathways/subsystems a metabolic gene is part of
- which genes are nearby in the metabolic network
- how the expression of the genes in a pathway/subsystem varies across different tissues
Example:
HK3
A part of the Fructose and mannose metabolism network showing reactions involving the gene HK3.