The human brain proteome

The function of the brain, defined as the central nervous system, is to receive, process and execute the coordinated higher functions of perception, motion and cognition that signify human life. The cellular components of the underlying and highly complex network of transmitted signals include neurons and supportive glial cells. Brain tissue comprises different cell types as well as the space between the cell bodies, often referred to as neuropil: the meshwork of axons, dendrites, synapses and extracellular matrix that embeds the cells of the central nervous system.

Protein-coding genes are classified based on RNA expression in brain from two different perspectives:

  1. A whole-body perspective, comparing gene expression in the brain to peripheral organ and tissue types
  2. A brain-centric point of view comparing gene expression in the various regions of the brain


Brain expression is compared with other organs and tissues by using the highest expression value across all brain regions. For the regional classification, the brain is divided into 14 anatomically defined regions, color coded in Figure 1. The transcriptome analysis shows that 76% (n=15331) of all human protein-coding genes (n=20162) are expressed in the brain (based on 14 brain regions, spinal cord and corpus callosum). The regional classification was based on the 17832 genes that are detected in the brain and included in all external datasets used. Of the genes with a regional expression classification, 1064 are categorized as regionally elevated. Region-specific summary pages, including lists of regionally elevated genes, are available for the cerebral cortex, olfactory bulb, hippocampal formation, amygdala, basal ganglia, thalamus, hypothalamus, midbrain, cerebellum, pons, medulla oblongata and spinal cord, as well as white matter and choroid plexus.

Figure 1. Midsagittal schematic drawing of the different regions of the human brain, color coded according to the 14 regions.

In addition to the basic regional distribution of gene expression in the human brain, a more detailed overview of gene expression is available. This dataset is based on RNAseq analysis of 966 samples covering 13 regions of the brain, complemented by 165 samples covering 17 areas of the prefrontal cortex. Together, these datasets provide a detailed overview of gene expression in more than 200 subregions of the human brain. Gene expression in each subregion can be explored on the gene summary page.

The brain elevated genes, comparing brain to other organs and tissue types

Elevated genes for brain and other tissues and organs were calculated for all samples analyzed on the Illumina sequencing platform (HPA and GTEx). Of the 15331 genes detected above the cut-off in the human brain, 2197 have elevated expression in the brain compared with other tissue types. The tissue specificity category identifies protein-coding genes with elevated expression levels in the brain, while the tissue distribution category indicates transcript detection above the cut-off (nTPM≥1). The fraction of all protein-coding genes in each category is shown in Figure 2 and Table 1.

Figure 2. (A) The distribution of all genes across the five categories based on transcript abundance in brain as well as in all other tissues. (B) The distribution of all genes across the six categories, based on transcript detection (nTPM≥1) in brain as well as in all other tissues.

Brain expression is defined as expression above the cut-off (nTPM≥1):

  • Detected in single: Detected only in brain (97)
  • Detected in some: Detected in brain and at least one more tissue, but less than one-third of tissues (1364)
  • Detected in many: Detected in at least a third but not all tissues including brain (4971)
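
The three categories above, together with the fourth category "Detected in all" used in Table 1, amount to counting in how many tissues a gene passes the nTPM≥1 cut-off. The following is a minimal sketch, assuming each gene's values are given as a hypothetical `dict` mapping tissue name to nTPM; the function name and the "Not detected" label are illustrative and not taken from the HPA pipeline.

```python
from typing import Dict

DETECTION_CUTOFF = 1.0  # nTPM >= 1 counts as "detected", as stated in the text


def distribution_category(ntpm: Dict[str, float], tissue: str = "brain") -> str:
    """Tissue distribution category of one gene, seen from `tissue`.

    `ntpm` maps every analyzed tissue to the gene's nTPM value (hypothetical input format).
    """
    detected = {name for name, value in ntpm.items() if value >= DETECTION_CUTOFF}
    n_tissues = len(ntpm)
    if tissue not in detected:
        return "Not detected"  # illustrative label, not one of the categories listed above
    if detected == {tissue}:
        return "Detected in single"
    if len(detected) == n_tissues:
        return "Detected in all"
    if len(detected) < n_tissues / 3:
        return "Detected in some"  # brain plus at least one more tissue, but under one-third
    return "Detected in many"  # at least one-third of the tissues, but not all


# Toy example with nine tissues: brain plus one other tissue above the cut-off.
toy = {f"tissue_{i}": 0.0 for i in range(8)}
toy["brain"], toy["tissue_0"] = 12.0, 3.0
print(distribution_category(toy))  # Detected in some
```

With 36 tissues, as in Table 1, "one-third" corresponds to 12 tissues.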

Elevated expression in brain compared with other tissue types is divided into three different categories:

  • Tissue enriched: At least four-fold higher mRNA level in brain compared to any other tissue (475).
  • Group enriched: At least four-fold higher average mRNA level in a group of 2-5 tissues compared to any other tissue (457).
  • Tissue enhanced: At least four-fold higher mRNA level in brain compared to the average level in all other tissues (1265).
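
The three four-fold rules can be sketched as a single classification function. This is a minimal illustration, not the HPA implementation: it assumes the categories are checked in the order listed (enriched, then group enriched, then enhanced), that a gene must pass the nTPM≥1 cut-off in brain to count as elevated, and that a "group" means the k highest-expressing tissues (2 ≤ k ≤ 5) with brain among them. The function name and input format are hypothetical.

```python
from statistics import mean
from typing import Dict

DETECTION_CUTOFF = 1.0  # assumed: a gene must be detected (nTPM >= 1) to be elevated
FOLD = 4.0              # four-fold threshold shared by all three categories
MAX_GROUP = 5           # group enriched: a group of 2-5 tissues


def specificity_category(ntpm: Dict[str, float], tissue: str = "brain") -> str:
    """Elevated-expression category of one gene for `tissue` (needs >= 2 tissues)."""
    target = ntpm[tissue]
    others = [value for name, value in ntpm.items() if name != tissue]
    if target < DETECTION_CUTOFF:
        return "Not detected"

    # Tissue enriched: >= 4x the level in every other tissue.
    if target >= FOLD * max(others):
        return "Tissue enriched"

    # Group enriched: the k highest tissues (2 <= k <= 5), with `tissue` among them,
    # have a mean level >= 4x the highest level among all remaining tissues.
    ranked = sorted(ntpm.items(), key=lambda item: item[1], reverse=True)
    for k in range(2, MAX_GROUP + 1):
        group = ranked[:k]
        rest = [value for _, value in ranked[k:]]
        in_group = any(name == tissue for name, _ in group)
        if in_group and rest and mean([v for _, v in group]) >= FOLD * max(rest):
            return "Group enriched"

    # Tissue enhanced: >= 4x the average level across all other tissues.
    if target >= FOLD * mean(others):
        return "Tissue enhanced"
    return "Not elevated"


print(specificity_category({"brain": 80.0, "pituitary gland": 60.0, "liver": 10.0, "heart": 5.0}))
# Group enriched
```

Checking the strictest rule first keeps the three categories mutually exclusive, which matches how the counts in Table 1 add up to a single total of 2197 elevated genes.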


Table 1. Number of genes in the subdivided categories of elevated expression in the brain (based on transcript abundance) and the tissue distribution (based on expression above cut off) in the brain.

| Specificity / distribution in the 36 tissues | Detected in single | Detected in some | Detected in many | Detected in all | Total |
|---|---|---|---|---|---|
| Tissue enriched | 55 | 199 | 179 | 42 | 475 |
| Group enriched | 0 | 264 | 163 | 30 | 457 |
| Tissue enhanced | 38 | 314 | 614 | 299 | 1265 |
| Total | 93 | 777 | 956 | 371 | 2197 |
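
Table 1 is a cross-tabulation of two per-gene labels. The sketch below assumes each elevated gene has already been assigned one specificity category and one distribution category; the helper name `crosstab` is hypothetical. As a usage example, it rebuilds one label per gene from the cell counts in Table 1 and recovers the Total row.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

SPECIFICITY = ["Tissue enriched", "Group enriched", "Tissue enhanced"]
DISTRIBUTION = ["Detected in single", "Detected in some", "Detected in many", "Detected in all"]


def crosstab(labels: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count genes per (specificity, distribution) pair and add row/column totals."""
    counts = Counter(labels)
    table: Dict[str, Dict[str, int]] = {}
    for spec in SPECIFICITY:
        row = {dist: counts[(spec, dist)] for dist in DISTRIBUTION}  # missing pairs count as 0
        row["Total"] = sum(row.values())
        table[spec] = row
    table["Total"] = {
        col: sum(table[spec][col] for spec in SPECIFICITY)
        for col in DISTRIBUTION + ["Total"]
    }
    return table


# Rebuild one (specificity, distribution) label per gene from the cell counts in Table 1.
cells = {
    ("Tissue enriched", "Detected in single"): 55,
    ("Tissue enriched", "Detected in some"): 199,
    ("Tissue enriched", "Detected in many"): 179,
    ("Tissue enriched", "Detected in all"): 42,
    ("Group enriched", "Detected in some"): 264,
    ("Group enriched", "Detected in many"): 163,
    ("Group enriched", "Detected in all"): 30,
    ("Tissue enhanced", "Detected in single"): 38,
    ("Tissue enhanced", "Detected in some"): 314,
    ("Tissue enhanced", "Detected in many"): 614,
    ("Tissue enhanced", "Detected in all"): 299,
}
labels = [pair for pair, n in cells.items() for _ in range(n)]
print(crosstab(labels)["Total"])
# {'Detected in single': 93, 'Detected in some': 777, 'Detected in many': 956, 'Detected in all': 371, 'Total': 2197}
```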


Table 2. The 12 genes with the highest level of enriched expression in the brain and the tissue distribution category for the gene. "mRNA (tissue)" shows the transcript level as nTPM, TS-score (Tissue Specificity score) corresponds to the score calculated as the fold change to the second-highest tissue.

| Gene | Description | Tissue distribution | mRNA (tissue) | Tissue specificity score |
|---|---|---|---|---|
| HCRT | hypocretin neuropeptide precursor | Detected in single | 441.9 | 1531 |
| AVP | arginine vasopressin | Detected in some | 4430.7 | 748 |
| PMCH | pro-melanin concentrating hormone | Detected in some | 715.2 | 691 |
| GRM4 | glutamate metabotropic receptor 4 | Detected in single | 276.8 | 331 |
| BARHL1 | BarH like homeobox 1 | Detected in single | 29.5 | 295 |
| GABRA6 | gamma-aminobutyric acid type A receptor subunit alpha6 | Detected in single | 203.5 | 281 |
| MOG | myelin oligodendrocyte glycoprotein | Detected in some | 656.9 | 253 |
| OXT | oxytocin/neurophysin I prepropeptide | Detected in some | 2690.9 | 143 |
| FGF3 | fibroblast growth factor 3 | Detected in single | 13.3 | 133 |
| NEUROD6 | neuronal differentiation 6 | Detected in single | 38.0 | 128 |
| SLC6A3 | solute carrier family 6 member 3 | Detected in single | 95.8 | 119 |
| HAPLN2 | hyaluronan and proteoglycan link protein 2 | Detected in some | 617.5 | 86 |
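
Because the score is described as the fold change to the second-highest tissue, it can be sketched in a few lines. This assumes the score is a plain ratio of nTPM values. Any pseudocount, rounding or handling of zero values used for the published scores is not described in the text, so this sketch may not reproduce the table values exactly. The function name is hypothetical.

```python
from typing import Dict


def specificity_score(levels: Dict[str, float]) -> float:
    """Fold change between the highest and the second-highest level.

    Applies equally to tissues (TS-score) or brain regions (RS-score); needs >= 2 entries.
    """
    highest, second = sorted(levels.values(), reverse=True)[:2]
    if second == 0:
        return float("inf")  # assumption: zero handling is not described in the text
    return highest / second


print(specificity_score({"brain": 300.0, "pituitary gland": 2.0, "testis": 1.0}))  # 150.0
```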


Protein localization of genes with elevated expression in the brain compared to other tissues

In-depth analysis of the brain-elevated genes using antibody-based protein profiling allowed us to map where these genes are expressed and where the encoded proteins are localized. Proteins expressed by the different cell types of the brain were identified among the genes with elevated expression.

Proteins specifically detected in neurons

Neurons are the functional units of the brain and are divided into many subclasses based on location, morphology and neurotransmitter phenotype. In the cerebral cortex, ELAV-like protein 3 (ELAVL3) is expressed in all neurons. In contrast, glutamate decarboxylase 1 (GAD1), an essential enzyme in the biosynthesis of GABA, is expressed only by cortical GABAergic interneurons. Protocadherin alpha-1 (PCDHA1) is expressed in the cerebral cortex, where it can be detected in a few sparsely distributed interneuron-like neurons.


Images: ELAVL3, GAD1, PCDHA1

Detailed immunohistochemical analysis of proteins with known molecular functions shows that many brain-elevated proteins are involved in synaptic signaling, such as synaptophysin (SYP), which is involved in the docking of synaptic vesicles. Various known post-synaptic proteins, including the GABA B receptor subunit 2 (GABBR2), are also found, as are proteins involved in organizing and maintaining synaptic connections, such as cell adhesion molecule 2 (CADM2). These data underline that synaptic transmission requires specialized proteins, most often with an enriched expression level in the brain compared with peripheral tissue types.


Images: SYP, GABBR2, CADM2

Proteins specifically detected in glial cells

Glial cells can be subdivided into astrocytes, oligodendrocytes and microglia based on morphology and functions.

The well-known astrocyte marker GFAP, as well as the largely unexplored gene TAFA1, is detected in astrocytes of both the white and grey matter. In contrast, the water channel AQP4 is mainly detected in the grey matter, where it shows a neuropil-like staining pattern because the protein is localized to numerous glial endfeet.


Images: GFAP, TAFA1, AQP4

Several genes expressed in oligodendrocytes are involved in myelination, such as those encoding the compact myelin proteins myelin basic protein (MBP) and proteolipid protein 1 (PLP1). In contrast to the oligodendrocyte transcription factor OLIG2, none of the investigated myelin sheath components are brain specific. MBP and PLP1 are classified as enriched in the brain, but this is mainly due to the sample composition, which contains 25% densely myelinated white matter. Expression above the cut-off is found in several peripheral tissue types, and immunohistochemical analysis reveals that this expression mainly represents Schwann cells in peripheral nerves.


Images: MBP, PLP1, OLIG2

The third type of glial cell populating the brain is microglia. These cells are derived from hematopoietic stem cells that invade the brain during embryonic development, or from macrophages that enter the brain from the bloodstream later in life. The well-known microglial genes integrin alpha M chain (ITGAM) and allograft inflammatory factor 1 (AIF1) are neither specific to nor enriched in the brain; they are also expressed, for example, in cells populating the lymph node and the bone marrow, the main site of hematopoiesis. Based on our immunohistochemistry analysis, only one microglial gene, the purinoceptor P2RY12, could be identified as enhanced in brain tissue, with low expression in lymph node and bone marrow. These data show the close relationship between microglia and hematopoietic cells, reflecting the common developmental origin of microglia and macrophages.


Images: ITGAM, AIF1, P2RY12


Regional expression within the brain

The regional organization of brain anatomy separates the brain into regions, subregions, nuclei and layers of specialized cells, enabling the specific function of each individual region. Transcriptomic data from the different regions in the HPA brain dataset allow an additional classification of expression within the brain. The same strategy as used for the tissue type classification was applied to the regional data, resulting in regionally elevated genes (separated into regionally enriched, group enriched and regionally enhanced).
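
The caption of Table 3 states that each region is represented by the maximum nTPM among its subregions before the tissue-type rules are reapplied. A minimal sketch of that aggregation, followed by the regionally enriched test, is shown here; the group enriched and regionally enhanced checks would mirror the tissue-level rules. The function names, region-to-subregion input format and example values are hypothetical.

```python
from typing import Dict, Optional

FOLD = 4.0
DETECTION_CUTOFF = 1.0  # nTPM >= 1, assumed to apply as in the tissue classification


def region_levels(subregions: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Represent each region by the maximum nTPM among its subregions."""
    return {region: max(values.values()) for region, values in subregions.items()}


def regionally_enriched(levels: Dict[str, float]) -> Optional[str]:
    """Return the region with >= 4x the level of every other region, if any."""
    ranked = sorted(levels.items(), key=lambda item: item[1], reverse=True)
    (top_region, top), (_, second) = ranked[0], ranked[1]
    if top >= DETECTION_CUTOFF and top >= FOLD * second:
        return top_region
    return None


# Hypothetical nTPM values for one gene; subregion names are placeholders.
gene = {
    "hypothalamus": {"subregion_1": 900.0, "subregion_2": 40.0},
    "thalamus": {"subregion_1": 20.0, "subregion_2": 3.0},
    "cerebellum": {"subregion_1": 5.0},
}
levels = region_levels(gene)
print(levels)                       # {'hypothalamus': 900.0, 'thalamus': 20.0, 'cerebellum': 5.0}
print(regionally_enriched(levels))  # hypothalamus
```

Taking the maximum rather than the mean means that a gene confined to a single small nucleus is not diluted by the rest of its region.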

  • 1064 genes classified as regionally elevated
  • 217 genes are brain elevated as well as regionally elevated
  • Cerebellum has the most regionally enriched genes (n=64)
  • 411 regionally elevated genes are elevated in other tissues than brain

Figure 3. An interactive network plot of the regionally enriched and group enriched genes connected to their respective enriched region (black circles). Red nodes represent the number of regionally enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Each node is clickable and results in a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations up to 4 genes and 4 regions, but the resulting lists show the complete set of group enriched genes in the particular region.

Table 3. The 13 regions of the brain and the numbers of genes detected above the cut-off, indicating expression in that brain region, as well as the number of genes classified as elevated in each region compared with the others, based on transcript abundance in the individual regions (the maximum nTPM of the subregions of each region is used as its representative value). The same classification rules are used for the regional classification as for the tissue specificity classification based on tissue types.


Table 4. The 12 genes with the highest level of regionally enriched expression within the brain and their regional distribution category. Regional distribution indicates whether the gene is detected in a single or in multiple regions of the brain. The mRNA column provides the highest nTPM value of the region with the highest expression. RS-score (Regional Specificity score) corresponds to the score calculated as the fold change to the second-highest region. Note that some of these genes are highly expressed in only a few samples within a brain region and might be linked to disease or cause of death.

| Gene | Description | Regional distribution | mRNA (region) | RS-score |
|---|---|---|---|---|
| GHRH | Growth hormone releasing hormone | Detected in single | 428.5 | 542 |
| OXT | Oxytocin/neurophysin I prepropeptide | Detected in many | 2524.3 | 144 |
| TTR | Transthyretin | Detected in all | 302089.2 | 82 |
| AVP | Arginine vasopressin | Detected in many | 15977.0 | 76 |
| WFIKKN2 | WAP, follistatin/kazal, immunoglobulin, kunitz and netrin domain containing 2 | Detected in all | 1253.0 | 71 |
| ENSG00000288570 | Novel protein | Detected in single | 5.2 | 52 |
| GCG | Glucagon | Detected in single | 13.7 | 51 |
| NR5A1 | Nuclear receptor subfamily 5 group A member 1 | Detected in single | 37.9 | 50 |
| HMGCS2 | 3-hydroxy-3-methylglutaryl-CoA synthase 2 | Detected in single | 11.3 | 43 |
| FEZF1 | FEZ family zinc finger 1 | Detected in many | 270.1 | 38 |
| FOLR1 | Folate receptor alpha | Detected in all | 993.6 | 38 |
| SLC6A2 | Solute carrier family 6 member 2 | Detected in some | 192.6 | 35 |


Proteins elevated in some regions of the brain


Images: PNOC (cerebral cortex), ADORA2A (caudate, basal ganglia), HDC (hypothalamus), SLC6A3 (substantia nigra, midbrain), TPH2 (dorsal raphe, midbrain), ARHGEF33 (cerebellum)


Comparing tissue classification with regional expression in the brain

The majority of brain elevated genes are classified as having low regional specificity (n=1965), and 217 genes are both brain elevated and regionally elevated. Among the genes classified as brain elevated with low regional specificity, several glial-specific proteins are found, for example GFAP, AQP4 and MBP. In contrast, neuronal proteins, such as ADORA2A and AVP, are more often found among the regionally elevated genes. Interestingly, many proteins of interest for brain biology are classified as elevated in tissues other than the brain, such as ANK1, elevated in skeletal muscle, and TFAP2B, elevated in epididymis, as well as the widely expressed CRYAB, which is localized to white matter but elevated in heart and skeletal muscle. This highlights the importance of mapping expression and localization from multiple perspectives to better understand brain biology and physiology.


Images: ANK1, TFAP2B, CRYAB

Table 5. Overlap between the tissue classification, indicating elevated expression in the brain or not, and the regional specificity within the brain. (The regional classification of human brain expression is limited by available external data and thus does not cover all human protein-coding genes.)

| Tissue classification | Regionally elevated | Low regional specificity | Missing regional classification | Total |
|---|---|---|---|---|
| Elevated in brain | 217 | 1965 | 15 | 2197 |
| Elevated in other tissue but expressed in brain | 411 | 4548 | 91 | 5050 |
| Low tissue specificity | 55 | 8117 | 22 | 8194 |
| Total | 1064 | 16768 | 128 | 15441 |

Gene expression shared between brain and other tissues

There are 457 group enriched genes expressed in the brain. Group enriched genes are defined as genes showing at least a four-fold higher average mRNA level in a group of 2-5 tissues, including brain, compared with all other tissues.

To illustrate the relation of brain tissue to other tissue types, a network plot was generated, displaying the number of genes shared between different tissue types. The network plot reveals that most group enriched genes are shared with the testis (n=69). However, neither gene ontology analysis nor immunohistochemical analysis revealed a clear explanation for the large number of genes shared between testis and brain, and further investigation is needed. A common neuroectodermal origin is a plausible reason for the relatively high number of genes connecting brain with adrenal gland and pancreas. The large number of group enriched genes shared between brain and skeletal muscle is possibly due to shared signaling functions. The group enriched genes shared with the pituitary gland are expected, since half of the pituitary gland (the posterior lobe) originates from the brain and both neuronal and glial cells are located in the gland. Several group enriched genes are shared with the fallopian tube, mainly related to ciliated cells, which in the brain correspond to the ciliated ependymal cells lining the ventricle walls.

Figure 4. An interactive network plot of the brain enriched and group enriched genes connected to their respective enriched tissues (grey circles). Red nodes represent the number of brain enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Each node is clickable and results in a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations of up to 3 tissues, but the resulting lists show the complete set of group enriched genes in the particular tissue.
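
Each orange node in the network corresponds to one combination of tissues, while, as the caption notes, the clickable lists include all group enriched genes regardless of how many tissues are in the group. The sketch below separates the two counts, assuming each group enriched gene is given with its set of enriched tissues; gene names, tissue sets and function names are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Set


def combination_nodes(gene_groups: Dict[str, Set[str]], focus: str = "brain",
                      max_tissues: int = 3) -> Counter:
    """Count genes per tissue combination that includes `focus` (one node per combination).

    Combinations larger than `max_tissues` are skipped, mirroring the display limit.
    """
    nodes: Counter = Counter()
    for tissues in gene_groups.values():
        if focus in tissues and len(tissues) <= max_tissues:
            nodes[frozenset(tissues)] += 1
    return nodes


def partner_totals(gene_groups: Dict[str, Set[str]], focus: str = "brain") -> Counter:
    """Count all group enriched genes shared between `focus` and each other tissue."""
    totals: Counter = Counter()
    for tissues in gene_groups.values():
        if focus in tissues:
            totals.update(tissues - {focus})  # each partner tissue counted once per gene
    return totals


# Hypothetical group enriched genes and their enriched tissues.
groups = {
    "gene_1": {"brain", "testis"},
    "gene_2": {"brain", "testis"},
    "gene_3": {"brain", "lung"},
    "gene_4": {"brain", "testis", "retina", "lung"},
    "gene_5": {"liver", "kidney"},
}
print(partner_totals(groups).most_common(1))  # [('testis', 3)]
```

In this toy data, `gene_4` contributes to the testis, lung and retina totals but does not appear as a node, because its group of four tissues exceeds the display limit.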


AQP4 is implicated in the maintenance of brain water homeostasis and is group enriched in lung and brain. The encoded protein is expressed in astrocytes and alveolar cells.


Images: AQP4 in lung and cerebral cortex

ATP1B2 is group enriched and shows high expression in retina and brain; the protein is located in photoreceptor cells in the retina as well as in nerve fibers in brain and retina.


Images: ATP1B2 in retina and cerebral cortex

PNMA5 is also group enriched and shows expression in testis and brain.


Images: PNMA5 in testis and cerebral cortex

Ciliated cells in fallopian tube and respiratory epithelium share several proteins with the ciliated ependymal cells in the brain, resulting in several genes classified as group enriched, such as FOXJ1 and RSPH1.


Images: FOXJ1 in caudate, fallopian tube and bronchus; RSPH1 in caudate, fallopian tube and bronchus

Relevant links and publications

Sjöstedt E et al., An atlas of the protein-coding genes in the human, pig, and mouse brain. Science. (2020)
PubMed: 32139519 DOI: 10.1126/science.aay5947

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Sjöstedt E et al., Defining the Human Brain Proteome Using Transcriptomics and Antibody-Based Profiling with a Focus on the Cerebral Cortex. PLoS One. (2015)
PubMed: 26076492 DOI: 10.1371/journal.pone.0130028

Fagerberg L et al., Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics. (2014)
PubMed: 24309898 DOI: 10.1074/mcp.M113.035600