Brain Single Nuclei - Methods summary

The Single nuclei brain section contains single nuclei RNA sequencing (snRNAseq) data from 11 different brain regions. The original data includes 3 million cells separated into 461 different cell clusters. Here, 2.5 million cells are imported and the clustering is based on 31 superclusters and cell type information, resulting in 34 different cluster cell types. More information about the samples included can be found here.

Source publication

Siletti K et al. (2023) “Transcriptomic diversity of cell types across the adult human brain” Science. 2023 Oct 13;382(6667)

The publication includes single nuclei RNA sequencing results based on over 3 million cells from multiple brain regions. The data is available in the CellxGene browsing tool (The Human Brain Cell Atlas v1.0) for exploring individual clusters and samples.


What can you learn from the Single Nuclei Brain section?

Learn about:

  • mRNA and protein expression in brain cell cluster types
  • if a gene is enriched in a particular brain cell cluster (specificity)
  • how the brain-specific variation translates when compared with whole-body cell types


Data overview

Data type | Count | Description | Coverage (number of genes)
RNA expression | 11 | RNA read count for genes per cell across 11 brain regions | 19580
RNA expression | 260 | RNA expression for genes across 260 clusters | 19580
RNA expression | 34 | RNA expression for genes across 34 cluster types | 19580

How has the data been analyzed?

Here, the HPA imported the expression profiles and grouped them using a cell type-based strategy: data is pooled for each cell type cluster, shown as bar charts, and the average normalized protein-coding transcripts per million (nTPM) is calculated. The cell type clusters are based on the 31 superclusters together with the assigned cell types provided, and the data is shown as 34 different "supercluster cell types". The expression profiles of the different clusters are shown for each of the 11 brain regions. More details on the number of million reads and the number of cells per brain region/UMAP can be found here. The published cerebral cortex data comprises a larger number of cells, and only a random selection of 500 thousand cells is included. In total, expression data for 2,526,725 brain cells is displayed in the Brain single nuclei resource, for browsing gene expression profiles and for easy comparison with cell type expression in peripheral tissues.
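The pooling step can be illustrated with a small sketch. It sums read counts over all cells assigned to a cluster type and rescales each pooled profile to one million. This is plain per-million scaling on toy data, used here as a stand-in; the normalization actually used to obtain nTPM is not described in this section, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def pooled_per_million(counts_per_cell, cluster_of_cell):
    """Pool counts per cluster and scale each pooled profile to one million.

    counts_per_cell: dict mapping cell id -> dict of gene -> read count
    cluster_of_cell: dict mapping cell id -> cluster type name
    Both layouts are assumptions made for this illustration.
    """
    pooled = defaultdict(lambda: defaultdict(float))
    for cell, counts in counts_per_cell.items():
        cluster = cluster_of_cell[cell]
        for gene, count in counts.items():
            pooled[cluster][gene] += count

    scaled = {}
    for cluster, gene_counts in pooled.items():
        total = sum(gene_counts.values())
        # Simplification: per-million scaling of pooled counts, not nTPM.
        scaled[cluster] = {
            gene: (count * 1e6 / total if total else 0.0)
            for gene, count in gene_counts.items()
        }
    return scaled
```

For example, two toy astrocyte cells with GFAP counts of 3 and 1 and VIP counts of 1 and 0 pool to GFAP = 4 and VIP = 1, so GFAP receives four fifths of the million.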


Collection of snRNA-seq data

The filtered snRNA-seq data was downloaded from CELLxGENE (https://cellxgene.cziscience.com/collections/283d65eb-dd53-496d-adb7-7570c7caa443). We aggregated the dissections into 11 main brain regions. The dataset was adjusted slightly by removing a few superclusters. All cells of a supercluster were removed if they met all three of the following conditions:

  • The supercluster has a low cell count (n < 30 cells)
  • Cells of the supercluster did not cluster together in our regional representation
  • The supercluster name is incongruent for a particular region, i.e. the supercluster name was not consistent with the region's expected cell types.
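The three conditions above can be sketched as a simple filter. Only the first condition (n < 30) is computed from the data here. The other two rely on inspection of the regional UMAP and on anatomical judgement, so this sketch assumes they are supplied as manually curated sets. All function and parameter names are hypothetical.

```python
from collections import Counter

MIN_CELLS = 30  # threshold from the first condition


def low_count_superclusters(cell_labels, min_cells=MIN_CELLS):
    """Count cells per (region, supercluster) and keep those below min_cells."""
    counts = Counter(cell_labels)
    return {key: n for key, n in counts.items() if n < min_cells}


def superclusters_to_remove(cell_labels, scattered, incongruent,
                            min_cells=MIN_CELLS):
    """Return (region, supercluster) pairs that meet all three conditions.

    scattered:   pairs judged not to cluster together (manual, assumed input)
    incongruent: pairs whose name does not fit the region (manual, assumed input)
    """
    low = low_count_superclusters(cell_labels, min_cells)
    return {
        key: n for key, n in low.items()
        if key in scattered and key in incongruent
    }
```

A supercluster with few cells is kept if it fails either manual check, which matches the requirement that all three conditions hold.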

Cell filtering

The exact list of excluded cells is given below. After removal of these cells, each region went through UMAP dimensionality reduction using scanpy (v 1.10.2) based on python (v 3.10.13). For this purpose, MALAT1 and genes detected in fewer than 3 cells were filtered out. Data was normalised with scanpy's sc.pp.normalize_total function and highly variable genes were calculated using the default settings. The neighbourhood graph was computed for 10 neighbours based on the first 40 principal components with scanpy's sc.pp.neighbors. Based on that, the UMAP coordinates were calculated using scanpy's sc.tl.umap function with default settings. For the cerebral cortex, we subsampled the dataset at this point by randomly picking 500,000 cells out of 1,345,140, due to technical limitations in online visualisation.
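The steps in this paragraph might look roughly like the following in scanpy. This is a minimal sketch, not the pipeline that was actually run. The log transform and the explicit PCA call are assumptions that the text does not state, and the function names, the seed, and the order of the two gene filters are illustrative choices.

```python
import random


def subsample_indices(n_cells, n_keep, seed=0):
    """Pick n_keep distinct cell indices at random (the seed is an assumption)."""
    if n_keep >= n_cells:
        return list(range(n_cells))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cells), n_keep))


def embed_region(adata):
    """Compute UMAP coordinates for one region; adata is an AnnData object."""
    import scanpy as sc  # imported here so the helper above runs without scanpy

    # Filter out MALAT1 and genes detected in fewer than 3 cells.
    adata = adata[:, adata.var_names != "MALAT1"].copy()
    sc.pp.filter_genes(adata, min_cells=3)

    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)  # assumption: not mentioned in the text
    sc.pp.highly_variable_genes(adata)  # default settings

    sc.pp.pca(adata)  # assumption: default PCA before building the graph
    sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
    sc.tl.umap(adata)  # default settings
    return adata
```

For the cerebral cortex, the subsampling could then be applied after the embedding, for example `adata = adata[subsample_indices(adata.n_obs, 500_000)].copy()`.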

A visual description and more details about the pipelines are available in the Single Cell Type method summary.


What is presented in the section?

The data is presented as interactive UMAP plots and summarizing bar plots, displaying the expression of each gene in each cluster or single nucleus, including information on cluster type specificity from a whole brain perspective.

[Figure: bar plots of expression (nTPM) across the cluster types for RBFOX3, VIP, GFAP and SELE (SnBrain)]

For every protein-coding gene there is a summarized bar plot showing the expression profile across the 34 different cluster names. Here are four examples: two neuron-specific markers (RBFOX3, also called NeuN, which is pan-neuronal, and VIP, which is selectively expressed by a neuronal subtype and mainly found in one cluster), the astrocyte marker GFAP and the endothelial-specific SELE. The cerebral cortex UMAP for these examples is shown below:


[Figure: cerebral cortex UMAP plots for RBFOX3, VIP, GFAP and SELE]


How has the classification of all protein-coding genes been done?

A genome-wide classification of the protein-coding genes with regard to specificity across the single nuclei brain cluster types has been performed. The genes were classified according to specificity into (i) cell type enriched genes with at least fourfold higher expression levels in one cluster/cell type as compared with any other analyzed cluster/cell type; (ii) group enriched genes with enriched expression in a small number of cluster/cell types (2 to 10); and (iii) cluster/cell type enhanced genes with only moderately elevated expression.
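The first two categories can be illustrated with a short sketch. The fourfold rule for a single cluster comes from the text above. For group enrichment the text gives only the group size (2 to 10), so the sketch assumes that the mean of the group must be at least fourfold higher than the highest value outside the group. The enhanced category is not implemented because no threshold is given here. The function name and return labels are hypothetical.

```python
FOLD = 4.0      # fourfold threshold stated for cell type enriched genes
MAX_GROUP = 10  # largest group size stated for group enriched genes


def classify_gene(expression, fold=FOLD, max_group=MAX_GROUP):
    """Classify one gene from a dict of cluster type -> expression value."""
    values = sorted(expression.values(), reverse=True)
    if len(values) < 2 or values[0] <= 0:
        return "not classified"

    # (i) One cluster at least fourfold above every other cluster.
    if values[0] >= fold * values[1]:
        return "cell type enriched"

    # (ii) Assumed rule: mean of the top k clusters (k = 2..10) is at least
    # fourfold above the highest cluster outside that group.
    for k in range(2, min(max_group, len(values) - 1) + 1):
        group_mean = sum(values[:k]) / k
        if group_mean >= fold * values[k]:
            return "group enriched"

    return "not classified"
```

With toy values, a gene at 100 in one cluster and at most 10 elsewhere is cell type enriched, while a gene at 100 and 90 in two clusters and at most 10 elsewhere is group enriched under the assumed rule.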

List of removed cells

The following is a list of excluded superclusters by region, along with the number (n) of excluded cells.

  • Cerebral cortex: hippocampal CA4 (n = 25), lower rhombic lip (n = 20), hippocampal dentate gyrus (n = 20), midbrain-derived inhibitory (n = 11), cerebellar inhibitory (n = 11), mammillary body (n = 9), choroid plexus epithelial cell (n = 3), ependymal cell (n = 1)
  • Hippocampus: thalamic excitatory (n = 7), midbrain-derived inhibitory (n = 1)
  • Amygdala: thalamic excitatory (n = 2)
  • Basal ganglia: hippocampal CA4 (n = 16), hippocampal CA1-3 (n = 6), cerebellar inhibitory (n = 5), Bergmann glia (n = 2)
  • Thalamus: mammillary body (n = 9), cerebellar inhibitory (n = 5), hippocampal CA1-3 (n = 5), lower rhombic lip (n = 2), deep-layer near-projecting (n = 1), Bergmann glia (n = 1)
  • Hypothalamus: hippocampal CA1-3 (n = 15), cerebellar inhibitory (n = 13), thalamic excitatory (n = 9), choroid plexus epithelial cell (n = 2), lower rhombic lip (n = 1), Bergmann glia (n = 1)
  • Midbrain: mammillary body (n = 12), medium spiny neuron (n = 6), choroid plexus epithelial cell (n = 3), hippocampal CA1-3 (n = 3), deep-layer near-projecting (n = 2), eccentric medium spiny neuron (n = 1)
  • Cerebellum: MGE interneuron (n = 3), choroid plexus epithelial cell (n = 2), medium spiny neuron (n = 1), deep-layer intratelencephalic (n = 1)
  • Pons: CGE interneuron (n = 25), upper-layer intratelencephalic (n = 20), LAMP5-LHX6 and Chandelier (n = 6), MGE interneuron (n = 3), amygdala excitatory (n = 2), mammillary body (n = 1), deep-layer intratelencephalic (n = 1)
  • Medulla oblongata: CGE interneuron (n = 27), upper-layer intratelencephalic (n = 16), MGE interneuron (n = 7), LAMP5-LHX6 and Chandelier (n = 2)
  • Spinal cord: midbrain-derived inhibitory (n = 3), CGE interneuron (n = 2), choroid plexus epithelial cell (n = 1), upper-layer intratelencephalic (n = 1)