Spatial Transcriptomics

Recent developments in transcriptomics technologies (the detection of RNA) enable quantification of the RNA content of tissues and single cells. In particular, spatial transcriptomics technologies, which detect both transcripts and their spatial location, have become available to researchers. The major breakthroughs in the field have been sub-cellular spatial resolution (< 1 micrometer) and the possibility to detect all transcripts simultaneously (genome-wide). Compared with single cell or single nuclei transcriptomics, the major advantage of high-resolution spatial transcriptomics is the ability to investigate all cells, including rare cells, in a single tissue section while retaining information on the cellular environment and neighboring cells. This is especially valuable in neuroscience, where the complexity and molecular diversity of the brain make it extremely difficult to generate detailed and complete maps of protein expression using ‘bulk’ or single nuclei transcriptomics.

Stereo-seq data in the Human Protein Atlas

The brain resource now contains the first spatial transcriptomics data of the ‘healthy’ human cerebral cortex (frontal cortex). Based on single nuclei transcriptomics data, we determined the likelihood that two transcripts are expressed in the same cell type. Using the resulting transcript-to-transcript matrix, we can predict the location of transcripts or, by combining several cell-type markers, predict the cell type at each location in the spatial transcriptomics data.
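The page does not say which co-expression statistic is computed before z-scaling. The sketch below assumes, purely for illustration, Pearson correlation of counts across nuclei, followed by z-scoring each gene's row of correlations to obtain a transcript-to-transcript matrix. The function names and toy counts are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def coexpression_zscores(counts):
    """Build a transcript-to-transcript matrix of co-expression z-scores.

    counts: dict gene -> list of counts per nucleus (same nucleus order for every gene).
    Returns dict gene -> dict partner gene -> z-score of their correlation.
    Assumption: the z-score is taken within each gene's row; self-pairs are skipped.
    """
    genes = sorted(counts)
    z = {}
    for g in genes:
        row = {h: pearson(counts[g], counts[h]) for h in genes if h != g}
        values = list(row.values())
        mu, sd = mean(values), pstdev(values)
        z[g] = {h: (r - mu) / sd if sd else 0.0 for h, r in row.items()}
    return z


# Toy single-nucleus counts for six nuclei (hypothetical numbers).
counts = {
    "AQP4": [5, 4, 0, 0, 0, 0],
    "GLUL": [3, 5, 0, 1, 0, 0],
    "SNAP25": [0, 0, 6, 5, 0, 0],
    "PLP1": [0, 0, 0, 0, 7, 8],
}
z = coexpression_zscores(counts)
print(max(z["AQP4"], key=z["AQP4"].get))  # GLUL
```

A high positive z-score means two genes are detected together in the same nuclei more often than the gene's other pairings, which is the property later used to score neighboring transcripts.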

Figure 1: Integration of Single Nuclei and Spatial Transcriptomics Data for Cell Segmentation and Transcript Location Imputation. A) Co-expression analysis: Single nuclei transcriptomics data is used to assess the co-expression of genes. By scaling these values to z-scores, we determine the likelihood of genes being co-expressed within the same cell. B) RNA location imputation: This co-expression information enables imputation of RNA locations. For each protein-coding gene at every tissue spot, we calculate the support for that transcript by summing the z-score values of all neighboring transcripts. C) Identification of cell-type marker genes: Genes are clustered based on their co-expression profiles (all genes to all genes). These clusters were annotated using known cell-type markers, and clusters containing gene sets enriched in neurons, astrocytes, oligodendrocytes, microglia and vasculature-associated cells were selected. D) Creating cell-type masks: Using the accumulated gene-gene co-expression data, we calculate the likelihood of each tissue spot belonging to one of the annotated clusters, based on the identity of neighboring transcripts.
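Panel C describes annotating co-expression clusters with known cell-type markers. A minimal sketch of that annotation step, assuming the clusters have already been computed, is shown below. The marker sets are abbreviated, hypothetical lists built from genes named in the tables on this page, and `min_hits` is an illustrative threshold, not a documented parameter.

```python
# Hypothetical, abbreviated marker sets (genes taken from the tables on this page).
KNOWN_MARKERS = {
    "astrocytes": {"AQP4", "GJA1", "SLC1A3"},
    "neurons": {"SNAP25", "NEFL", "UCHL1"},
    "oligodendrocytes": {"PLP1", "MOBP", "MAG"},
    "microglia": {"CSF1R", "C1QA", "CD74"},
    "vascular cells": {"CLDN5", "VWF", "FLT1"},
}


def annotate_clusters(clusters, markers=KNOWN_MARKERS, min_hits=2):
    """Label each gene cluster with the cell type whose markers it contains most often.

    clusters: dict cluster_id -> set of genes. Clusters with fewer than `min_hits`
    marker genes for every cell type stay unannotated (and are therefore not selected).
    """
    annotations = {}
    for cluster_id, genes in clusters.items():
        hits = {cell_type: len(genes & m) for cell_type, m in markers.items()}
        best = max(hits, key=hits.get)
        if hits[best] >= min_hits:
            annotations[cluster_id] = best
    return annotations


clusters = {
    0: {"AQP4", "GJA1", "CLU"},
    1: {"PLP1", "MAG", "CNP"},
    2: {"RPS19", "ACTB"},
}
print(annotate_clusters(clusters))  # {0: 'astrocytes', 1: 'oligodendrocytes'}
```

Cluster 2 contains no known markers, so it is left out, mirroring the selection of only the five annotated cell-type clusters.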


Creating cell-type masks and quantification of protein expression

In the first version of the Stereo-seq resource, the locations of vascular cells, astrocytes, oligodendrocytes, microglia and neurons were determined using sets of marker genes with elevated expression in the corresponding cell types. This makes it possible to link every location within the spatial transcriptomics data to a main cell type. For each protein-coding transcript, the number of counts in each cell type is shown as bars. Note that spatial diffusion of transcripts and overlapping cells introduce some noise.
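Panel D of Figure 1 describes assigning each spot to a cell type from the identities of its neighboring transcripts. The sketch below is one possible reading of that step, not the HPA pipeline. It assumes a precomputed co-expression z-score table `z`, scores each cell type by summing, over all neighboring transcripts, the mean z-score between that transcript and the cell type's cluster genes, and leaves a spot unassigned (noise) when no score is positive. The 5 micrometer radius is borrowed from the imputation step; `min_score` and all function names are hypothetical.

```python
from math import dist


def cluster_affinity(gene, cluster_genes, z):
    """Mean co-expression z-score between `gene` and the genes of one annotated cluster."""
    row = z.get(gene, {})
    values = [row[h] for h in cluster_genes if h in row]
    return sum(values) / len(values) if values else 0.0


def assign_masks(transcripts, clusters, z, radius=5.0, min_score=0.0):
    """Assign a cell-type label to the position of every detected transcript.

    transcripts: list of (x, y, gene) with coordinates in micrometers.
    clusters: dict cell_type -> set of cluster genes.
    Returns labels in the same order; None marks unassigned (noise) spots.
    """
    labels = []
    for x, y, _ in transcripts:
        # Genes of all transcripts within `radius` (brute force; fine for a toy example).
        neighbors = [g for u, v, g in transcripts if dist((x, y), (u, v)) <= radius]
        scores = {
            cell_type: sum(cluster_affinity(g, genes, z) for g in neighbors)
            for cell_type, genes in clusters.items()
        }
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] > min_score else None)
    return labels


# Toy z-score table and transcripts (hypothetical values; "XYZ" is an unknown gene).
z = {
    "AQP4": {"GJA1": 2.0, "SNAP25": -1.0, "NEFL": -1.0},
    "GJA1": {"AQP4": 2.0, "SNAP25": -1.0, "NEFL": -1.0},
    "SNAP25": {"NEFL": 2.0, "AQP4": -1.0, "GJA1": -1.0},
    "NEFL": {"SNAP25": 2.0, "AQP4": -1.0, "GJA1": -1.0},
}
clusters = {"astrocytes": {"AQP4", "GJA1"}, "neurons": {"SNAP25", "NEFL"}}
transcripts = [
    (0, 0, "AQP4"), (1, 0, "GJA1"),
    (50, 50, "SNAP25"), (51, 50, "NEFL"),
    (100, 100, "XYZ"),
]
print(assign_masks(transcripts, clusters, z))
# ['astrocytes', 'astrocytes', 'neurons', 'neurons', None]
```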

Figure 2: Assignment of spots to the five main cell types: astrocytes, oligodendrocytes, microglia, neurons and vascular cells. The colors represent the masks for the main cell types. These masks are used to assign pixels to cell types when counting transcripts (bar plots) or imputing expression.

Identifying cell-type elevated transcripts: For all protein-coding transcripts, the number of counts in each of the five cell-type masks was measured. Counts that are not assigned to a cell type are considered noise and are subtracted from the counts in each cell-type mask. For each cell type and each protein-coding transcript, the relative abundance is then calculated as counts per million (CPM). These numbers provide an overview of the expression of protein-coding genes in each cell type and are used to identify cell-type elevated transcripts. A protein-coding transcript is considered cell-type elevated if its abundance in one cell type is five times higher than in the other four cell types. An overview of the most abundant elevated genes in the five cerebral cortex cell types is shown below. For comparison and validation, single nuclei transcriptomics data is available in the single cell resource. A major advantage of spatial transcriptomics is the ability to link protein-coding genes to cell types even when the protein distribution seen with immunohistochemistry (IHC) does not reveal a cell morphology. To illustrate this, several IHC examples are shown below the tables.
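The noise subtraction, CPM normalization and five-fold rule can be sketched as follows. Two details are not specified on this page and are assumptions here: the unassigned (noise) count of a gene is subtracted unscaled from its count in every mask, clipped at zero, and "five times higher" is read as at least five times the highest CPM among the other cell types. Function names and toy counts are hypothetical.

```python
def cell_type_cpm(counts, noise):
    """Noise-corrected counts per million (CPM) for each gene in each cell-type mask.

    counts: dict cell_type -> dict gene -> raw transcript count inside that mask.
    noise:  dict gene -> count outside every mask (treated as background).
    """
    cpm = {}
    for cell_type, genes in counts.items():
        # Assumption: subtract the unscaled background count, clipping at zero.
        corrected = {g: max(c - noise.get(g, 0), 0) for g, c in genes.items()}
        total = sum(corrected.values())
        cpm[cell_type] = {g: c / total * 1e6 if total else 0.0 for g, c in corrected.items()}
    return cpm


def elevated_genes(cpm, fold=5.0):
    """Map gene -> cell type for genes at least `fold` times more abundant in one cell type.

    Assumption: the comparison is against the highest CPM among the other cell
    types (requires at least two cell types).
    """
    genes = set().union(*(d.keys() for d in cpm.values()))
    elevated = {}
    for g in genes:
        values = {cell_type: cpm[cell_type].get(g, 0.0) for cell_type in cpm}
        best = max(values, key=values.get)
        runner_up = max(v for cell_type, v in values.items() if cell_type != best)
        if values[best] > 0 and values[best] >= fold * runner_up:
            elevated[g] = best
    return elevated


# Toy counts for two masks (hypothetical numbers; the real analysis uses five masks).
counts = {
    "astrocytes": {"AQP4": 110, "SNAP25": 20, "CLU": 60},
    "neurons": {"AQP4": 15, "SNAP25": 410, "CLU": 40},
}
noise = {"AQP4": 10, "SNAP25": 10}
cpm = cell_type_cpm(counts, noise)
print(sorted(elevated_genes(cpm).items()))
# [('AQP4', 'astrocytes'), ('SNAP25', 'neurons')]
```

CLU is more abundant in astrocytes here, but only about four-fold, so it does not pass the five-fold threshold.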

Top 12 cerebral cortex astrocyte elevated genes: This list is enriched for genes associated with glutamate neurotransmitter metabolism, including the excitatory amino acid transporters GLAST (SLC1A3) and GLT (SLC1A2) and the glutamate-to-glutamine converting enzyme GLUL, as well as with water homeostasis via aquaporin 4 (AQP4).

Gene | Description | Abundance (CPM)
CLU | clusterin | 20652
SLC1A2 | solute carrier family 1 member 2 | 8196
MT3 | metallothionein 3 | 6964
AQP4 | aquaporin 4 | 5931
SPARCL1 | SPARC like 1 | 5356
ATP1A2 | ATPase Na+/K+ transporting subunit alpha 2 | 4813
GJA1 | gap junction protein alpha 1 | 4799
CPE | carboxypeptidase E | 4561
CST3 | cystatin C | 4548
GLUL | glutamate-ammonia ligase | 4264
SLC1A3 | solute carrier family 1 member 3 | 3564
MT2A | metallothionein 2A | 2588


Immunohistochemistry examples: AQP4, MT3, SLC1A3

Top 12 cerebral cortex neuron elevated genes: This set of protein-coding genes is dominated by genes expressed in glutamatergic projection neurons, the most abundant cell type in the cerebral cortex. Neuron elevated genes include the synaptic marker SNAP25 and other known neuronal markers such as UCHL1, CALM1 and NEFL.

Gene | Description | Abundance (CPM)
SNAP25 | synaptosome associated protein 25 | 4531
PRNP | prion protein | 3476
UCHL1 | ubiquitin C-terminal hydrolase L1 | 2835
TUBB2A | tubulin beta 2A class IIa | 2531
NRGN | neurogranin | 2423
CALM1 | calmodulin 1 | 2414
IDS | iduronate 2-sulfatase | 2105
NEFL | neurofilament light chain | 2064
VSNL1 | visinin like 1 | 2058
RTN1 | reticulon 1 | 2050
THY1 | Thy-1 cell surface antigen | 2025
ENC1 | ectodermal-neural cortex 1 | 2000


Immunohistochemistry examples: CALM1, SNAP25, NRGN

Top 12 cerebral cortex oligodendrocyte elevated genes: This list includes several myelin components, such as PLP1, CNP, MOBP and MAG.

Gene | Description | Abundance (CPM)
PLP1 | proteolipid protein 1 | 27310
CRYAB | crystallin alpha B | 7786
SCD | stearoyl-CoA desaturase | 4951
CNP | 2',3'-cyclic nucleotide 3' phosphodiesterase | 4543
QDPR | quinoid dihydropteridine reductase | 3878
TF | transferrin | 3571
MOBP | myelin associated oligodendrocyte basic protein | 3569
CLDND1 | claudin domain containing 1 | 3129
SEPTIN4 | septin 4 | 2994
SELENOP | selenoprotein P | 2841
MAG | myelin associated glycoprotein | 2701
CLDN11 | claudin 11 | 2530


Immunohistochemistry examples: CRYAB, CNP, MAG

Top 12 cerebral cortex microglia elevated genes: This list contains several known microglia and macrophage genes, including CD74, TSPO and the complement system components C3 and C1QA.

Gene | Description | Abundance (CPM)
NLRP1 | NLR family pyrin domain containing 1 | 6325
RPS19 | ribosomal protein S19 | 5285
CTSB | cathepsin B | 4968
CD74 | CD74 molecule | 3858
FCGBP | Fc gamma binding protein | 3420
C3 | complement C3 | 3018
LAPTM5 | lysosomal protein transmembrane 5 | 2778
TSPO | translocator protein | 2449
HLA-DRA | major histocompatibility complex, class II, DR alpha | 2247
C1QA | complement C1q A chain | 2217
CSF1R | colony stimulating factor 1 receptor | 2165
BOLA2B | bolA family member 2B | 1703


Immunohistochemistry examples: CD74, HLA-DRA, FCGBP

Top 12 cerebral cortex vasculature elevated genes: This mask contains transcripts from endothelial cells (VWF, CLDN5, SLC2A1), pericytes (IFITM3) and other vasculature-associated cells.

Gene | Description | Abundance (CPM)
CLDN5 | claudin 5 | 11730
SLC7A5 | solute carrier family 7 member 5 | 6628
EGFL7 | EGF like domain multiple 7 | 5914
IFITM3 | interferon induced transmembrane protein 3 | 5353
VWF | von Willebrand factor | 3262
FLT1 | fms related receptor tyrosine kinase 1 | 3047
SLC2A1 | solute carrier family 2 member 1 | 2961
ETS2 | ETS proto-oncogene 2, transcription factor | 2806
ITM2A | integral membrane protein 2A | 2745
SLC2A3 | solute carrier family 2 member 3 | 2340
PODXL | podocalyxin like | 2312
SLC16A1 | solute carrier family 16 member 1 | 2007


Immunohistochemistry examples: FLT1, SLC2A1, CLDN5

Imputation of transcript location

The 1 × 1 cm Stereo-seq chip has approximately 400 million spots. On average, we detect about 100 million transcripts. Plotting only the detected transcripts gives a sparse image with little information about the tissue. For this version, we used the single nuclei transcriptomics data to predict the likelihood of each transcript being located at every location. The color of each pixel indicates the cell-type mask of the predicted location. Transcripts of genes with elevated expression in a single cell type are mostly predicted to lie in tissue areas assigned to that cell type. Note that both the total number of captured transcripts and the area covered are dominated by neurons. As a result, many imputed transcript locations show a neuron-dominated picture, mainly because non-neuronal cell types cover a small area and contribute few transcripts.
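The chip figures above explain why the raw image is sparse. Assuming, as an idealization, that the roughly 400 million spots and 100 million transcripts are spread evenly over the full 1 cm × 1 cm chip, the implied spot spacing and average transcript density are:

```latex
\begin{aligned}
A_{\text{chip}} &= (1\,\text{cm})^2 = (10^{4}\,\mu\text{m})^2 = 10^{8}\,\mu\text{m}^2,\\
A_{\text{spot}} &\approx \frac{10^{8}\,\mu\text{m}^2}{4\times10^{8}} = 0.25\,\mu\text{m}^2
  \quad\Rightarrow\quad \text{pitch} \approx \sqrt{0.25}\,\mu\text{m} = 0.5\,\mu\text{m},\\
\bar{n} &\approx \frac{10^{8}\ \text{transcripts}}{4\times10^{8}\ \text{spots}} = 0.25\ \text{transcripts per spot},\\
\rho &\approx \frac{10^{8}\ \text{transcripts}}{10^{8}\,\mu\text{m}^2} = 1\ \text{transcript}/\mu\text{m}^2
  \quad\Rightarrow\quad \pi\,(5\,\mu\text{m})^2\,\rho \approx 79\ \text{transcripts within a } 5\,\mu\text{m radius}.
\end{aligned}
```

On average, only one transcript is detected per four spots, even though the 0.5 µm spacing is consistent with the sub-micrometer resolution mentioned in the introduction. Pooling the neighborhood around a spot gives tens of transcripts to work with, which is what makes co-expression-based prediction feasible.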


Figure 3: Imputation of transcript location based on co-expression data. Panels: AQP4 (grey matter), CALM1 (grey matter), CLDN11 (white matter), C1QA (grey matter) and CLDN5 (grey matter). For each pixel that contains transcripts, an area with a radius of 5 micrometers is explored to calculate the likelihood of expression of all protein-coding genes at that location. These are predictions, not real measurements.
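The support calculation from panel B of Figure 1, restricted to the 5 micrometer radius used for imputation, can be sketched as below. This is an illustrative reading, not the HPA code: `z` is assumed to be a dict-of-dicts of co-expression z-scores, genes missing from a row (including the gene itself, unless a diagonal entry is supplied) contribute zero, and the function name is hypothetical.

```python
from math import dist


def impute_spot(spot, transcripts, z, radius=5.0):
    """Imputed support for every gene in `z` at one spot.

    spot: (x, y) in micrometers; transcripts: list of detected (x, y, gene).
    Support for a gene = sum of its co-expression z-scores with all transcripts
    detected within `radius` of the spot. These are predictions, not measurements.
    """
    nearby = [g for x, y, g in transcripts if dist(spot, (x, y)) <= radius]
    return {gene: sum(row.get(g, 0.0) for g in nearby) for gene, row in z.items()}


z = {
    "AQP4": {"GJA1": 2.0, "SNAP25": -1.0, "NEFL": -1.0},
    "GJA1": {"AQP4": 2.0, "SNAP25": -1.0, "NEFL": -1.0},
    "SNAP25": {"NEFL": 2.0, "AQP4": -1.0, "GJA1": -1.0},
    "NEFL": {"SNAP25": 2.0, "AQP4": -1.0, "GJA1": -1.0},
}
detected = [(0, 0, "AQP4"), (3, 0, "GJA1"), (10, 0, "SNAP25")]  # SNAP25 lies outside 5 um
support = impute_spot((0, 0), detected, z)
print(support)  # {'AQP4': 2.0, 'GJA1': 2.0, 'SNAP25': -2.0, 'NEFL': -2.0}
```

Ranking genes by support at a spot gives the predicted transcripts. In this toy case the astrocyte genes score highest because only astrocyte transcripts lie within the radius.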

Current limitations, challenges and future perspectives

The field of spatial transcriptomics is relatively new, and with the latest developments it is now, in theory, possible to perform single cell spatial transcriptomics analysis.

Spatial transcriptomics analysis of brain samples: The major technical challenges in molecular neuroscience have a biological origin. In the brain, 1) there are many cell types and cell states, 2) cell types differ in size and morphology, 3) cell types differ in overall transcriptional activity, and 4) the differences between subtypes or cell states often involve only a small fraction of the total transcriptome. To create a molecular map of the brain at single cell resolution, we need to 1) capture all cell types and cell states in large enough numbers to provide the statistical power needed to compare them, 2) develop strategies to compare cell types with different morphologies and total transcript contents, and 3) develop sensitive assays that capture the small but biologically relevant differences between subtypes and cell states.

The HPA approach to map the molecular and cellular landscape of the brain: In high-resolution spatial transcriptomics (resolution finer than a single cell), several methods have been developed to group individual counts into single cells. Many of the methods currently in use are image-based: they use a nuclear stain to define the cell center and a radius approach to link detected transcripts to these cells. This works well for many tissues, especially when the cells have similar (round) shapes and similar total numbers of transcripts. For brain tissue, however, these methods are poorly suited and have difficulty capturing cells with low numbers of transcripts (e.g. microglia and vascular cells), especially when these lie close to neurons with high numbers of transcripts. In the Human Protein Atlas project, we therefore use co-expression to link individual spots to a cell type, creating a cell-type mask. In the current version, we demonstrate how this can be used to identify protein-coding transcripts elevated in one of the five major cell types and to impute the possible location of transcripts in the cerebral cortex.
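For contrast, the nucleus-plus-radius strategy described above can be sketched in a few lines. This is a generic illustration, not any specific published tool; the coordinates, the 10 micrometer radius and the function name are hypothetical. In the toy example, a microglial nucleus lies next to a neuron, and the neuronal SNAP25 transcripts that happen to be closer to the microglial nucleus are assigned to the microglial cell, which then appears to contain more neuronal than microglial transcripts.

```python
from math import dist


def assign_to_nuclei(transcripts, nuclei, radius=10.0):
    """Assign each detected (x, y, gene) transcript to the nearest nucleus center.

    nuclei: non-empty dict nucleus_id -> (x, y) center from a nuclear stain.
    Transcripts farther than `radius` from every nucleus are dropped.
    Returns dict nucleus_id -> list of assigned genes.
    """
    cells = {nucleus_id: [] for nucleus_id in nuclei}
    for x, y, gene in transcripts:
        nearest, d = min(
            ((nucleus_id, dist((x, y), center)) for nucleus_id, center in nuclei.items()),
            key=lambda pair: pair[1],
        )
        if d <= radius:
            cells[nearest].append(gene)
    return cells


nuclei = {"neuron": (0, 0), "microglia": (12, 0)}
transcripts = [(2, 0, "SNAP25"), (5, 0, "SNAP25"), (7, 0, "SNAP25"), (8, 0, "SNAP25"), (12, 1, "C1QA")]
print(assign_to_nuclei(transcripts, nuclei))
# {'neuron': ['SNAP25', 'SNAP25'], 'microglia': ['SNAP25', 'SNAP25', 'C1QA']}
```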

Future perspective: The data presented in the current version does not have single cell resolution and does not provide information on subtypes (e.g. cortical layers, interneurons or astrocyte subtypes). The next tasks are 1) to perform regional segmentation (white matter/grey matter, cortical layers), 2) to segment individual cells, and 3) to capture enough cells and gene counts to define the molecular signatures of all cell types and cell states. With several mature and complementary spatial transcriptomics platforms available and novel analysis tools in development, these goals are within reach.

Relevant links and publications

Ståhl PL et al., Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. (2016)
PubMed: 27365449 DOI: 10.1126/science.aaf2403

Chen A et al., Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. (2022)
PubMed: 35512705 DOI: 10.1016/j.cell.2022.04.003

Liu L et al., Spatiotemporal omics for biology and medicine. Cell. (2024)
PubMed: 39178830 DOI: 10.1016/j.cell.2024.07.040