Spatial transcriptomics

Recent developments in transcriptomics technologies (detection of RNA) enable quantification of the RNA content of tissues and single cells. In particular, spatial transcriptomics technologies that detect both transcripts and their spatial location have become available to researchers. The major breakthroughs in the field have been the spatial resolution (< 1 micrometer) and the ability to detect all transcripts simultaneously (genome wide). The major advantage of high-resolution spatial transcriptomics methods over single cell or single nuclei transcriptomics is the ability to investigate all cells, including rare cells, in a single tissue section while maintaining information on the cellular environment and neighboring cells. This is especially valuable in neuroscience, where the complexity and molecular diversity of the brain make it extremely difficult to generate detailed and complete maps of protein expression using ‘bulk’ or single nuclei transcriptomics.

Stereo-seq data on the Human Protein Atlas

The brain resource now contains the first spatial transcriptomics data of the human ‘healthy’ cerebral cortex (frontal cortex). Based on single nuclei transcriptomics, we determined the likelihood of transcripts being expressed in the same cell-type. By creating a transcript-to-transcript matrix we can predict the location of transcripts or, by combining several cell-type markers, predict the cell-type for each location in the spatial transcriptomics data.

Figure 1: Integration of Single Nuclei and Spatial Transcriptomics Data for Cell Segmentation and Transcript Location Imputation.
A) Co-expression Analysis: Single nuclei transcriptomics data is used to assess the co-expression of genes. By scaling these values using a z-score, we determine the likelihood of genes being co-expressed within the same cell.
B) RNA Location Imputation: This co-expression information enables imputation of RNA locations. For each protein-coding gene at every tissue spot, we calculate the support for that transcript by summing the z-score values of all neighboring transcripts.
C) Identification of Cell-Type Marker Genes: Genes are clustered based on their co-expression profiles (all genes to all genes). These clusters were annotated using known cell-type markers. Clusters containing gene sets enriched in neurons, astrocytes, oligodendrocytes, microglia, and vasculature-associated cells were selected.
D) Creating cell-type masks: Using the accumulated gene-gene co-expression data, we calculate the likelihood of each tissue spot belonging to one of the annotated clusters, based on the identity of neighboring transcripts.
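Panels A and D of Figure 1 can be illustrated with a minimal sketch. The text does not specify how raw co-expression is measured, so this example assumes it is the number of nuclei in which both genes are detected, scaled to z-scores within each gene's row. The function names, the toy counts and the marker sets are hypothetical and chosen only for illustration; this is not the HPA pipeline.

```python
from statistics import mean, pstdev


def coexpression_zscores(counts):
    """Return z[gene][other_gene] from per-nucleus counts (gene -> list of counts).

    Assumption: raw co-expression is the number of nuclei in which both
    genes are detected; each gene's row is then scaled to z-scores.
    """
    genes = list(counts)
    detected = {g: [c > 0 for c in counts[g]] for g in genes}
    z = {}
    for g in genes:
        # Co-detection counts against every other gene (self pair excluded).
        raw = {
            h: sum(a and b for a, b in zip(detected[g], detected[h]))
            for h in genes
            if h != g
        }
        values = list(raw.values())
        mu, sd = mean(values), pstdev(values)
        z[g] = {h: (v - mu) / sd if sd > 0 else 0.0 for h, v in raw.items()}
    return z


def score_cell_types(neighbor_genes, marker_sets, z):
    """Sum z-scores between each marker set and the neighboring transcripts."""
    return {
        cell_type: sum(
            z.get(m, {}).get(n, 0.0) for m in markers for n in neighbor_genes
        )
        for cell_type, markers in marker_sets.items()
    }


# Toy single nuclei data: six nuclei, invented counts.
toy_counts = {
    "GFAP": [3, 2, 4, 0, 0, 0],
    "AQP4": [1, 2, 2, 0, 0, 0],
    "SNAP25": [0, 0, 0, 5, 6, 4],
    "NEFL": [0, 0, 1, 2, 3, 1],
}
marker_sets = {"astrocytes": ["GFAP", "AQP4"], "neurons": ["SNAP25", "NEFL"]}

z = coexpression_zscores(toy_counts)
# Transcripts detected around one hypothetical tissue spot.
scores = score_cell_types(["GFAP", "AQP4"], marker_sets, z)
best = max(scores, key=scores.get)
print(best, scores)
```

In this sketch a marker found in its own neighborhood contributes nothing, because self pairs are excluded from the z-score table. A real analysis would have to decide how to treat that case.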
Creating cell-type masks and quantification of protein expression

In the first version of the Stereo-seq resource, the locations of vascular cells, astrocytes, oligodendrocytes, microglia and neurons were determined using sets of marker genes with elevated expression in the corresponding cell-types. This makes it possible to link every location within the spatial transcriptomics data to a main cell type. For each protein coding transcript, the number of counts in each cell-type is represented as bars. It should be noted that spatial diffusion and overlapping cells introduce some noise.
Astrocytes
Oligodendrocytes
Microglia
Neurons
Vascular cells
Figure 2: Assignment of spots to the 5 main cell-types. The colors represent the masks for the main cell types. These are used to assign pixels to cell types when counting transcripts (bar plots) or imputing expression.

Identifying cell-type elevated transcripts: For all protein coding transcripts, the number of counts in each of the 5 cell-type masks was measured. Counts that are not assigned to a cell-type are considered noise and are subtracted from the counts in each cell-type mask. For each cell-type and each protein coding transcript, the relative abundance or enrichment is calculated as counts per million counts. These numbers provide an overview of protein expression in each cell-type and are used to calculate cell-type enriched protein coding transcripts. A protein coding transcript is considered cell-type elevated if it has a 5-times higher abundance in one cell type compared to the 4 other cell-types. An overview of the most abundant elevated proteins in the five cerebral cortex cell types is shown below. For comparison and validation, single nuclei transcriptomics data is available in the single cell resource. A major advantage of spatial transcriptomics is the ability to link protein coding genes to cell types even if the protein distribution does not reveal a cell morphology when using immunohistochemistry (IHC). To illustrate this, several IHC examples are shown below the tables.

Top 12 cerebral cortex astrocyte elevated genes: This list is enriched for genes associated with glutamate neurotransmitter metabolism, including the excitatory amino acid transporters GLAST (SLC1A3) and GLT-1 (SLC1A2) and the glutamate-to-glutamine converting enzyme GLUL, and with water homeostasis via aquaporin 4 (AQP4).
Top 12 cerebral cortex neuron elevated genes: This set of protein coding genes is dominated by proteins expressed in glutamatergic projection neurons, the predominant cell type in the cerebral cortex. Neuron elevated genes include the synaptic marker SNAP25 and other known neuronal cell markers such as UCHL1, CALM1 and NEFL.
Top 12 cerebral cortex oligodendrocyte elevated genes: This list of transcripts contains several myelin components, including PLP1, CNP, MOBP and MAG.
Top 12 cerebral cortex microglia elevated genes: This list contains several known microglia and macrophage genes, including CD74, TSPO and the complement system components C3 and C1QA.
Top 12 cerebral cortex vasculature elevated genes: This mask contains transcripts from endothelial cells (VWF, CLDN5, SLC2A1), pericytes (IFITM3) and other vasculature-associated cells.
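The classification of cell-type elevated transcripts described above (noise subtraction, counts per million, five-fold rule) can be sketched as follows. Several details are assumptions, because the text leaves them open: the unassigned count of a gene is subtracted directly from each mask count and floored at zero, and "5-times higher" is read as at least five times the highest of the other four cell types. All counts are invented for illustration, and the function names are hypothetical.

```python
CELL_TYPES = ["astrocytes", "neurons", "oligodendrocytes", "microglia", "vascular"]


def noise_corrected_cpm(mask_counts, unassigned_counts):
    """Subtract unassigned (noise) counts per gene, then scale each mask to CPM."""
    corrected = {
        gene: {
            ct: max(per_mask.get(ct, 0) - unassigned_counts.get(gene, 0), 0)
            for ct in CELL_TYPES
        }
        for gene, per_mask in mask_counts.items()
    }
    totals = {ct: sum(row[ct] for row in corrected.values()) for ct in CELL_TYPES}
    return {
        gene: {
            ct: 1e6 * row[ct] / totals[ct] if totals[ct] else 0.0
            for ct in CELL_TYPES
        }
        for gene, row in corrected.items()
    }


def elevated_cell_type(gene_cpm, fold=5.0):
    """Return the cell type whose CPM is >= fold x every other, else None."""
    ranked = sorted(gene_cpm, key=gene_cpm.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    if gene_cpm[top] > 0 and gene_cpm[top] >= fold * gene_cpm[runner_up]:
        return top
    return None


# Invented counts per cell-type mask, in CELL_TYPES order.
raw = {
    "AQP4": [520, 60, 30, 25, 25],
    "SNAP25": [50, 4000, 40, 15, 20],
    "ACTB": [500, 3000, 400, 150, 200],
}
mask_counts = {gene: dict(zip(CELL_TYPES, values)) for gene, values in raw.items()}
# Invented counts that fall outside all five masks.
unassigned_counts = {"AQP4": 20, "SNAP25": 10, "ACTB": 50}

cpm = noise_corrected_cpm(mask_counts, unassigned_counts)
calls = {gene: elevated_cell_type(values) for gene, values in cpm.items()}
print(calls)
```

With only three genes the CPM values are not realistic, but the toy data shows the intended behavior: a broadly expressed gene such as ACTB is not called elevated in any cell type.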
Imputation of transcript location

The 1 by 1 cm Stereo-seq chip has approximately 400 million points. On average we detect about 100 million transcripts. Plotting only the detected transcripts gives a sparse image with little information about the tissue. For this version we used the single nuclei transcriptomics data to predict the likelihood that a transcript is present at each location. The color of the pixel indicates the cell-type mask of the predicted location. Proteins with elevated expression in a single cell type are mostly predicted in tissue areas assigned to that cell type. It should be noted that the total number of captured transcripts and the area covered are dominated by neurons. When imputing possible transcript locations, many of the results reveal a neuron-dominated picture, mainly due to the low area covered and the low transcript counts for non-neuronal cell-types.
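As a rough illustration of the imputation step, the sketch below collects the transcripts detected within a fixed radius of a location (5 micrometers, as stated in the Figure 3 caption) and sums their co-expression z-scores for each candidate gene. The z-score table, the coordinates and the function names are hypothetical values made up for this example. The brute-force distance search is a simplification that would not scale to a full chip.

```python
from math import hypot


def neighbors_within(spots, center, radius_um=5.0):
    """Genes of detected transcripts within radius_um of center.

    spots: list of (x_um, y_um, gene) tuples with coordinates in micrometers.
    """
    cx, cy = center
    return [gene for x, y, gene in spots if hypot(x - cx, y - cy) <= radius_um]


def impute_support(spots, center, z, radius_um=5.0):
    """Support per candidate gene: summed z-scores of neighboring transcripts."""
    neighbors = neighbors_within(spots, center, radius_um)
    return {gene: sum(row.get(n, 0.0) for n in neighbors) for gene, row in z.items()}


# Hypothetical co-expression z-scores: candidate gene -> observed gene -> z.
z = {
    "SLC1A2": {"GFAP": 1.2, "AQP4": 1.4, "SNAP25": -0.9},
    "NEFL": {"GFAP": -0.8, "AQP4": -0.7, "SNAP25": 1.5},
}
# Hypothetical detected transcripts (x, y in micrometers).
spots = [
    (0.0, 0.0, "GFAP"),
    (3.0, 4.0, "AQP4"),
    (1.0, 1.0, "AQP4"),
    (20.0, 0.0, "SNAP25"),
]

nearby = neighbors_within(spots, (0.0, 0.0))
support = impute_support(spots, (0.0, 0.0), z)
print(nearby, support)
```

The result is a prediction score, not a measurement. For data on the scale described here, a spatial index such as a k-d tree would likely replace the linear scan.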
Figure 3: Imputation of transcript location based on co-expression data. For each pixel that contains transcripts, an area with a radius of 5 micrometers is explored to calculate the likelihood of expression of all protein coding genes at that location. These are not real measurements but predictions.

Current limitations, challenges and future perspectives

The field of spatial transcriptomics is relatively new, and with the latest developments we can now, in theory, perform single cell spatial transcriptomics analysis.

Spatial transcriptomics analysis of brain samples: The major technical challenges in molecular neuroscience have a biological origin. The brain has 1) many cell-types and cell states, 2) cell-types of different sizes and morphologies, 3) cell-types with different levels of overall transcriptional activity, and 4) differences between sub-types or cell states that often amount to a small fraction of the total transcriptome. To create a molecular map of the brain at single cell resolution we need to 1) capture all cell-types and cell states in large enough numbers to generate the statistical power needed to compare between them, 2) create strategies to compare between cell-types with different morphologies and total transcript content, and 3) create sensitive assays to capture the minor but biologically relevant differences between sub-types and cell states.

The HPA approach to map the molecular and cellular landscape of the brain: In the field of high-resolution transcriptomics (resolution < single cell), several methods have been developed to group individual counts into single cells. Many of the currently used methods are image based: they use a nuclear staining to define the cell center and a radius approach to link detected transcripts to these cells. This approach works for many tissues, especially if the cells in these tissues have similar (round) shapes and similar total numbers of transcripts.
For the analysis of brain tissue, these methods are not well suited: they have difficulty capturing cells with low numbers of transcripts (e.g. microglia, vascular cells), especially if these are near neurons, which have high numbers of transcripts. In the Human Protein Atlas project, we therefore use co-expression to link individual spots to a cell-type, creating a cell-type mask. In the current version we demonstrate how this can be used to identify protein coding transcripts elevated in one of the five major cell-types and to impute the possible location of transcripts in the cerebral cortex.

Future perspective: The data presented in the current version does not have single cell resolution and does not provide information on sub-types (e.g. cortical layers, interneurons, astrocytes etc.). The next tasks are 1) to perform regional segmentation (white matter/grey matter, cortical layers), 2) to segment individual cells, and 3) to capture enough cells and gene counts to define the molecular signatures of all cell types and cell states. With the availability of several mature, complementary spatial transcriptomics platforms and the development of novel analysis tools, our goals are within reach.

Relevant links and publications

Ståhl PL et al., Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. (2016)
Chen A et al., Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. (2022)
Liu L et al., Spatiotemporal omics for biology and medicine. Cell. (2024)