Single cell type - Methods summary

The Single cell type part of the Single cell resource contains single cell RNA sequencing (scRNAseq) data from 31 major healthy tissues and organs and 557 individual cell type clusters. More information about the included tissues can be found here.

Key publication

Karlsson M et al. (2021) "A single cell type transcriptomics map of human tissues" Sci Adv 28;7(31): abh2169


What can you learn?

Learn about:

  • mRNA and protein expression in single cell types
  • if a gene is enriched in a particular cell type (specificity)
  • which genes have a similar expression profile across cell types (expression cluster)


Data overview

Data type        Count   Description                                              Coverage (no. of genes)
RNA expression   31      RNA read counts for genes per cell across 31 tissues      20,082
RNA expression   557     RNA expression for genes across 557 clusters             20,082
RNA expression   81      RNA expression levels per gene and cell type             20,082

How has the data been generated?


Collection of scRNA-seq data

The scRNA-seq datasets were retrieved from published studies of healthy human tissues. We performed a meta-analysis of the scRNA-seq literature and searched single cell databases, including the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home), the Human Cell Atlas (https://www.humancellatlas.org), the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/), the Tabula Sapiens (https://tabula-sapiens-portal.ds.czbiohub.org/), the Allen Brain Atlas (https://portal.brain-map.org/) and the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/). To avoid technical bias and to ensure that the single cell datasets best represent the corresponding tissues, we applied the following selection criteria: (1) single cell RNA sequencing was performed on single cell suspensions from tissues without pre-enrichment of cell types; (2) datasets included more than 3,000 cells and 20 million read counts; (3) pseudo-bulk gene expression profiles were highly correlated with bulk RNA-seq profiles. In total, datasets from 30 tissue types and human blood were included.
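Criteria (2) and (3) can be checked directly on a count matrix. The sketch below is illustrative only: the function name `passes_selection` is hypothetical, the correlation cut-off of 0.7 and the use of Pearson correlation on log-transformed profiles are assumptions (the text only says "highly correlated"), and criterion (1) is a property of the study design that cannot be tested from the data.

```python
import numpy as np

# Hypothetical cut-off: the source only states "highly correlated".
MIN_PSEUDOBULK_CORR = 0.7


def passes_selection(counts, bulk, min_cells=3000, min_reads=20_000_000,
                     min_corr=MIN_PSEUDOBULK_CORR):
    """Check selection criteria (2) and (3) for one dataset.

    counts: cells x genes raw read count matrix.
    bulk:   bulk RNA-seq expression for the same genes, in the same order.
    """
    n_cells = counts.shape[0]
    n_reads = counts.sum()
    # Criterion (2): more than 3,000 cells and at least 20 million reads.
    if n_cells <= min_cells or n_reads < min_reads:
        return False
    # Pseudo-bulk profile: sum reads over all cells, scale to per million.
    pseudo = counts.sum(axis=0).astype(float)
    pseudo = pseudo / pseudo.sum() * 1e6
    # Criterion (3): correlate log-transformed profiles (assumed metric).
    r = np.corrcoef(np.log1p(pseudo), np.log1p(bulk))[0, 1]
    return bool(r >= min_corr)
```

A NaN correlation (for example, a constant profile) compares as False, so such datasets are rejected rather than silently accepted.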

Immunohistochemistry on tissue microarrays

To confirm scRNA-seq profiles and cell type specificity at the protein level, antibody-based protein expression profiles of normal human tissue types were generated using immunohistochemistry (IHC) on tissue microarrays (TMAs), as described in more detail in the Tissue section.



How has the data been analyzed?

Quantified raw sequencing data were downloaded from the corresponding repository database using the accession number provided by each study. Unfiltered data were used as input for downstream analysis with an in-house pipeline based on Scanpy (Single-Cell Analysis in Python), in which data were considered valid if (i) a cell had at least 200 detected genes and (ii) a gene was expressed in at least 10% of the cells. By pooling the data from each cell type cluster and calculating the average normalized protein-coding transcripts per million, expression profiles can be visualized for each gene in each cluster at both a genome-wide and a single cell type level. Each of the 557 cell type clusters was manually annotated with its main cell type based on an extensive survey of well-known tissue- and cell type-specific markers, including markers from the original publications and additional markers used in pathology diagnostics.
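The two validity filters can be sketched with plain NumPy as below. The function name `qc_filter` is hypothetical, and filtering cells before genes is an assumed order. In Scanpy, roughly the same effect would come from `sc.pp.filter_cells(adata, min_genes=200)` followed by `sc.pp.filter_genes(adata, min_cells=...)`, where `min_cells` takes an absolute cell count rather than a fraction.

```python
import numpy as np


def qc_filter(counts, min_genes=200, min_cell_frac=0.10):
    """Apply the two validity filters to a cells x genes count matrix.

    Returns (filtered matrix, kept-cell mask, kept-gene mask).
    """
    detected = counts > 0
    # (i) Keep cells with at least `min_genes` detected genes.
    keep_cells = detected.sum(axis=1) >= min_genes
    counts = counts[keep_cells]
    detected = detected[keep_cells]
    # (ii) Keep genes detected in at least `min_cell_frac` of remaining cells.
    keep_genes = detected.mean(axis=0) >= min_cell_frac
    return counts[:, keep_genes], keep_cells, keep_genes
```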

Gene expression normalization

The total read count of each gene in each cluster was calculated by summing the read counts of that gene over all cells belonging to the cluster. The read counts were then scaled to transcripts per million protein-coding genes (pTPM) for each cluster and normalized to nTPM using the trimmed mean of M values (TMM) method to allow comparisons between clusters. The calculation of the nTPM matrix is given in formula (1), where x is the pTPM expression matrix and i and j denote the gene and the cluster, respectively:
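The pooling and pTPM scaling step can be sketched as follows. The function name `cluster_ptpm` and the input layout (a cells x genes count matrix, one cluster label per cell, and a boolean mask marking protein-coding genes) are assumptions for illustration.

```python
import numpy as np


def cluster_ptpm(counts, cluster_labels, is_protein_coding):
    """Pool reads per cluster and scale to transcripts per million
    protein-coding genes (pTPM).

    counts:            cells x genes raw read counts.
    cluster_labels:    array with one cluster label per cell.
    is_protein_coding: boolean mask over genes.
    Returns (sorted cluster ids, genes x clusters pTPM matrix).
    """
    clusters = np.unique(cluster_labels)
    pc = counts[:, is_protein_coding]
    # Sum the read counts of each gene over all cells in each cluster.
    summed = np.stack([pc[cluster_labels == c].sum(axis=0) for c in clusters],
                      axis=1)
    # Scale each cluster column so it sums to one million.
    ptpm = summed / summed.sum(axis=0, keepdims=True) * 1e6
    return clusters, ptpm
```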

$$\hat{x}_{ij} = \mathrm{TMM}\left(x_{ij},\ \text{reference} = \text{median\_column}_i\right), \qquad \text{median\_column}_i = \operatorname{median}\left(x_{i,1}, x_{i,2}, \ldots, x_{i,n}\right) \tag{1}$$
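Formula (1) can be sketched as below, using the per-gene median across clusters as the reference. This is a simplified, unweighted TMM: edgeR's `calcNormFactors` additionally applies precision weights derived from raw counts, which are not available from a pTPM matrix. The trimming fractions (30% of M values and 5% of A values from each end) follow edgeR's defaults and are assumptions here, as are the function names and dividing each cluster by its geometric-mean-centred factor.

```python
import numpy as np


def tmm_factor(sample, ref, m_trim=0.3, a_trim=0.05):
    """Simplified (unweighted) TMM scaling factor of `sample` vs `ref`."""
    ok = (sample > 0) & (ref > 0)
    s = sample[ok] / sample.sum()
    r = ref[ok] / ref.sum()
    m = np.log2(s / r)            # M values: log-ratios
    a = 0.5 * np.log2(s * r)      # A values: average log expression
    n = m.size
    # Ranks of each gene by M and by A.
    m_rank = np.argsort(np.argsort(m))
    a_rank = np.argsort(np.argsort(a))
    lo_m, hi_m = np.floor(n * m_trim), n - np.floor(n * m_trim)
    lo_a, hi_a = np.floor(n * a_trim), n - np.floor(n * a_trim)
    # Keep genes inside both central rank ranges, then average their M.
    keep = (m_rank >= lo_m) & (m_rank < hi_m) & (a_rank >= lo_a) & (a_rank < hi_a)
    return 2.0 ** m[keep].mean()


def tmm_normalize(ptpm):
    """ptpm: genes x clusters. Returns an nTPM-like matrix."""
    ref = np.median(ptpm, axis=1)  # median_column_i across clusters
    f = np.array([tmm_factor(ptpm[:, j], ref) for j in range(ptpm.shape[1])])
    f = f / np.exp(np.mean(np.log(f)))  # centre factors on geometric mean 1
    return ptpm / f
```

In the usage below, one cluster has a single strongly expressed gene that depresses every other gene after per-million scaling; the TMM factor undoes that composition effect for the unchanged genes.

```python
base = np.arange(1, 21, dtype=float)
col0 = base / base.sum() * 1e6
raw2 = base.copy()
raw2[19] *= 50
col2 = raw2 / raw2.sum() * 1e6
out = tmm_normalize(np.column_stack([col0, col0, col2]))
# Genes 0-18 now agree between clusters 0 and 2.
```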

To generate expression values per cell type, clusters were first aggregated per cell type by calculating the weighted mean nTPM of all clusters with the same cell type annotation within a tissue, as shown in formula (2):

$$\hat{x} = \frac{\sum_{j=1}^{n} w_j x_j}{\sum_{j=1}^{n} w_j} \tag{2}$$
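Formula (2) is a cell-count-weighted mean over the clusters that share an annotation within one tissue. A minimal sketch, with the hypothetical function name `aggregate_clusters` and a genes x clusters input layout assumed:

```python
import numpy as np


def aggregate_clusters(ntpm, cell_counts):
    """Weighted mean of cluster nTPM profiles, formula (2).

    ntpm:        genes x n matrix, one column per cluster (x_j).
    cell_counts: length-n vector of cells per cluster (weights w_j).
    """
    w = np.asarray(cell_counts, dtype=float)
    # sum_j w_j * x_j divided by sum_j w_j, computed for every gene at once.
    return ntpm @ w / w.sum()


# Two clusters with 1 and 3 cells: the larger cluster dominates the mean.
profile = aggregate_clusters(np.array([[10.0, 20.0], [0.0, 4.0]]), [1, 3])
# profile -> array([17.5, 3.0])
```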

where x_j is the vector of nTPM expression values of cluster j, n is the number of clusters with the same cell type annotation within a tissue, and w_j is the number of cells in cluster j. The values for the same cell type in different tissues were then averaged into a single value. Only clusters with medium or high reliability were included, and clusters containing mixed cell types, as well as neutrophils and platelets (owing to their low RNA content), were excluded. Specificity was then quantified with the "tau" score, calculated from log10(nTPM + 1) transformed values. Tau (24) is defined as follows, where x is the vector of log10(nTPM + 1) values across the n cell types:
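The cluster exclusions and the cross-tissue averaging described above might look like this. The label strings ("medium", "high", "mixed cell types", and so on) and the dictionary layouts are hypothetical; the source names the categories but not how they are encoded.

```python
import numpy as np

# Hypothetical label strings for the excluded categories.
EXCLUDED_TYPES = {"mixed cell types", "neutrophils", "platelets"}
INCLUDED_RELIABILITY = {"medium", "high"}


def keep_cluster(cell_type, reliability):
    """True if a cluster passes the reliability and cell type exclusions."""
    if reliability not in INCLUDED_RELIABILITY:
        return False
    return cell_type.lower() not in EXCLUDED_TYPES


def average_across_tissues(profiles):
    """profiles: {(tissue, cell_type): nTPM vector from formula (2)}.

    Returns {cell_type: mean nTPM vector over all tissues}.
    """
    by_type = {}
    for (_tissue, cell_type), vec in profiles.items():
        by_type.setdefault(cell_type, []).append(np.asarray(vec, dtype=float))
    return {ct: np.mean(vecs, axis=0) for ct, vecs in by_type.items()}
```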

$$\tau = \frac{\sum_{i=1}^{n} \left(1 - \hat{x}_i\right)}{n - 1}, \qquad \hat{x}_i = \frac{x_i}{\max_{1 \le k \le n} x_k} \tag{3}$$
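Formula (3) translates directly into a few lines of NumPy. Tau is 0 for a gene expressed equally in all cell types and 1 for a gene expressed in exactly one. Returning NaN for a gene with no expression anywhere is an assumption, since formula (3) is undefined when the maximum is 0.

```python
import numpy as np


def tau(ntpm):
    """Tau specificity score, formula (3), on log10(nTPM + 1) values.

    ntpm: nTPM values of one gene across n cell types (n >= 2).
    """
    x = np.log10(np.asarray(ntpm, dtype=float) + 1)
    n = x.size
    if x.max() == 0:
        return float("nan")  # assumption: undefined for unexpressed genes
    x_hat = x / x.max()
    return float(np.sum(1 - x_hat) / (n - 1))


tau([0, 0, 0, 99])  # -> 1.0, expressed in a single cell type
tau([5, 5, 5])      # -> 0.0, uniform expression
```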


What is presented?

The data is presented as interactive UMAP plots and summarizing bar plots, displaying the expression of each gene in each cluster or single cell type, including information on cell type specificity from a body-wide perspective. The data is linked to protein expression profiles in the Tissue section, presenting the single cell type specificity as high-resolution histological images.


How has the classification of all protein-coding genes been done?

A genome-wide classification of the protein-coding genes with regard to single cell type specificity has been performed using between-sample normalized data. The results can serve as a reference for researchers interested in expression profiles in any of the main cell types. The genes were classified according to specificity into (i) cell type enriched genes, with at least fourfold higher expression in one cell type than in any other analyzed cell type; (ii) group enriched genes, with enriched expression in a small number of cell types (2 to 10); and (iii) cell type enhanced genes, with only moderately elevated expression. Finally, a new classification based on expression clusters has recently been introduced, in which all genes are clustered based on expression similarity across all cell types. The results are presented as a UMAP cluster plot (see figure) and an interactive version is available here.
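The cell type enriched rule stated above can be sketched as follows. Only rule (i) comes from the text; the group enriched rule used here (mean of the top k cell types, 2 to 10, at least fourfold above the highest cell type outside the group) is an assumption, and the cell type enhanced category is not implemented because its threshold is not given. The function name `classify` is hypothetical.

```python
def classify(ntpm, fold=4.0, max_group=10):
    """Classify one gene from {cell_type: nTPM} (at least 2 cell types).

    Returns (category, list of cell types driving the call).
    """
    ranked = sorted(ntpm.items(), key=lambda kv: kv[1], reverse=True)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]
    # (i) Cell type enriched: at least `fold` x every other cell type.
    if values[0] > 0 and values[0] >= fold * values[1]:
        return "cell type enriched", names[:1]
    # (ii) Group enriched (assumed rule): mean of the top k cell types,
    # 2 <= k <= max_group, at least `fold` x the best cell type outside.
    for k in range(2, min(max_group, len(values) - 1) + 1):
        group_mean = sum(values[:k]) / k
        if group_mean > 0 and group_mean >= fold * values[k]:
            return "group enriched", names[:k]
    # (iii) Enhanced or not elevated: thresholds not specified here.
    return "not enriched", []


classify({"A": 100, "B": 10, "C": 5})          # ("cell type enriched", ["A"])
classify({"A": 100, "B": 80, "C": 5, "D": 1})  # ("group enriched", ["A", "B"])
```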

[Figure: UMAP plot of genes clustered by expression similarity across cell types; axes UMAP1 and UMAP2]