Transcription factor landscape

For a living cell to function in its environment, a large number of regulatory processes are needed, including regulation of cell proliferation, cell differentiation and cell death. The underlying mechanisms include regulation of gene expression, and an important class of regulatory proteins is the transcription factors, which determine when genes are switched on and off. Here we explore the 1485 human transcription factors and their expression landscape across different cell types of the human body, as well as cancer cell lines.

When studying the expression landscape of transcription factors in different cell types, 1188 transcription factor genes show some level of elevated expression in one or a group of cell types compared to others. This analysis is based on the specificity categories of gene expression and on four different cell-centric datasets: cell types representing the whole body, cell types compared within the brain, circulating immune cells, and a comparison between cancer cell lines.

  • A general comparison of tissues representing the human body shows that 1155 transcription factor genes show elevated expression in one or several cell types, where 97 are classified as cell type enriched
  • When comparing gene expression within the cell types representing the human brain, 503 transcription factors show elevated expression, and out of those 82 are classified as cell type enriched
  • Within different immune cell lineages, 250 transcription factor genes are highlighted with elevated expression, out of which 55 are classified as immune cell lineage enriched
  • The specificity classification of genes expressed in cancer cell lines showed that 457 transcription factors show elevated expression in one or a group of cancer cell lines, and 99 of those are classified as cancer cell line enriched
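The text does not spell out how a gene is assigned to a specificity category. As a rough illustration only, the sketch below assumes a simple fold-change rule: a gene is "enriched" if one cell type is at least four-fold above every other, "group enriched" if a small group of cell types is on average at least four-fold above every remaining cell type, and "enhanced" if one or more cell types are at least four-fold above the mean of all cell types. The function name `classify_specificity`, the fold threshold of 4, the group size limit of 10, the detection cutoff of 1 and the expression values in the usage example are all assumptions made for this example, not values taken from the text.

```python
from statistics import mean


def classify_specificity(expr, fold=4.0, max_group=10, min_expr=1.0):
    """Return (category, cell_types) for one gene.

    expr maps a cell type name to an expression value. All thresholds
    are illustrative assumptions, not values taken from the text.
    """
    ranked = sorted(expr.items(), key=lambda kv: kv[1], reverse=True)
    names = [name for name, _ in ranked]
    values = [value for _, value in ranked]

    # Nothing reaches the assumed detection cutoff.
    if not values or values[0] < min_expr:
        return "not detected", []

    # Enriched: the top cell type is at least `fold` times every other one.
    if len(values) == 1 or values[0] >= fold * values[1]:
        return "enriched", names[:1]

    # Group enriched: the mean of the top k cell types (2 <= k <= max_group)
    # is at least `fold` times the highest remaining cell type.
    for k in range(2, min(max_group, len(values) - 1) + 1):
        if mean(values[:k]) >= fold * values[k]:
            return "group enriched", names[:k]

    # Enhanced: individual cell types at least `fold` times the overall mean.
    overall = mean(values)
    enhanced = [name for name, value in ranked if value >= fold * overall]
    if enhanced:
        return "enhanced", enhanced

    return "low specificity", []


# Hypothetical expression values for a single gene.
example = {"germ cells": 120.0, "neuronal cells": 6.0, "glial cells": 2.0}
print(classify_specificity(example))
```

In this reading, "elevated" is the union of the enriched, group enriched and enhanced categories, which matches how the counts are reported in the list above.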


NKX3-1 - Prostate

TFAP2B - Cerebellum

Here, we provide an overview of the expression landscape of transcription factors, separated into the four datasets. Click the respective cell types for more details and an overview of the transcription factors with enriched expression profiles.

Single cell types representing the body

The single cell type transcriptomics data is based on single cell RNA sequencing (scRNAseq) data from 31 human tissues, including peripheral blood mononuclear cells (PBMCs), representing 81 different cell types grouped into 15 main cell type groups.

Out of the 81 cell types representing 31 tissues, germ cells are the cell type group with the most transcription factors classified as cell type enriched (31), followed by neuronal cells with 15 transcription factors classified as cell type enriched.

TRANSCRIPTION FACTORS - SINGLE CELL TYPE

Single nuclei with human brain details

The brain single nuclei transcriptomics data is based on single nuclei RNA sequencing (snRNAseq) data representing 11 brain regions, with 461 cell clusters and 31 superclusters, here shown as 34 superclusters/cell types based on the published data (Siletti K et al. (2023)) and their cell type classification.

This brain single nuclei dataset further expands the representation of the human brain. Brain cells are represented in the whole body comparison of cell types (above), but limited to cerebral cortex and 76533 cells. The overlap between the datasets is further discussed on the respective cell type pages.

When comparing the gene expression of cells within the brain, the heterogeneous group of glial cells holds the most transcription factors with enriched expression (248), reflecting the wide range of cells that are referred to as glial cells (Astrocytes, Bergmann glia, Microglia, Oligodendrocytes, OPCs and ependymal cells). The different clusters of neurons, throughout the different brain regions, include several transcription factors with an elevated expression profile.

TRANSCRIPTION FACTORS - SINGLE NUCLEI BRAIN REGION

Immune cell details

The immune cell type transcriptomics data is based on flow-sorted cells from blood, covering 18 cell types grouped into 6 immune cell lineages.

This immune cell dataset further expands the representation and description of the different immune subtypes, with expression comparison between the different immune cell types present in circulating blood. Immune cells are included in the whole body comparison of cell types (above), but that comparison is limited to the main types of immune cells and focuses on immune cells resident in different tissue types.

When comparing the expression profiles of circulating immune cells, granulocytes have the highest number of transcription factors classified as enriched (59) compared to the other immune cell lineages.

TRANSCRIPTION FACTORS - IMMUNE CELL TYPE
Cell type group | Enriched | Group enriched | Enhanced | Total elevated
Granulocytes | 59 | 36 | 3 | 98
Monocytes | 9 | 30 | 0 | 39
T-cells | 18 | 24 | 1 | 43
B-cells | 9 | 18 | 1 | 28
NK-cells | 5 | 17 | 0 | 22
Dendritic cells | 12 | 27 | 0 | 39
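
Reading the four numbers in each row of the table above as enriched, group enriched, enhanced and total elevated, the total should equal the sum of the other three. The short sketch below checks that arithmetic for every lineage and picks out the lineage with the most enriched transcription factors. The variable and function names are chosen for this example only.

```python
# Rows of the immune cell table, read as
# (enriched, group enriched, enhanced, total elevated).
immune_tf_counts = {
    "Granulocytes": (59, 36, 3, 98),
    "Monocytes": (9, 30, 0, 39),
    "T-cells": (18, 24, 1, 43),
    "B-cells": (9, 18, 1, 28),
    "NK-cells": (5, 17, 0, 22),
    "Dendritic cells": (12, 27, 0, 39),
}


def inconsistent_rows(counts):
    """Return lineages whose three categories do not sum to the total."""
    return [
        name
        for name, (enriched, group_enriched, enhanced, total) in counts.items()
        if enriched + group_enriched + enhanced != total
    ]


def most_enriched(counts):
    """Return the lineage with the highest number of enriched genes."""
    return max(counts, key=lambda name: counts[name][0])


print(inconsistent_rows(immune_tf_counts))
print(most_enriched(immune_tf_counts))
```

Every row sums correctly under this reading, which supports the column split used here, and granulocytes come out on top with 59 enriched transcription factors.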

Cancer cell lines representing cancer types

Cancer cell line transcriptomics data provides RNA expression profiles of human protein-coding genes in 1132 cancer cell lines, representing 28 different cancer types. The cell line specificity is classified by comparing the expression profiles across the 28 grouped cancer cell lines.

Three cancer types stand out when comparing the expression profiles of transcription factors: Testis cancer (14), Neuroblastoma (15) and Bone cancer (15) are the three cancers with the highest number of transcription factors classified as cancer cell line enriched. It is important to note that testis cancer is represented by one cell line, while neuroblastoma and bone cancer are each represented by a mean expression value of several representative cell lines. Some cancers, such as Bladder cancer and Pancreatic cancer, show very low numbers of elevated transcription factors.

TRANSCRIPTION FACTORS - CELL LINE

Regulation of gene expression

Transcription factors are regulatory proteins, and they are considered to be the most diverse and important mechanism of gene regulation. According to the TFClass database, and with data in both the UniProt and Ensembl databases, 1485 human genes are classified as transcription factors. They have DNA-binding domains that bind, specifically and with high affinity, to consensus DNA sequences and thereby activate (or in rare cases inhibit) transcription. Transcription factors are classified into families based either on the highly conserved sequences of their DNA-binding domains or on their three-dimensional protein structure. These structural motifs determine their specificity for the consensus sequence, and the major classes include 772 proteins with zinc-coordinating DNA-binding domains (zinc-finger proteins), 171 proteins with basic domains (helix-loop-helix and leucine-zipper factors), and 389 proteins with helix-turn-helix domains (homeodomain factors). In Table 1, the transcription factors are classified according to structural motif as in the TFClass database.

Table 1. Structural classification of transcription factors.

Structural motif | Number of genes
Yet undefined DNA-binding domains | 19
Basic domains | 171
Zinc-coordinating DNA-binding domains | 772
Helix-turn-helix domains | 389
Other all-alpha-helical DNA-binding domains | 46
alpha-Helices exposed by beta-structures | 13
Immunoglobulin fold | 62
beta-Hairpin exposed by an alpha/beta-scaffold | 14
beta-Sheet binding to DNA | 5
beta-Barrel DNA-binding domains | 3
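
To put the counts in Table 1 in proportion, the sketch below computes each structural class as a share of the table total. The counts in the table add up to 1494, slightly more than the 1485 genes quoted in the text; the text does not explain the difference, so the shares here are computed against the table total rather than against 1485. The names `structural_classes` and `class_shares` are chosen for this example only.

```python
# Gene counts per structural class, copied from Table 1.
structural_classes = {
    "Yet undefined DNA-binding domains": 19,
    "Basic domains": 171,
    "Zinc-coordinating DNA-binding domains": 772,
    "Helix-turn-helix domains": 389,
    "Other all-alpha-helical DNA-binding domains": 46,
    "alpha-Helices exposed by beta-structures": 13,
    "Immunoglobulin fold": 62,
    "beta-Hairpin exposed by an alpha/beta-scaffold": 14,
    "beta-Sheet binding to DNA": 5,
    "beta-Barrel DNA-binding domains": 3,
}

table_total = sum(structural_classes.values())


def class_shares(classes):
    """Return each class as a percentage of the summed table counts."""
    total = sum(classes.values())
    return {name: 100.0 * count / total for name, count in classes.items()}


shares = class_shares(structural_classes)
print(f"Table total: {table_total}")
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Sorted this way, the zinc-coordinating, helix-turn-helix and basic domain classes come first, in line with the three major classes named in the text.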

Examples of transcription factors from different structural classes

The zinc-finger is a structural motif in which one or more zinc ions stabilize the protein fold, as exemplified by the three-dimensional schematic representation of the estrogen receptor ESR1 (purple, with zinc ions in red) binding to DNA. ESR1 is a nuclear hormone receptor, here shown to be expressed in glandular cells and cells in endometrial stroma of the uterus by staining with the antibody CAB000037.



Zinc-finger

The structural motif known as the leucine-zipper consists of a leucine repeat region, which forms an alpha helix with a hydrophobic region responsible for dimerization. Here exemplified by a three-dimensional schematic representation of the proto-oncogene JUN (purple) binding as a homodimer to DNA.

JUN is a basic leucine-zipper factor, here shown to be expressed in glandular cells of the colon by immunohistochemical staining using the antibody CAB007780.



Leucine-zipper


The helix-turn-helix motif is a DNA-binding motif composed of two α-helices, which make contacts with DNA and are joined by a short turn. The three-dimensional schematic representation shows the transcription factor GBX1 (purple) binding to DNA. GBX1 is a homeodomain factor, here shown to be expressed in follicle cells of the ovary by staining with the antibody HPA055783.



Helix-turn-helix

Relevant links and publications

Uhlén M et al., Tissue-based map of the human proteome. Science. (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Karlsson M et al., A single-cell type transcriptomics map of human tissues. Sci Adv. (2021)
PubMed: 34321199 DOI: 10.1126/sciadv.abh2169

Siletti K et al., Transcriptomic diversity of cell types across the adult human brain. Science. (2023)
PubMed: 37824663 DOI: 10.1126/science.add7046

Uhlén M et al., A genome-wide transcriptomic analysis of protein-coding genes in human blood cells. Science. (2019)
PubMed: 31857451 DOI: 10.1126/science.aax9198

Jin H et al., Systematic transcriptional analysis of human cell lines for gene expression landscape and tumor representation. Nat Commun. (2023)
PubMed: 37669926 DOI: 10.1038/s41467-023-41132-w