The druggable proteome

Almost all pharmaceutical drugs today act by targeting proteins in the human body and altering their activity. Drugs that inactivate the protein target are called antagonists, while drugs that activate it are called agonists. Target proteins mainly belong to four protein families: enzymes, transporters, ion channels and receptors. According to DrugBank, the current FDA approved drugs are directed against 854 distinct human proteins that are directly related to the mechanism of action of the drug. In Table 1, these proteins are categorized by function into some major protein families.

Table 1. Classification of targets for FDA approved drugs. The numbers do not add up to the total number of targets, since a protein target can belong to more than one of the selected classes, or to none of them.

Protein class                   Number of genes
Enzymes                         323
Transporters                    294
Voltage-gated ion channels      61
G-protein coupled receptors     110
Nuclear receptors               21
CD markers                      90
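
The note on Table 1 can be illustrated with a minimal sketch. The gene names and class assignments below are hypothetical placeholders rather than real annotations. They only show why per-class counts need not sum to the number of targets when a gene can carry several class labels or none.

```python
from collections import Counter

# Hypothetical gene-to-class annotations (placeholder names, not real data).
gene_classes = {
    "GENE_A": {"Enzymes"},
    "GENE_B": {"Enzymes", "CD markers"},
    "GENE_C": {"G-protein coupled receptors", "CD markers"},
    "GENE_D": set(),  # a target that falls in none of the selected classes
}


def class_counts(annotations):
    """Count how many genes carry each class label."""
    counts = Counter()
    for classes in annotations.values():
        counts.update(classes)
    return counts


counts = class_counts(gene_classes)
total_targets = len(gene_classes)
sum_of_counts = sum(counts.values())

print(dict(counts))
print("targets:", total_targets, "sum of class counts:", sum_of_counts)
```

In this toy case there are four targets but the class counts sum to five, because two genes are counted in two classes each and one gene is counted in none.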

Defining the druggable proteome

A drug exerts its effect by interfering with any of the four types of macromolecules in the human body, i.e. proteins, polysaccharides, lipids and nucleic acids. Almost all approved drugs on the market today are directed against protein targets, since issues like toxicity and low specificity are more closely associated with the latter three types. There are approximately 20,000 human protein-coding genes, but not all proteins are suitable for drug interactions and even fewer are appropriate drug targets. The druggable proteome could be defined as the fraction of proteins that can bind a small molecule or antibody with the required affinity and adequate chemical properties, and that are at the same time potential drug targets, i.e. linked to a disease.

Specificity and location of drug targets

Suitable drug targets should have a critical role in the disease process, with less significant involvement in other important processes to limit potential side-effects; have an expression pattern that allows for drug efficacy, for example tissue-specific expression; and have structural and functional properties that allow for drug specificity.

Expression profiles for the protein targets of FDA approved drugs were generated using transcriptomics data from 81 cell types from 31 human tissues and used for specificity classification, as shown in Figure 1. As expected, the majority of the targets show elevated expression, including the 305 with cell type enriched or group enriched expression, but there are also 11 genes that were not detected. These represent, for example, proteins expressed in tissues not present in the single cell data, such as adrenal gland and pituitary gland.

Figure 1. Single cell type specificity and distribution of targets for FDA approved drugs based on classification of RNA expression profiles in single cell transcriptomics data.

Most drugs act on proteins involved in signal transduction, since almost all known diseases are linked to some dysfunction in these pathways. Signal transduction is the process of converting external signals at the cell membrane to specific responses inside the cell, which may result in, for example, gene expression, cell division, or cell death. Figure 2 shows the distribution of the targets across cellular compartments, based on majority decision methods for transmembrane region and signal peptide predictions together with a comprehensive annotation of predicted secreted genes. It indicates that 68% of the targets are membrane-bound or secreted.

Figure 2. Cellular localization of targets for FDA approved drugs based on a variety of transmembrane and signal peptide prediction methods.
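
The majority decision idea behind the localization classes can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used for Figure 2: the method names are hypothetical, and the rule that a signal peptide without a transmembrane region implies a secreted protein is a simplification of the secretome annotation mentioned in the text.

```python
def majority_vote(predictions):
    """Return True when more than half of the methods predict the feature."""
    votes = list(predictions.values())
    return sum(votes) > len(votes) / 2


def classify_location(tm_predictions, sp_predictions):
    """Assign a coarse location from two sets of per-method predictions.

    Assumption: transmembrane consensus takes precedence, and a signal
    peptide consensus without a transmembrane region is read as secreted.
    """
    if majority_vote(tm_predictions):
        return "membrane-bound"
    if majority_vote(sp_predictions):
        return "secreted"
    return "intracellular"


# Hypothetical method names and calls for a single protein.
tm_calls = {"tm_method_1": True, "tm_method_2": True, "tm_method_3": False}
sp_calls = {"sp_method_1": False, "sp_method_2": True, "sp_method_3": False}

location = classify_location(tm_calls, sp_calls)
print(location)
```

Requiring a strict majority means that a tie between an even number of methods counts as no consensus, which is one possible convention among several.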

Antibody-based drugs, which usually cannot penetrate the plasma membrane of the cell, are mostly directed against targets on the cell surface, e.g. receptors, while small molecule drugs, which can diffuse into cells, act on targets found inside the cell. Among the FDA approved drugs directed against the above-mentioned 854 proteins, the vast majority are small molecule drugs, as can be seen in the Venn diagram in Figure 3.

Figure 3. Venn diagram of the type of drugs directed against the 854 protein targets for FDA approved drugs.

Examples of drug targets

TSHR

The thyroid stimulating hormone receptor protein (TSHR) is the target for the synthetic agonist Thyrotropin Alfa (brand name Thyrogen), which is a thyroid stimulating hormone used for detection of residual or recurrent thyroid cancer. TSHR is a G-protein coupled receptor localized in the cell membrane of glandular cells in the thyroid gland, as shown by immunohistochemical staining using the antibody CAB000473.

LIPF

The protein Gastric triacylglycerol lipase (LIPF) is the target for the small-molecule antagonist Orlistat (brand names Alli and Xenical), which is used to treat obesity by preventing the absorption of fats from the diet. LIPF is an enzyme that hydrolyzes triglycerides into absorbable free fatty acids in the intestine, here shown in the glandular cells of the stomach by immunohistochemical staining with the antibody HPA045930.

CACNA1S

The protein CACNA1S is one of the targets for a number of calcium channel blockers that act as vasodilators and are used as antihypertensive agents. CACNA1S is a voltage-sensitive calcium channel protein, and immunohistochemical staining with the antibody HPA048892 shows high expression in skeletal muscle, where the protein plays an important role in excitation-contraction coupling.

FOLH1

The enzyme FOLH1 is the target for the antibody-based drug Capromab, which is used for diagnosis of prostate cancer and detection of intra-pelvic metastases. FOLH1 has both folate hydrolase and N-acetylated-alpha-linked-acidic dipeptidase (NAALADase) activity and is involved in prostate tumor progression. Immunohistochemical staining with the antibody HPA010593 shows strong staining of glandular cells in the prostate.

Potential drug targets

As stated earlier, dysfunction in signal transduction networks is present in most diseases, and knowledge of key signal transduction components and their links to disease could therefore form a basis for identifying novel drug targets. By analyzing, for example, sequence properties, protein families, structural folds, biochemical aspects, similarity to other proteins and associated pathways of known targets, it may be possible to make predictions that can be used to screen the genome for druggable proteins.

Currently, there are 4906 genes in the UniProt database with experimental evidence for involvement in various disease conditions, including cancer, neurologic, systemic and cardiovascular disease. Around 1757 of these might be interesting to investigate as potential drug targets, in that they belong to known drug target protein classes, i.e. enzymes, transporters, receptors and ion channels, and are not yet targets for FDA approved or experimental drugs in the DrugBank database. The specificity classification of the potential drug targets based on single cell type data is shown in Figure 4.

Figure 4. Single cell type specificity and distribution of potential drug targets based on classification of RNA expression profiles in single cell transcriptomics data.
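
The selection of potential drug targets described above amounts to a set filter: keep disease-associated genes that belong to a known target class and are not already drug targets. The sketch below assumes simple in-memory inputs with hypothetical placeholder gene names. It does not query UniProt or DrugBank, and the function name is introduced here only for illustration.

```python
# Protein classes named in the text as known drug target classes.
TARGET_CLASSES = {"enzyme", "transporter", "receptor", "ion channel"}


def potential_targets(disease_genes, gene_classes, known_targets):
    """Return disease genes in a target class that are not yet drug targets."""
    selected = set()
    for gene in disease_genes:
        classes = gene_classes.get(gene, set())
        if classes & TARGET_CLASSES and gene not in known_targets:
            selected.add(gene)
    return selected


# Hypothetical placeholder inputs, not real annotations.
disease_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
gene_classes = {
    "GENE_A": {"enzyme"},
    "GENE_B": {"receptor"},
    "GENE_C": {"transcription factor"},
}
known_targets = {"GENE_B"}  # already targeted by an approved or experimental drug

candidates = potential_targets(disease_genes, gene_classes, known_targets)
print(sorted(candidates))
```

Here only GENE_A remains: GENE_B is already a drug target, GENE_C is outside the listed classes, and GENE_D has no class annotation.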

Relevant links and publications

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Wishart DS et al., DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. (2006)
PubMed: 16381955 DOI: 10.1093/nar/gkj067

Database with drug data linked to drug target information - DrugBank