The membrane proteome

Proteins located in cellular membranes play a crucial role in many physiological and pathological processes. The functions of membrane proteins are diverse and include ion channel activity, transport of other molecules across the membrane, enzymatic processes, anchoring of other proteins and receptor signaling. A large fraction of the clinically approved treatment regimens today use drugs directed towards (or consisting of) cell surface-associated membrane proteins. Of the 854 protein targets with known pharmacological action for approved drugs currently on the market, 494 are membrane-bound. See the Druggable Proteome page for more details.

The estimated size of the membrane proteome

The pie chart in Figure 1 shows the predicted location of all human protein-coding genes. Approximately 28% of the 20162 genes have protein isoforms with at least one predicted transmembrane region, suggesting location in one of the numerous membrane systems in the cell. Several genes code for multiple isoforms with alternative locations, including 228 genes with both secreted and membrane-bound isoforms. In total, 5573 genes are predicted to code for membrane proteins, and for 1822 of them there is also experimental evidence for membrane-related locations in the Subcellular resource, as exemplified in Figure 2.
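As a quick consistency check, the proportions quoted so far can be recomputed from the raw counts. The sketch below uses only the numbers given in the text; the variable and function names are illustrative.

```python
# Counts quoted in the text above.
TOTAL_GENES = 20162            # human protein-coding genes
MEMBRANE_GENES = 5573          # genes predicted to code for membrane proteins
MEMBRANE_WITH_EVIDENCE = 1822  # of those, with experimental membrane location
DRUG_TARGETS = 854             # protein targets of approved drugs
MEMBRANE_DRUG_TARGETS = 494    # of those, membrane-bound


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


membrane_share = percent(MEMBRANE_GENES, TOTAL_GENES)
evidence_share = percent(MEMBRANE_WITH_EVIDENCE, MEMBRANE_GENES)
drug_target_share = percent(MEMBRANE_DRUG_TARGETS, DRUG_TARGETS)

print(f"Predicted membrane genes: {membrane_share}% of all genes")
print(f"With experimental evidence: {evidence_share}% of predicted membrane genes")
print(f"Membrane-bound drug targets: {drug_target_share}% of approved drug targets")
```

The first ratio rounds to the "approximately 28%" stated above. The second shows that roughly a third of the predicted membrane genes currently have experimental support for a membrane location.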

Figure 1. The number of human protein-coding genes predicted to encode (1) intracellular, (2) membrane-spanning, (3) secreted and (4) both membrane-spanning and secreted protein isoforms, where the last group consists of genes with multiple splice variants, including at least one secreted and one membrane-spanning isoform.


EGFR - A-431

LMNB1 - MCF-7

SLC30A6 - A-431

Figure 2. Different membrane systems in the cell exemplified by EGFR in the plasma membrane, LMNB1 in the nuclear membrane and SLC30A6 in the Golgi membrane visualized using immunofluorescence (ICC-IF) and confocal microscopy.

What is a membrane protein?

Membrane proteins constitute one of the largest and most important classes of proteins. A membrane protein is associated with or attached to the membrane of a cell or of an organelle inside the cell, and can be classified as either peripheral or integral. Peripheral membrane proteins are bound either to peripheral regions of the membrane or to integral membrane proteins, but they do not fully span the membrane. Integral membrane proteins contain hydrophobic alpha-helical or beta-barrel structures that span the entire lipid bilayer; these membrane-spanning segments are linked by extramembranous loop regions.


Figure 3. Different classes of membrane proteins.

The alpha-helical integral membrane proteins form the major category of membrane proteins and are found in all types of biological membranes; they are the main focus here. Their key roles as transporters and receptors explain why they represent approximately 58% of all currently approved drug targets, and hence their immense importance for the pharmaceutical industry. Many important receptors and cell surface molecules are found in the list of human cell differentiation molecules (CD markers). G-protein coupled receptors (GPCRs), which contain seven transmembrane (TM) segments and include 743 of the human protein-coding genes, comprise the largest group of membrane protein drug targets.


C5AR1

CYSLTR2

DSC2

Figure 4. Immunohistochemistry-based images of the CD marker C5AR1 in gall bladder, the G-protein coupled receptor CYSLTR2 in placenta and DSC2 in esophagus.

Prediction of transmembrane protein topology and signal peptides

Developing a better understanding of membrane protein structure and function is of immense importance for both biological and pharmacological purposes. Since membrane proteins are difficult to crystallize and severely underrepresented in structural databases, computational prediction of membrane protein structure has been crucial for continued studies of these key molecules. Most membrane protein prediction methods have focused on the topology of alpha-helical membrane proteins, i.e. the position of the transmembrane (TM) segments in the protein sequence and their orientation relative to the membrane. This is illustrated below in the schematic view of the topology of an alpha-helical membrane protein with four transmembrane segments and extracellular N- and C-termini.
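To make the notion of topology concrete, the sketch below shows one possible way to represent it in code. It assumes only that each TM segment crosses the membrane once, so the side of the membrane flips after every segment. The `Topology` class, its field names and the coordinates of the example protein are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


def _flip(side: str) -> str:
    """Return the opposite side of the membrane."""
    return "inside" if side == "outside" else "outside"


@dataclass
class Topology:
    length: int                          # protein length in residues
    tm_segments: List[Tuple[int, int]]   # sorted, 1-based inclusive (start, end)
    n_term_side: str                     # "outside" or "inside"

    def regions(self) -> List[Tuple[str, int, int]]:
        """Return (label, start, end) for every loop and TM segment, in order."""
        side = self.n_term_side
        result = []
        pos = 1
        for start, end in self.tm_segments:
            if start > pos:
                result.append((side, pos, start - 1))
            result.append(("TM", start, end))
            side = _flip(side)           # each TM segment crosses the membrane
            pos = end + 1
        if pos <= self.length:
            result.append((side, pos, self.length))
        return result

    def c_term_side(self) -> str:
        """An even number of TM segments leaves the C-terminus on the N-terminal side."""
        if len(self.tm_segments) % 2 == 0:
            return self.n_term_side
        return _flip(self.n_term_side)


# Hypothetical 200-residue protein with four TM segments and an
# extracellular N-terminus, mirroring the schematic described above.
example = Topology(
    length=200,
    tm_segments=[(21, 41), (60, 80), (100, 120), (140, 160)],
    n_term_side="outside",
)

for label, start, end in example.regions():
    print(f"{label:8s} {start:4d}-{end:<4d}")
print("C-terminus:", example.c_term_side())
```

With four TM segments the orientation flips an even number of times, which is why both termini end up on the extracellular side in this example.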


The TM segments are identified based on features such as length, amino acid properties and hydrophobicity, and many prediction methods are based on machine learning or deep learning techniques. Here, a selection of eight prediction algorithms was used to create a majority decision-based method (MDM) that combines the results from the chosen tools to estimate the human membrane proteome. Each protein with at least one TM segment with overlapping predictions by at least four of the eight methods is considered a membrane protein. Table 1 shows the number of protein-coding genes predicted by each individual method, as well as by the MDM.
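To illustrate how hydrophobicity and length can flag candidate TM segments, the sketch below implements a classic sliding-window hydropathy scan using the Kyte-Doolittle scale. This is not the algorithm of any of the eight tools used here, which rely on far more elaborate models. The window length of 19 residues and the threshold of 1.6 are conventional values for this kind of scan and should be read as assumptions, and the toy sequence is invented.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window; entry k covers seq[k:k+window]."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[k:k + window]) / window
        for k in range(len(seq) - window + 1)
    ]


def candidate_tm_regions(
    seq: str, window: int = 19, threshold: float = 1.6
) -> List[Tuple[int, int]]:
    """Merge overlapping high-scoring windows into 1-based (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    for offset, score in enumerate(hydropathy_profile(seq, window)):
        if score < threshold:
            continue
        start, end = offset + 1, offset + window
        if regions and start <= regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], end)   # extend the previous region
        else:
            regions.append((start, end))
    return regions


# Invented toy sequence: a 19-residue hydrophobic stretch (residues 21-39)
# flanked by two hydrophilic stretches of 20 residues each.
hydrophilic = "DEKRSNQDEKRSNQDEKRSN"
hydrophobic = "LLIVALLVIALLAVILLAV"
toy_sequence = hydrophilic + hydrophobic + hydrophilic

regions = candidate_tm_regions(toy_sequence)
print(regions)
```

Because neighbouring windows that only partly cover the hydrophobic stretch can still exceed the threshold, the merged region is usually somewhat wider than the true segment. Dedicated topology predictors handle this, along with orientation and signal peptides, much more carefully.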

Table 1. Prediction of the human membrane proteome by eight different prediction methods for membrane protein topology as well as the majority decision-based method MDM and a method specialized in prediction of GPCRs.

| Protein class | Number of genes | Number of proteins | Source |
| --- | --- | --- | --- |
| Predicted membrane proteins | 5573 | 17561 | MDM |
| DeepTMHMM predicted membrane proteins | 5064 | 16339 | DeepTMHMM |
| MEMSAT3 predicted membrane proteins | 7505 | 23235 | MEMSAT3 |
| MEMSAT-SVM predicted membrane proteins | 6459 | 20546 | MEMSAT-SVM |
| Phobius predicted membrane proteins | 5883 | 18427 | Phobius |
| SCAMPI predicted membrane proteins | 6560 | 19551 | SCAMPI |
| SPOCTOPUS predicted membrane proteins | 7826 | 25107 | SPOCTOPUS |
| THUMBUP predicted membrane proteins | 7290 | 23205 | THUMBUP |
| TMHMM predicted membrane proteins | 5648 | 17661 | TMHMM |
| GPCRHMM predicted membrane proteins | 856 | 1536 | GPCRHMM |
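The majority decision rule behind the MDM row in Table 1 can be sketched as a per-residue vote. The code below is one possible reading of "overlapping predictions by four out of the eight methods": a protein is called a membrane protein if at least one residue lies inside a predicted TM segment for four or more methods. The exact overlap criterion of the actual pipeline is not specified here, so this is an assumption, and all coordinates in the example are invented.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # 1-based inclusive (start, end) of a predicted TM segment


def residue_votes(predictions: Dict[str, List[Segment]], length: int) -> List[int]:
    """Count, for every residue, how many methods place it in a TM segment."""
    votes = [0] * (length + 1)  # index 0 is unused so positions stay 1-based
    for segments in predictions.values():
        covered = set()
        for start, end in segments:
            covered.update(range(max(start, 1), min(end, length) + 1))
        for pos in covered:      # each method votes at most once per residue
            votes[pos] += 1
    return votes


def is_membrane_protein(
    predictions: Dict[str, List[Segment]], length: int, min_methods: int = 4
) -> bool:
    """Majority decision: some residue is called TM by at least min_methods methods."""
    return max(residue_votes(predictions, length)) >= min_methods


# Invented predictions for two hypothetical 200-residue proteins.
protein_a = {
    "DeepTMHMM": [(30, 50)],
    "Phobius": [(32, 52)],
    "TMHMM": [(29, 49)],
    "SCAMPI": [(35, 55)],
    "MEMSAT3": [],
}
protein_b = {
    "DeepTMHMM": [(30, 50)],
    "Phobius": [(32, 52)],
    "TMHMM": [(29, 49)],
    "SCAMPI": [(120, 140)],  # a fourth method, but no overlap with the others
}

print(is_membrane_protein(protein_a, 200))  # four methods overlap at residues 35-49
print(is_membrane_protein(protein_b, 200))  # at most three methods agree anywhere
```

Requiring overlap rather than simply counting methods with any TM prediction prevents unrelated single-method hits in different parts of the sequence from adding up to a majority.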

The N-terminal signal sequences found in most secreted proteins and in some types of membrane proteins are often called signal peptides (SP). A signal peptide is primarily identified by a short hydrophobic alpha-helix combined with a number of features that enable computational prediction based on the amino acid sequence of the protein. A number of methods also incorporate an SP prediction model into their TM topology prediction algorithm to distinguish more reliably between the two features. Here, the existence of signal peptides was based on a whole-proteome scan using five methods for signal peptide prediction: SignalP 6.0, Phobius, SPOCTOPUS, DeepTMHMM and DeepSig, all of which have been shown to give reliable prediction results in a comparative analysis.

Similarly to the MDM, a majority decision-based method for secreted proteins (MDSEC) was constructed using results from the five prediction methods. All proteins with an SP predicted by at least three of the five methods were considered to have a signal peptide. Since signal peptides are found both in secreted proteins and in certain types of membrane proteins, proteins with a predicted SP in combination with a predicted TM region according to the MDM were considered membrane-spanning and excluded from the group of predicted secreted proteins. The remaining predicted secreted proteins were further annotated in order to exclude genes that are predicted to reside in intracellular locations such as the ER or Golgi, despite having a signal peptide prediction. The resulting numbers of genes encoding a predicted secreted protein are shown in Table 2.

Table 2. Prediction of the human secretome by five different prediction methods for signal peptides as well as the MDSEC and the final prediction resulting from the annotation of the human secretome.

| Protein class | Number of genes | Number of proteins | Source |
| --- | --- | --- | --- |
| Predicted secreted proteins | 1895 | 4998 | HPA |
| Secreted proteins predicted by MDSEC | 2758 | 6454 | HPA |
| DeepTMHMM predicted secreted proteins | 3227 | 7246 | DeepTMHMM |
| DeepSig predicted secreted proteins | 2559 | 5894 | DeepSig |
| SignalP predicted secreted proteins | 2491 | 5799 | SignalP |
| Phobius predicted secreted proteins | 3616 | 8477 | Phobius |
| SPOCTOPUS predicted secreted proteins | 3947 | 8982 | SPOCTOPUS |
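The MDSEC rule and its interaction with the MDM can be sketched as follows. The code assumes that each method reports a simple yes/no call per protein and that the MDM result is already available as a boolean. The subsequent manual annotation step, which removes ER- and Golgi-resident proteins and reduces the MDSEC count to the final "Predicted secreted proteins" row of Table 2, is not modelled. Function names and example calls are hypothetical.

```python
from typing import Dict

SP_METHODS = ("SignalP", "Phobius", "SPOCTOPUS", "DeepTMHMM", "DeepSig")


def has_signal_peptide(sp_calls: Dict[str, bool], min_methods: int = 3) -> bool:
    """MDSEC-style vote: an SP is called if at least min_methods of the five agree."""
    votes = sum(1 for method in SP_METHODS if sp_calls.get(method, False))
    return votes >= min_methods


def classify_protein(sp_calls: Dict[str, bool], mdm_membrane: bool) -> str:
    """Assign a protein isoform to one of the three prediction-based categories."""
    if mdm_membrane:
        # A TM region by majority decision takes precedence, even with an SP.
        return "membrane"
    if has_signal_peptide(sp_calls):
        # Candidate only: later annotation may still exclude ER/Golgi residents.
        return "secreted"
    return "intracellular"


# Invented example calls for three hypothetical isoforms.
calls_strong = {"SignalP": True, "Phobius": True, "DeepSig": True}
calls_weak = {"Phobius": True, "SPOCTOPUS": True}

print(classify_protein(calls_strong, mdm_membrane=False))  # secreted candidate
print(classify_protein(calls_strong, mdm_membrane=True))   # membrane
print(classify_protein(calls_weak, mdm_membrane=False))    # intracellular
```

Checking the membrane call first mirrors the exclusion described above: a signal peptide alone does not make a protein secreted if a TM region is also predicted.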

Classification of the human proteome

The combined results from the analyses of the predicted membrane proteome and secretome were used to map the distribution of potential membrane proteins and secreted proteins in the human proteome. The protein isoforms of all human genes were annotated using three categories: (i) secreted, (ii) membrane and (iii) intracellular (i.e., proteins with no predicted SP/TM features). Note that proteins classified as membrane may be located in intracellular membranes such as those of the endoplasmic reticulum or Golgi. Each human protein-coding gene was subsequently classified either as having all isoforms in one of these categories or as encoding protein isoforms belonging to two or all three categories. The results show that 36% of the human protein-coding genes have at least one protein isoform that is membrane-spanning or secreted.
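The step from isoform-level categories to gene-level classes can be sketched as a simple set operation: a gene's class is the set of categories found among its isoforms. The gene identifiers and isoform assignments below are invented toy data, and the class labels are an illustrative naming choice rather than the resource's own.

```python
from collections import Counter
from typing import Dict, Iterable

CATEGORIES = ("intracellular", "membrane", "secreted")


def gene_class(isoform_categories: Iterable[str]) -> str:
    """Combine the categories of all isoforms of one gene into a single label."""
    present = set(isoform_categories)
    if not present:
        raise ValueError("a gene needs at least one annotated isoform")
    unknown = present - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    # Fixed category order keeps labels such as "membrane+secreted" stable.
    return "+".join(c for c in CATEGORIES if c in present)


# Invented toy data: gene -> category of each of its protein isoforms.
toy_genes: Dict[str, list] = {
    "GENE_A": ["intracellular", "intracellular"],
    "GENE_B": ["membrane"],
    "GENE_C": ["membrane", "secreted"],
    "GENE_D": ["intracellular", "secreted", "membrane"],
    "GENE_E": ["intracellular"],
}

classes = {gene: gene_class(cats) for gene, cats in toy_genes.items()}
counts = Counter(classes.values())

# Genes with at least one membrane-spanning or secreted isoform.
non_intracellular = sum(1 for label in classes.values() if label != "intracellular")
share = non_intracellular / len(classes)

print(dict(counts))
print(f"{share:.0%} of toy genes have a membrane or secreted isoform")
```

Applied to the full proteome, the same kind of count underlies the 36% figure quoted above and the overlaps between categories.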


Figure 5. Venn diagram showing the number of genes that are intracellular, membrane-spanning or secreted, and the overlap for genes with isoforms belonging to more than one of the three categories.

Examples of protein classes including membrane proteins

There are a number of important protein classes involving membrane proteins. In Table 3, some examples of such classes are presented.

Table 3. A selection of classes related to the membrane proteome and secretome.

| Protein class | Number of genes | Number of proteins | Source |
| --- | --- | --- | --- |
| CD markers | 384 | 1005 | UniProt |
| Voltage-gated ion channels | 132 | 355 | IUPHAR-DB |
| Transporters | 2138 | 5396 | TCDB |
| GPCRs excluding olfactory receptors | 391 | 776 | UniProt |

Relevant links and publications

Fagerberg L et al., Prediction of the human membrane proteome. Proteomics (2010)
PubMed: 20175080 DOI: 10.1002/pmic.200900258

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419