The secretome

A secretory protein can be defined as a protein that is actively transported out of the cell. In humans, cells such as endocrine cells and B-lymphocytes are specialized in protein secretion, but all cells in the body secrete proteins to a varying degree. Secreted proteins play a crucial role in many physiological and pathological processes. Medically important secreted proteins include cytokines, coagulation factors, growth factors and other signaling molecules. A large fraction of the clinically approved treatment regimens today use drugs directed towards (or consisting of) secreted proteins. Among the FDA-approved drugs currently on the market (see the Druggable Proteome for more details), there are 854 protein targets with known pharmacological action, of which 120 are secreted proteins. In addition to being a rich source of new therapeutics and drug targets, secreted proteins are the target of a large fraction of the blood diagnostic tests used in the clinic, which emphasizes the importance of this class of proteins for medicine and biology.

Defining the secretome

The human secretome includes all potentially secreted proteins, identified from the annotation of subcellular location in UniProt and from predictions of signal peptides (SP) and transmembrane (TM) regions. A signal peptide is found in most secreted proteins but also in some types of membrane proteins, so the presence of transmembrane regions can be used to distinguish membrane proteins from secreted proteins. A whole-proteome scan of all Ensembl transcripts was performed using majority decision-based methods for signal peptide prediction (MDSEC) and membrane region prediction (MDM). All proteins with an SP predicted by the MDSEC and no TM region predicted by the MDM were considered secreted. The analysis predicts 2520 genes (12% of all human protein-coding genes) to encode at least one predicted secreted transcript or to have an isoform with subcellular location "Secreted" in UniProt.


[Pie chart of the ten secretome categories: Secreted to blood; Secreted to extracellular matrix; Secreted to digestive system; Secreted in male reproductive system; Secreted in female reproductive system; Secreted in brain; Secreted in other tissues; Intracellular and membrane; Secreted - unknown location; Immunoglobulin genes]

Figure 1. Classification of the human secretome. More data for the individual categories can be explored by clicking on the icons, and the corresponding gene lists are obtained by clicking on the numbers in the pie chart.

Classification of the secretome

The genes predicted to encode secreted proteins were then systematically annotated based on literature, bioinformatics and experimental data from different sources including HPA, UniProt, GTEx and FANTOM, with the aim of determining their involvement in local or systemic secretion and investigating their spatial distribution in the human body. Each gene was annotated to a single location, with the primary goal of deciding whether active secretion into blood could be expected. This resulted in ten different categories that can be explored in Figure 1. In summary, 785 proteins were classified as secreted to blood, a fraction of which have another main location, and 523 were annotated as secreted to local compartments, e.g. male or female reproductive tissues, brain, or a group of other tissues including the eye and the skin. A further 94 proteins, including salivary and pancreatic enzymes, were classified as secreted to the digestive system, and another 236 proteins, including laminins, collagens, elastin and fibronectin, were suggested to be involved in the formation and function of the extracellular matrix. 619 proteins were found to be intracellular or membrane-bound, including e.g. ER/Golgi-resident proteins, mitochondrial proteins, lysosomal proteins and membrane-associated proteins. 116 genes were believed to encode secreted proteins, but the final location could not be determined from the available data, so they were classified as having unknown location. The last category comprises the 147 predicted secreted genes encoding some of the constant, variable, joining and diversity regions of immunoglobulins. After excluding the genes annotated as intracellular or membrane-bound, we suggest that the human secretome consists of 1901 genes with at least one secreted protein variant. More data on each of the secretome categories can be obtained by clicking on the icons in Figure 1.
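As a quick consistency check, the category counts quoted above can be summed to recover both the total number of predicted genes and the size of the secretome once the intracellular and membrane category is excluded. The sketch below is only illustrative arithmetic; the dictionary keys are shorthand labels chosen here, and "Secreted to local compartments" groups the male reproductive, female reproductive, brain and other-tissue categories as the text does.

```python
# Gene counts per annotation group, copied from the paragraph above.
# The key names are shorthand labels, not official category identifiers.
category_counts = {
    "Secreted to blood": 785,
    "Secreted to local compartments": 523,
    "Secreted to digestive system": 94,
    "Secreted to extracellular matrix": 236,
    "Intracellular and membrane": 619,
    "Secreted - unknown location": 116,
    "Immunoglobulin genes": 147,
}

# All genes that entered the annotation step.
total_predicted = sum(category_counts.values())

# Genes remaining after the intracellular and membrane group is excluded.
secretome_size = total_predicted - category_counts["Intracellular and membrane"]

print(total_predicted)  # 2520
print(secretome_size)   # 1901
```

The two results match the 2520 predicted genes and the 1901 genes of the final secretome given in the text.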

Abundance and specificity

The secretome is a rather heterogeneous group of proteins with regard to both abundance and tissue specificity. One of the most important secretory organs is the liver, which produces a large number of the proteins secreted into the blood, such as albumin, fibrinogen and transferrin. Highly abundant secreted proteins also include pancreatic enzymes (PRSS1, CELA3A, AMY2A), other digestive enzymes expressed in salivary gland (PRR4, STATH, ZG16B) or stomach (PGA3, PGA4), and the family of defensin proteins secreted by glandular cells in the epididymis (DEFB118, DEFB106A and DEFB129). Estimates of blood plasma abundance for proteins in the Secreted to blood category are available from both immunoassay and mass spectrometry (MS) data.



Figure 2. Immunohistochemistry-based images of the secreted proteins CELA3A (chymotrypsin-like elastase family, member 3A) in pancreas, CPA1 (carboxypeptidase A1) in pancreas and AMY1B (amylase alpha 1B) in salivary gland.


Gene expression profiles within the different secretome categories were classified based on transcriptomics analysis across all major organs and tissue types in the human body, as shown in the bar plot in Figure 3.

[Bar plot: percentage of genes (0% to 100%) in each tissue specificity category (Not detected, Low tissue specificity, Tissue enhanced, Group enriched, Tissue enriched) for each secretome category (Secreted to blood, Secreted to extracellular matrix, Secreted to digestive system, Secreted in male reproductive system, Secreted in female reproductive system, Secreted in brain, Secreted in other tissues, Intracellular and membrane, Secreted - unknown location, Immunoglobulin genes)]

Figure 3. The tissue specificity of genes belonging to the different secretome categories, based on RNA expression. Gene lists corresponding to the different specificity categories in each secretome group can be obtained by clicking in the bar chart.

Prediction of transmembrane protein topology and signal peptides

Developing a better understanding of membrane protein structure and function is of immense importance for both biological and pharmacological purposes. Since membrane proteins are difficult to crystallize and severely underrepresented in structural databases, computational prediction of membrane protein structure has been crucial for continued studies of these key molecules. Most membrane protein prediction methods have focused on the topology of α-helical membrane proteins, i.e. the prediction of the position of the transmembrane (TM) segments in the protein sequence and their orientation relative to the membrane, as illustrated in the schematic view of the topology of an α-helical membrane protein with four transmembrane segments and extracellular N- and C-termini.


The TM segments are identified based on features such as length, amino acid properties and hydrophobicity, and many prediction methods are based on machine learning or deep learning techniques. Here, a selection of eight prediction algorithms was used to create a majority decision-based method (MDM), which combines the results from the chosen tools to estimate the human membrane proteome. Each protein with at least one TM segment with overlapping predictions by four out of the eight methods is considered a membrane protein. Table 1 shows the number of protein-coding genes predicted by each individual method, as well as by the MDM.
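The majority decision described above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline actually used: it assumes that "overlapping predictions" means the predicted segments share at least one residue position, that "four out of the eight" is read as at least four, and that each method reports TM segments as inclusive 1-based (start, end) positions. The function names and the toy predictions are hypothetical.

```python
from typing import Dict, List, Tuple

# Inclusive (start, end) residue positions, 1-based (an assumed input format).
Segment = Tuple[int, int]


def max_overlap_support(predictions: Dict[str, List[Segment]], length: int) -> int:
    """Largest number of methods predicting a TM segment at the same residue."""
    votes = [0] * (length + 1)  # index 0 is unused
    for segments in predictions.values():
        covered = set()  # each method votes at most once per residue
        for start, end in segments:
            covered.update(range(max(1, start), min(length, end) + 1))
        for pos in covered:
            votes[pos] += 1
    return max(votes)


def is_membrane_protein(predictions: Dict[str, List[Segment]], length: int,
                        min_methods: int = 4) -> bool:
    """Majority-decision call: enough methods agree on at least one position."""
    return max_overlap_support(predictions, length) >= min_methods


# Hypothetical predictions for a 300-residue protein; method names from Table 1.
toy_predictions = {
    "DeepTMHMM": [(21, 41)],
    "MEMSAT3": [(19, 40)],
    "MEMSAT-SVM": [(23, 43)],
    "Phobius": [(20, 42)],
    "SCAMPI": [],
    "SPOCTOPUS": [(150, 170)],
    "THUMBUP": [],
    "TMHMM": [],
}

print(is_membrane_protein(toy_predictions, 300))  # True
```

In the toy input, four methods agree on residues 23 to 40, so the protein is called a membrane protein, while the isolated SPOCTOPUS segment at 150 to 170 would not be enough on its own.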

Table 1. Prediction of the human membrane proteome by eight different prediction methods for membrane protein topology as well as the majority decision-based method MDM and a method specialized in prediction of GPCRs.

Protein class                           Number of genes  Number of proteins  Source
Predicted membrane proteins             5573             17561               MDM
DeepTMHMM predicted membrane proteins   5064             16339               DeepTMHMM
MEMSAT3 predicted membrane proteins     7505             23235               MEMSAT3
MEMSAT-SVM predicted membrane proteins  6459             20546               MEMSAT-SVM
Phobius predicted membrane proteins     5883             18427               Phobius
SCAMPI predicted membrane proteins      6560             19551               SCAMPI
SPOCTOPUS predicted membrane proteins   7826             25107               SPOCTOPUS
THUMBUP predicted membrane proteins     7290             23205               THUMBUP
TMHMM predicted membrane proteins       5648             17661               TMHMM
GPCRHMM predicted membrane proteins     856              1536                GPCRHMM

The N-terminal signal sequences found in most secreted proteins and in some types of membrane proteins are often called signal peptides (SP). A signal peptide is primarily identified by a short hydrophobic alpha-helix combined with a number of features that enable computational prediction from the amino acid sequence of the protein. A number of methods also incorporate an SP prediction model into their TM topology prediction algorithm to enable more reliable discrimination between the two features.

Here, the existence of signal peptides was based on a whole-proteome scan using five methods for signal peptide prediction: SignalP 6.0, Phobius, SPOCTOPUS, DeepTMHMM and DeepSig, all of which have been shown to give reliable prediction results in a comparative analysis. Similarly to the MDM, a majority decision-based method for secreted proteins (MDSEC) was constructed from the results of the five prediction methods. All proteins with an SP predicted by at least three of the five methods were considered to have a signal peptide. Since signal peptides are found both in secreted proteins and in certain types of membrane proteins, proteins with a predicted SP in combination with a TM region predicted by the MDM were considered membrane-spanning and excluded from the group of predicted secreted proteins. The remaining predicted secreted proteins were further annotated in order to exclude genes whose products are predicted to reside in intracellular locations such as the ER or Golgi despite having a signal peptide prediction. The resulting numbers of genes encoding a predicted secreted protein are shown in Table 2.

Table 2. Prediction of the human secretome by five different prediction methods for signal peptides as well as the MDSEC and the final prediction resulting from the annotation of the human secretome.

Protein class                           Number of genes  Number of proteins  Source
Predicted secreted proteins             1895             4998                HPA
Secreted proteins predicted by MDSEC    2758             6454                HPA
DeepTMHMM predicted secreted proteins   3227             7246                DeepTMHMM
DeepSig predicted secreted proteins     2559             5894                DeepSig
SignalP predicted secreted proteins     2491             5799                SignalP
Phobius predicted secreted proteins     3616             8477                Phobius
SPOCTOPUS predicted secreted proteins   3947             8982                SPOCTOPUS
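The decision rule behind the MDSEC row can be illustrated with a short sketch. It assumes each of the five signal peptide methods yields a simple yes/no call per protein and that the MDM result is available as a boolean; the function names and output labels are hypothetical, and the subsequent manual annotation step that removes ER- or Golgi-resident proteins is not modelled.

```python
from typing import Dict

# The five signal peptide predictors named in the text.
SP_METHODS = ("SignalP 6.0", "Phobius", "SPOCTOPUS", "DeepTMHMM", "DeepSig")


def has_signal_peptide(sp_calls: Dict[str, bool], min_methods: int = 3) -> bool:
    """MDSEC-style vote: an SP is called if at least three of five methods agree."""
    votes = sum(bool(sp_calls.get(method, False)) for method in SP_METHODS)
    return votes >= min_methods


def classify(sp_calls: Dict[str, bool], mdm_has_tm: bool) -> str:
    """Combine the SP vote with the MDM transmembrane call (labels are illustrative)."""
    if not has_signal_peptide(sp_calls):
        return "no signal peptide"
    if mdm_has_tm:
        # SP plus a TM region: treated as membrane-spanning, not secreted.
        return "membrane-spanning"
    return "predicted secreted"


# Hypothetical calls for a single protein.
calls = {
    "SignalP 6.0": True,
    "Phobius": True,
    "SPOCTOPUS": False,
    "DeepTMHMM": True,
    "DeepSig": False,
}

print(classify(calls, mdm_has_tm=False))  # predicted secreted
print(classify(calls, mdm_has_tm=True))   # membrane-spanning
```

Requiring agreement between several predictors is more conservative than most of the individual methods, which is consistent with the MDSEC gene count in Table 2 being lower than that of Phobius, SPOCTOPUS or DeepTMHMM alone.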

The secretory pathway

Most secreted proteins are secreted via the secretory pathway, as illustrated below, but interestingly some proteins, including cytokines such as interleukin 1β (IL1B) and mitogens such as fibroblast growth factor 2 (FGF2), are secreted in a non-classical manner without entering the ER/Golgi pathway. In the secretory pathway, proteins with a signal sequence that guides them to the endoplasmic reticulum (ER) are transported from the ER through the Golgi apparatus via vesicles to arrive at the surface of the cell. The signal sequence targeting proteins for secretion, called a signal peptide, is a short, hydrophobic N-terminal sequence that is inserted into the ER membrane and subsequently cleaved off from the protein. Membrane proteins may also contain an SP, but most often the N-terminal transmembrane (TM) region functions as the signal sequence. The ER signal sequences are recognized by chaperone proteins, which guide the synthesizing ribosomes to the rough ER, where translocation of the protein sequence occurs in a protein complex named the translocon. Membrane proteins are transferred to the lipid bilayer of the ER membrane via the translocon, whereas secretory proteins are transported into the ER lumen. Once inside the ER lumen, other chaperone proteins make sure that the protein is folded and assembled correctly, and the oxidizing environment enables formation of disulfide bonds, addition of carbohydrates and proteolytic cleavages. The proteins that pass the ER quality control are transported via vesicles to the Golgi apparatus, where they are further modified in important processes such as glycosylation and phosphorylation. The Golgi is also responsible for sorting proteins for transport to their final destination, which most often is the plasma membrane, the lysosomes or secretion from the cell.


Figure 4. Overview of the secretory pathway.

Relevant links and publications

Uhlén M et al., The human secretome. Sci Signal. (2019)
PubMed: 31772123 DOI: 10.1126/scisignal.aaz0274

Fagerberg L et al., Prediction of the human membrane proteome. Proteomics. (2010)
PubMed: 20175080 DOI: 10.1002/pmic.200900258

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419