Secreted proteins

Secreted proteins, together forming the secretome, can be defined as proteins that are actively transported out of the cell. In humans, cells such as endocrine cells and B-lymphocytes are specialized in protein secretion, but all cells secrete proteins to a certain extent. Secreted proteins play a crucial role in many physiological, developmental and pathological processes and are important for both intercellular and intracellular communication. They are a rich source of new therapeutics and drug targets, and a large fraction of the blood diagnostic tests used in the clinic are directed towards secreted proteins, which emphasizes the importance of this class of proteins for medicine and biology. Medically important secreted proteins include cytokines, coagulation factors, growth factors and other signaling molecules. Based on results from multiple prediction methods, we predict that 1895 genes, or 9% of the human protein-coding genes, encode secreted proteins.

Function of the secretory pathway

The most common pathway for transporting proteins out of the cell is the secretory pathway (Figure 1). Newly synthesized proteins are transported from the endoplasmic reticulum (ER) through the Golgi apparatus and are packed into vesicles. The vesicles are then transported to the plasma membrane, where membrane fusion releases the proteins into the extracellular space (exocytosis). The signal sequence that targets proteins destined for secretion to the ER is called a signal peptide (SP) and consists of a short hydrophobic N-terminal sequence (von Heijne G. (1985)). However, SPs can also be present in proteins that are destined for compartments of the secretory pathway, such as the ER, the Golgi apparatus, different kinds of vesicles and the plasma membrane. The SP is recognized by chaperone proteins, together forming a signal recognition particle (SRP), which guides the synthesizing ribosomes to the rough ER. At the ER membrane, co-translational translocation of the newly synthesized peptide occurs with the help of a protein complex referred to as the translocon, and the newly synthesized protein is released into the ER lumen (Johnson AE et al. (1999)). Proteins that pass the quality control in the ER lumen are transported via vesicles to the Golgi apparatus, where they are further modified and sorted for transport to their final destination, most commonly the plasma membrane, lysosomes or the extracellular space.


Figure 1. Overview of the secretory pathway.

The functions of secreted proteins are diverse, but cell signaling is an important example. Signaling between or within cells via secreted signaling molecules can be paracrine, autocrine, endocrine or neuroendocrine depending on the target. Among the most important signaling proteins are cytokines, kinases, hormones and growth factors (Farhan H et al. (2011)).

A large fraction of the clinically approved treatment regimens today use drugs directed towards (or consisting of) secreted proteins or cell surface-associated membrane proteins. Out of the 854 protein targets with known pharmacological action for approved drugs on the market at present (Wishart DS et al. (2006)), 120 are predicted to be secreted.

Secreted proteins are often enriched in the organelles of the secretory pathway (ER, Golgi apparatus, vesicles) before they are released to the extracellular space. This enables detection of the protein by immunofluorescence (IF), although its final destination lies outside of the cell. Figure 2 shows IF images of three predicted secreted proteins.


CHGB - SH-SY5Y

SCG3 - SH-SY5Y

NPY - SH-SY5Y

Figure 2. Examples of three different predicted secreted proteins are shown in the neuron-like SH-SY5Y cell line: CHGB and SCG3 are found in secretory vesicles, while NPY is enriched in the Golgi apparatus.

Prediction of secreted proteins

Secreted proteins can often be identified based on their SPs, which have a number of features suitable for computational prediction models. The SP is typically 15-30 amino acids long and has a characteristic three-part structure: a short, mostly positively charged N-terminal region (n-region), a hydrophobic core that tends to form an alpha-helix (h-region) and a polar, uncharged C-terminal region (c-region) (Emanuelsson O et al. (2007)). Many algorithms use these features to predict the presence of SPs in proteins, and a number of methods incorporate an SP prediction model into transmembrane (TM) topology prediction algorithms in order to distinguish more reliably between an SP and a TM segment.
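To make these sequence features concrete, the following is a minimal sketch of a sliding-window hydropathy scan over the N-terminus, using the published Kyte-Doolittle scale. It is a toy heuristic and not how SignalP, Phobius or SPOCTOPUS work; the function names, the window length, the cutoff of 2.0 and both example sequences are assumptions made only for illustration.

```python
# Kyte-Doolittle hydropathy scale (published values; higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(seq, window=8, n_term=30):
    """Return (start, mean hydropathy) of the most hydrophobic window
    found within the first n_term residues of seq.

    Only the 20 standard amino acids are supported; other letters
    raise KeyError.
    """
    region = seq[:n_term].upper()
    if len(region) < window:
        raise ValueError("sequence is shorter than the window")
    best_start, best_mean = 0, float("-inf")
    for start in range(len(region) - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in region[start:start + window]) / window
        if mean > best_mean:  # strict comparison keeps the first best window
            best_start, best_mean = start, mean
    return best_start, best_mean


def has_sp_like_n_terminus(seq, min_mean=2.0, n_region_len=8):
    """Crude check: a K/R residue near the start (n-region) plus a
    hydrophobic stretch (h-region). The cutoffs are illustrative only."""
    _, mean = most_hydrophobic_window(seq)
    positive_n_region = any(aa in "KR" for aa in seq[:n_region_len].upper())
    return positive_n_region and mean >= min_mean


# Hypothetical sequences, invented for this example.
toy_sp_like = "MKK" + "L" * 10 + "AASAQ" + "DE" * 8
toy_hydrophilic = "MSDEDEKRKRSDEDEKRKRSDEDEKRKRSD"

print(has_sp_like_n_terminus(toy_sp_like))
print(has_sp_like_n_terminus(toy_hydrophilic))
```

A scan like this cannot separate an SP from an N-terminal TM segment, since both are hydrophobic, which is the reason combined SP and topology predictors exist.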

The human 'secretome' can be defined as all genes encoding at least one secreted protein. It has been analyzed here by performing a whole-proteome scan using three methods for SP prediction: SignalP 4.0 (Petersen TN et al. (2011)), Phobius (Käll L et al. (2004)) and SPOCTOPUS (Viklund H et al. (2008)), which have all been shown to give reliable prediction results in comparative analyses. A majority decision-based method (MDSEC) has been constructed using the results from the three SP prediction methods to obtain a list of predicted secreted proteins (Uhlén M et al. (2015)). We selected all proteins with an SP predicted by at least two of the three methods and annotated these further in order to exclude genes that are predicted to reside in intracellular locations such as the ER or Golgi apparatus, despite having a signal peptide prediction. To achieve this, the genes predicted to encode proteins with an SP were filtered using the majority decision-based method (MDM) for membrane protein topology prediction (Fagerberg L et al. (2010)). All proteins with a predicted SP in combination with a predicted TM region according to the MDM were considered membrane-spanning and therefore not secreted. Table 1 shows the resulting numbers of genes encoding predicted secreted proteins for each of the three methods, for the majority decision-based method and after annotation of the secretome. The resulting lists of predicted secreted proteins and predicted membrane proteins were used in our classification of the human proteome.
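The decision logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual MDSEC implementation: the `Prediction` record, its field names and the class labels are hypothetical, the boolean inputs stand in for the outputs of the individual tools, and the further annotation step that excludes ER or Golgi residents is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-protein summary of the individual predictors."""
    signalp: bool    # SP predicted by SignalP
    phobius: bool    # SP predicted by Phobius
    spoctopus: bool  # SP predicted by SPOCTOPUS
    mdm_tm: bool     # TM region predicted by the membrane topology MDM


def majority_sp(pred: Prediction) -> bool:
    """True if at least two of the three SP predictors agree."""
    return sum([pred.signalp, pred.phobius, pred.spoctopus]) >= 2


def classify(pred: Prediction) -> str:
    """Apply the majority vote, then the TM filter."""
    if not majority_sp(pred):
        return "no signal peptide"
    if pred.mdm_tm:
        # SP plus a predicted TM region: treated as membrane-spanning.
        return "membrane-spanning"
    return "candidate secreted"


# Hypothetical examples.
print(classify(Prediction(True, True, False, False)))
print(classify(Prediction(True, True, True, True)))
print(classify(Prediction(False, True, False, False)))
```

Requiring two of three votes trades some sensitivity for fewer false positives from any single method, which is consistent with the MDSEC count in Table 1 being lower than the Phobius and SPOCTOPUS counts.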

Table 1. Prediction of the human secretome by three different prediction methods for signal peptides as well as the MDSEC and the final prediction resulting from manual annotation.

Protein class                              Number of genes   Number of proteins   Source
Predicted secreted proteins                1895              4998                 HPA
Secreted proteins predicted by MDSEC       2758              6454                 HPA
SignalP predicted secreted proteins        2491              5799                 SignalP
Phobius predicted secreted proteins        3616              8477                 Phobius
SPOCTOPUS predicted secreted proteins      3947              8982                 SPOCTOPUS
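As a small worked example, the counts in Table 1 can be turned into ratios to see how much each step narrows the set. The numbers below are copied from the table; the dictionary layout and helper names are assumptions made for illustration.

```python
# (genes, proteins) per row of Table 1.
TABLE_1 = {
    "Predicted secreted proteins": (1895, 4998),
    "MDSEC": (2758, 6454),
    "SignalP": (2491, 5799),
    "Phobius": (3616, 8477),
    "SPOCTOPUS": (3947, 8982),
}


def proteins_per_gene(row: str) -> float:
    """Average number of listed proteins per gene for one table row."""
    genes, proteins = TABLE_1[row]
    return proteins / genes


def gene_fraction(row: str, reference: str) -> float:
    """Gene count of `row` as a fraction of the gene count of `reference`."""
    return TABLE_1[row][0] / TABLE_1[reference][0]


# Share of MDSEC genes that remain in the final annotated secretome.
retained = gene_fraction("Predicted secreted proteins", "MDSEC")
print(f"{retained:.1%} of MDSEC genes remain after annotation")
print(f"{proteins_per_gene('Predicted secreted proteins'):.2f} proteins per gene")
```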

Expression levels of secreted proteins in tissue

An analysis of tissue distribution categories based on RNA-sequencing data shows that a larger fraction of the genes encoding secreted proteins belong to the tissue enhanced, tissue enriched or group enriched categories, compared to all genes presented in the subcellular resource (Uhlén M et al. (2015)) (Figure 3). Only a relatively small portion of the genes in the secretome show low tissue specificity. This is in agreement with the tissue-specific functions of many secreted proteins. The secreted class contains many of the most abundantly expressed genes, and the highest expression levels of secreted proteins are found in pancreas and salivary gland.

Figure 3. Bar plot showing the percentage of genes in different tissue specificity categories for secreted protein-coding genes, compared to all genes in the subcellular resource. An asterisk marks a statistically significant deviation (p≤0.05) in the number of genes in a category based on a binomial statistical test.
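A binomial test of this kind can be sketched as follows. This is a minimal illustration, not the analysis behind Figure 3: the function names are hypothetical, the two-sided p-value uses the simple convention of doubling the smaller tail, and the counts in the usage example are invented. Probabilities are computed in log space so that large gene counts do not overflow.

```python
from math import exp, lgamma, log


def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))
    return exp(log_pmf)


def binomial_test(k: int, n: int, p: float) -> float:
    """Two-sided p-value for observing k genes in a category out of n,
    when the background fraction for that category is p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    upper = sum(binom_pmf(i, n, p) for i in range(k, n + 1))  # P(X >= k)
    lower = sum(binom_pmf(i, n, p) for i in range(0, k + 1))  # P(X <= k)
    return min(1.0, 2 * min(upper, lower))


# Hypothetical counts: 300 of 1895 genes fall in a category whose
# background fraction among all genes is assumed to be 10%.
p_value = binomial_test(300, 1895, 0.10)
print("significant" if p_value <= 0.05 else "not significant")
```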

Relevant links and publications

von Heijne G., Signal sequences. The limits of variation. J Mol Biol. (1985)
PubMed: 4032478 

Johnson AE et al., The translocon: a dynamic gateway at the ER membrane. Annu Rev Cell Dev Biol. (1999)
PubMed: 10611978 DOI: 10.1146/annurev.cellbio.15.1.799

Farhan H et al., Signalling to and from the secretory pathway. J Cell Sci. (2011)
PubMed: 21187344 DOI: 10.1242/jcs.076455

Wishart DS et al., DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. (2006)
PubMed: 16381955 DOI: 10.1093/nar/gkj067

Emanuelsson O et al., Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. (2007)
PubMed: 17446895 DOI: 10.1038/nprot.2007.131

Petersen TN et al., SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. (2011)
PubMed: 21959131 DOI: 10.1038/nmeth.1701

Käll L et al., A combined transmembrane topology and signal peptide prediction method. J Mol Biol. (2004)
PubMed: 15111065 DOI: 10.1016/j.jmb.2004.03.016

Viklund H et al., SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics. (2008)
PubMed: 18945683 DOI: 10.1093/bioinformatics/btn550

Uhlén M et al., Tissue-based map of the human proteome. Science. (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Fagerberg L et al., Prediction of the human membrane proteome. Proteomics. (2010)
PubMed: 20175080 DOI: 10.1002/pmic.200900258