Human Disease Blood Atlas - Method Summary

Summary

A comprehensive characterization of the blood proteome profiles in patients with various diseases can contribute to a better understanding of the disease etiology, resulting in earlier diagnosis, risk stratification and better monitoring of the disease progression. Precision Medicine thus aims to allow for an individualized diagnosis, treatment and monitoring of patients, including the use of molecular tools such as genomics, proteomics and metabolomics. In the first version of the Disease Blood Atlas, a pan-cancer study covering 12 major cancer types is reported.

Key publication

Álvez MB et al. (2023) "Next generation pan-cancer blood proteome profiling using proximity extension assay" Nat Commun 14, 4308.

What can you learn from the Disease Blood Atlas?

Learn about

  • comprehensive and precise protein levels in blood covering all major diseases
  • proteins associated with each of the analyzed cancers

How has the Proximity Extension Assay data been generated?

Next Generation Blood Profiling, which combines an antibody-based proximity extension assay with next generation sequencing (Wik L et al. (2021)), was used to explore protein concentrations in blood from patients with different cancers. Plasma profiles of 1463 proteins from more than 1400 cancer patients, together representing 12 common cancer types (Figure 1), were measured in minute amounts of blood plasma collected at the time of diagnosis and before treatment. To investigate the cancer-specific proteome profiles, differential expression analyses were performed by comparing each cancer to all other cancers in the study. For the male- and female-specific cancers, only samples from patients of the same sex were compared. The up- and down-regulated proteins in each cancer are summarized in the volcano plots displayed in the sections for the different cancers, which highlight the most significantly differentially expressed proteins. The results for all cancer patients for each protein target are presented on the individual gene pages.
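The one-versus-rest comparison described above can be sketched as follows. This is a minimal illustration and not the analysis code behind the atlas: the sample layout, the SEX_SPECIFIC mapping and the function names are hypothetical, and the Benjamini-Hochberg procedure is assumed here as the multiple-testing correction because the text does not name the method behind the adjusted p-values. The statistical test that produces the raw p-values is left out for the same reason.

```python
from statistics import mean

# Hypothetical mapping of sex-specific cancers to the sex of eligible samples.
# Which cancers are treated this way in the study is an assumption.
SEX_SPECIFIC = {"Breast cancer": "F", "Cervical cancer": "F"}


def one_vs_rest(samples, cancer):
    """Split samples into cases (the cancer) and controls (all other cancers).

    samples: list of dicts with keys "cancer", "sex" and "npx"
    (NPX value of one protein). For sex-specific cancers, only
    samples of the matching sex are kept on both sides.
    """
    required_sex = SEX_SPECIFIC.get(cancer)
    eligible = [
        s for s in samples
        if required_sex is None or s["sex"] == required_sex
    ]
    cases = [s for s in eligible if s["cancer"] == cancer]
    controls = [s for s in eligible if s["cancer"] != cancer]
    return cases, controls


def npx_difference(cases, controls):
    """Difference in mean NPX between cases and controls."""
    return mean(s["npx"] for s in cases) - mean(s["npx"] for s in controls)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, one_vs_rest would be called once per cancer and per protein, and benjamini_hochberg would be applied to the raw p-values of all proteins tested for that cancer.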


Figure 1. Age distribution and number of patients for each cancer type included in the study.

AI-based disease prediction models were used to identify sets of proteins associated with each of the analyzed cancers. The aim of the protein panel is to distinguish the plasma protein profiles of different cancers. By combining the results from all cancer types, a panel of proteins (see Table 1 below) suitable for the identification of the 12 different cancer types was selected.

To identify proteins relevant for each cancer type, a separate disease prediction model was built for each cancer type, using all measured proteins (n = 1463) and 70% of the cancer patients as the training set. The control group in each model was composed of all the other cancer samples and was subsampled to include a similar number of patients to the modelled cancer. Here, we show the results obtained from a regularized generalized linear model (glmnet), which gives an estimate of the overall importance of each protein to the model (range 0-100%). The lollipop plots found in the sections for the different cancer types show the 10 most important proteins from the model for the classification of that specific cancer type.
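The data preparation around each model can be sketched as follows. This is a minimal sketch under stated assumptions: it does not fit a glmnet model, all function names are hypothetical, and scaling the absolute model coefficients so that the largest equals 100 is only one common way to obtain a 0-100 importance score. The text does not state how the importance values were derived.

```python
import random


def split_train_test(patient_ids, train_fraction=0.7, seed=0):
    """Randomly assign patients to a training and a test set."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def subsample_controls(case_ids, other_ids, seed=0):
    """Draw as many controls from the other cancers as there are cases."""
    rng = random.Random(seed)
    n_controls = min(len(case_ids), len(other_ids))
    return rng.sample(list(other_ids), n_controls)


def scale_importance(coefficients):
    """Scale absolute coefficients to 0-100 (assumed importance measure).

    coefficients: dict mapping protein name to a fitted model coefficient.
    """
    magnitudes = {protein: abs(c) for protein, c in coefficients.items()}
    largest = max(magnitudes.values())
    if largest == 0:
        return {protein: 0.0 for protein in magnitudes}
    return {protein: 100.0 * m / largest for protein, m in magnitudes.items()}


def top_proteins(importance, n=10):
    """Return the n proteins with the highest importance."""
    return sorted(importance, key=importance.get, reverse=True)[:n]
```

With made-up coefficients for three placeholder proteins, scale_importance({"P1": 2.0, "P2": -1.0, "P3": 0.0}) gives importances of 100, 50 and 0, and top_proteins would list them in that order.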




Figure 2. Overview of the workflow used to identify a pan-cancer biomarker panel for cancer classification.

Table 1. The 83 proteins used for identification of 12 different cancer types.

Cancer                  Protein   Importance (%)  p.adjusted  NPX fold change
Acute myeloid leukemia  CD244     100.0           8.8e-14     1.5
Acute myeloid leukemia  FLT3      98.7            9.8e-23     3.3
Acute myeloid leukemia  TNFSF13B  60.4            4.6e-10     1.8
Breast cancer           PRTG      64.9            1.1e-10     0.2
Breast cancer           LAMP3     58.1            8.5e-3      0.2
Breast cancer           SDC4      56.1            1.5e-10     0.6
Breast cancer           OXT       56.1            9.6e-4      0.6
Breast cancer           HSD11B1   53.2            4.6e-3      0.1
Breast cancer           BTC       52.3            4.0e-4      0.4
Breast cancer           LPL       51.7            6.6e-5      0.2
Breast cancer           MSMB      51.6            2.9e-2      0.2
Cervical cancer         GLO1      78.1            7.5e-5      0.4
Cervical cancer         CHRDL2    75.6            7.7e-6      0.4
Cervical cancer         FCGR3B    69.1            4.8e-4      0.2
Cervical cancer         CRNN      67.9            3.3e-5      0.5
Cervical cancer         AGER      60.5            3.1e-4      0.2

How has the Targeted Proteomics data been generated?

Targeted proteomics is a bottom-up proteomics approach that uses proteases, most commonly trypsin, to digest proteins into peptides that can be measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS). This quantitative strategy is an excellent tool for performing measurements with high reproducibility and precision, making it appropriate for quantifying proteins in cells, tissues and blood.
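The digestion step can be illustrated with a small in silico example. The sketch below applies the commonly used trypsin rule (cleavage C-terminal to lysine or arginine, except when the next residue is proline). The function name and the example sequence are hypothetical, and real digestion also produces missed cleavages, which this sketch ignores.

```python
import re


def tryptic_digest(sequence, min_length=1):
    """Split a protein sequence into fully cleaved tryptic peptides.

    Cleaves after K or R unless the following residue is P.
    Peptides shorter than min_length are dropped.
    """
    # Zero-width split: position preceded by K/R and not followed by P.
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_length]


# Made-up sequence, used only to show the cleavage rule.
example = tryptic_digest("AAKPGGRTTKLL")
```

For the made-up sequence above, the lysine followed by proline is not cleaved, so the peptides are AAKPGGR, TTK and LL.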

Targeted proteomics, as opposed to the widely used data-dependent acquisition (DDA), also known as shotgun proteomics, works with a defined collection of peptides and builds on prior knowledge about the analytes. Generally, peptide quantification can be either relative or absolute. Relative quantification describes the amount of an analyte in proportion to another measurement of the same analyte, either across several biological samples or across two groups, as in case-control studies. Absolute concentrations can be measured by adding heavy-labeled standards in known amounts during the sample preparation workflow. Using heavy-labeled standards can also considerably increase consistency and precision, and it can be done at a large scale by adding either isotope-labeled peptides or protein standards.

A quantitative strategy based on heavy isotope-labeled PrESTs was originally developed as a collaborative effort between Professor Matthias Mann and Professor Mathias Uhlén (Zeiler M et al. (2012)). They introduced the multiplex PrEST-SILAC quantitative approach. This quantitative workflow was based on shotgun proteomics and had the benefit of being relatively simple to execute and straightforward to work with. Stable isotope-labeled (SIS) PrESTs, combined with a mass spectrometry readout, can be used in almost any MS setup and analysis mode, including both targeted (SRM, MRM, PRM, DIA) and untargeted (DDA) modes of operation. The standards are added to the sample at the initial stage of the proteomics workflow and can therefore account for potential digestion biases, as they generate the same prototypic peptides (Figure 3) and mimic the exact amino acid repertoire of the endogenous protein. Digestion bias is otherwise a common source of error that affects almost every LC-MS/MS sample preparation workflow and is very hard to control for unless the protein standard is cleaved together with the endogenous sample.


Figure 3. The standard's N-terminal sequence enables affinity purification and measurement. The C-terminal portion contains 50–150 human amino acids. Each standard contains numerous tryptic peptides that can be used to measure an unknown sample's target protein.

Each SIS-PrEST standard is fully labeled with 13C- and 15N-enriched arginine and lysine, and the protein sequence used for quantification spans a shorter amino acid sequence (50-150 aa) representative of the target protein of interest (Figure 4).
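The labeling determines how far apart the heavy standard peptide and the endogenous (light) peptide appear in the mass spectrum. The sketch below assumes the labels are 13C6 15N4 arginine (Arg10) and 13C6 15N2 lysine (Lys8), whose monoisotopic mass shifts are about 10.008269 Da and 8.014199 Da. The function names and the example peptides are hypothetical.

```python
# Monoisotopic mass shifts of the heavy residues relative to the light ones.
# Assumes 13C6 15N4 arginine (Arg10) and 13C6 15N2 lysine (Lys8).
ARG10_SHIFT_DA = 10.008269
LYS8_SHIFT_DA = 8.014199


def heavy_mass_shift(peptide):
    """Mass difference (Da) between the fully labeled and the light peptide."""
    return (
        peptide.count("R") * ARG10_SHIFT_DA
        + peptide.count("K") * LYS8_SHIFT_DA
    )


def heavy_mz_shift(peptide, charge):
    """m/z difference between heavy and light peptide at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return heavy_mass_shift(peptide) / charge
```

A tryptic peptide ending in a single lysine is thus shifted by about 8.014 Da, which corresponds to about 4.007 m/z units for a doubly charged ion.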

In the Disease Atlas, 273 SIS-PrESTs were spiked in known concentrations directly into undepleted human blood plasma from 1,469 cancer patients. The spiked amounts were tuned to be as close as possible to a 1:1 ratio with the endogenous proteins, which increases the analytical precision of the one-point calibration used to quantify the endogenous proteins. The quantitative peptides were selected using the lowest coefficient of variation and the highest frequency of detection as selection criteria, and the single best-performing peptide per protein was used.
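The one-point calibration and the peptide selection can be sketched as follows. This is an illustration under stated assumptions and not the pipeline used for the atlas: the function names and input layout are hypothetical, the coefficient of variation is assumed to be computed over repeated measurements of the same sample, and ranking by detection frequency first and coefficient of variation second is only one way to combine the two criteria named in the text.

```python
from statistics import mean, stdev


def one_point_concentration(light_intensity, heavy_intensity, spiked_concentration):
    """Endogenous concentration from the light/heavy ratio and the known spike.

    The result has the same unit as spiked_concentration.
    """
    if heavy_intensity <= 0:
        raise ValueError("heavy standard was not detected")
    return light_intensity / heavy_intensity * spiked_concentration


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def select_peptide(ratios_by_peptide):
    """Pick the best-performing peptide for one protein.

    ratios_by_peptide: dict mapping a peptide to a list of light/heavy
    ratios from repeated measurements, with None where it was not detected.
    """
    def score(peptide):
        ratios = ratios_by_peptide[peptide]
        detected = [r for r in ratios if r is not None]
        frequency = len(detected) / len(ratios)
        # A CV needs at least two detections; otherwise rank the peptide last.
        cv = coefficient_of_variation(detected) if len(detected) >= 2 else float("inf")
        return (-frequency, cv)  # highest frequency first, then lowest CV

    return min(ratios_by_peptide, key=score)
```

For example, a light signal twice as intense as the heavy signal of a standard spiked at 50 concentration units corresponds to an endogenous concentration of 100 units. The closer this ratio is to 1:1, the less the estimate depends on the linearity of the instrument response.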


Figure 4. Targeted Proteomics workflow using SIS-PrESTs. Production of Standards: PrESTs from the Human Protein Atlas are labeled in high-throughput with heavy arginine (Arg10) and lysine (Lys8) amino acid residues. Each PrEST fragment can be individually quantified by the common Q-Tag sequence (also used for purification). Assay Generation: Heavy peptides originating from the PrEST sequence are used to establish targeted assays. The quantitative range is defined, and the protein level in healthy plasma is determined in a pool of healthy volunteers. Targeted Proteomics: SIS-PrESTs are spiked directly into non-depleted human plasma collected from cancer patients and act as internal standards throughout the workflow. Quantitative Mass Spectrometry: Endogenous peptides from each patient are measured together with the spiked internal standard. The known amount of spiked standard is used to calculate the absolute concentration of each protein analyte.

What is presented in the section?

The protein levels for all cancer patients for each protein target, together with information on whether the target is upregulated in any of the diseases and/or is included in any disease prediction model, are presented on the individual gene summary pages in the Human Protein Atlas.