Human Disease Blood Atlas - Method Summary

Summary

A comprehensive characterization of the blood proteome in patients with various diseases can contribute to a better understanding of disease etiology, enabling earlier diagnosis, risk stratification and better monitoring of disease progression. Linking changes in the plasma proteome to function across conditions can offer a window into the underlying biology and disease mechanisms and open new avenues for treatment. Precision medicine aims to allow individualized diagnosis, treatment and monitoring of patients using molecular tools such as genomics, proteomics and metabolomics. Technologies such as the Proximity Extension Assay and targeted mass spectrometry are well suited for this kind of blood-based protein profiling. The first version of the Human Disease Blood Atlas reported a pan-cancer study covering 12 major cancer types. The current version presents protein profiles for 59 diseases.

Key publications

Álvez MB et al. (2023) "Next generation pan-cancer blood proteome profiling using proximity extension assay". Nat Commun 14, 4308.

Kotol D et al. (2023) "Absolute quantification of pan-cancer plasma proteomes reveals unique signature in multiple myeloma". Cancers 15(19), 4764.

What can you learn from the Disease Blood Atlas?

Learn about

comprehensive and precise protein levels in blood covering 59 diseases
proteins associated with each of the analyzed diseases

Data overview

Data type	Count	Data	Coverage (number of genes)
Protein expression	59	Differential expression analysis across 59 diseases	1162

How was the Proximity Extension Assay data generated?

Next Generation Blood Profiling was performed by combining an antibody-based proximity extension assay with next generation sequencing (Wik L et al. (2021)). This method enables multiplex measurement of protein concentrations in blood from patients with different diseases. Plasma levels of 1162 proteins were measured in more than 6000 patients representing 59 diseases altogether (Figure 1), using minute amounts of blood plasma collected at the time of diagnosis and before treatment. The diseases in this study belong to different classes, including cardiovascular, metabolic, cancer, psychiatric, autoimmune, infectious and pediatric diseases.

Figure 1. Overview of pan-disease blood proteome profiling study.

To investigate disease-specific proteome profiles, differential expression analyses were conducted with the following comparisons:

Disease vs. Healthy samples: comparing each disease to healthy controls.
Disease vs. Other diseases in the same class: comparing each disease to the other diseases within the same disease class.
Disease vs. All other diseases: comparing each disease to all other diseases in the study.
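The three comparison types can be expressed as a sample-selection step that precedes model fitting. The sketch below is illustrative only: the `Sample` record, the `"Healthy"` label for controls and the `comparison_groups` helper are hypothetical names, and it assumes that healthy samples are excluded from the two disease-versus-disease comparisons.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    disease: str        # hypothetical convention: "Healthy" marks control samples
    disease_class: str  # e.g. "cancer", "cardiovascular"


def comparison_groups(samples, disease, mode):
    """Return (case_ids, control_ids) for one disease and one comparison type.

    mode: "healthy" (vs. healthy controls), "class" (vs. other diseases in the
    same class) or "all" (vs. all other diseases). Healthy samples are assumed
    not to be part of the disease-vs-disease comparisons.
    """
    cases = [s for s in samples if s.disease == disease]
    if not cases:
        raise ValueError(f"no samples for {disease!r}")
    target_class = cases[0].disease_class
    others = [s for s in samples if s.disease not in (disease, "Healthy")]
    if mode == "healthy":
        controls = [s for s in samples if s.disease == "Healthy"]
    elif mode == "class":
        controls = [s for s in others if s.disease_class == target_class]
    elif mode == "all":
        controls = others
    else:
        raise ValueError(f"unknown comparison mode {mode!r}")
    return [s.sample_id for s in cases], [s.sample_id for s in controls]
```

Each disease is then analyzed three times, once per mode, with the same case set and a different control set.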

The models were generated using the limma R package (Ritchie ME et al. (2015)), with the following model covariates:

Age and sex adjustments: For general diseases, both age and sex were included as covariates in the model to control for their potential effects on protein expression.
Sex-specific diseases: For diseases identified as sex-specific, comparisons were only made between samples of the same sex. Because all samples in such a comparison share the same sex, sex was not included as a covariate, and the analysis focuses on the differential expression related to the disease itself.
Pediatric diseases: When pediatric diseases were compared to healthy controls, age was not included as a covariate because of the perfect correlation between age and disease status (all pediatric cases were very young, while the healthy controls were older). Including age as a covariate in this setting would confound the analysis, since the model could not separate the effect of age from the effect of disease.
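A minimal sketch of how the covariate rules above translate into a per-protein design matrix, using NumPy. It fits an ordinary least-squares model and returns the disease coefficient; it does not reproduce limma's empirical Bayes moderated statistics, and the 0/1 coding of sex as well as the helper names `design_matrix` and `disease_effect` are assumptions made for illustration.

```python
import numpy as np


def design_matrix(is_case, age, sex, sex_specific=False, pediatric_vs_healthy=False):
    """Build a design matrix: intercept, disease indicator, then optional covariates.

    - Age is dropped for pediatric-vs-healthy comparisons (age tracks disease status).
    - Sex is dropped for sex-specific diseases (all samples share the same sex).
    Sex is assumed to be coded as 0/1.
    """
    is_case = np.asarray(is_case, dtype=float)
    columns = [np.ones_like(is_case), is_case]
    names = ["intercept", "disease"]
    if not pediatric_vs_healthy:
        columns.append(np.asarray(age, dtype=float))
        names.append("age")
    if not sex_specific:
        columns.append(np.asarray(sex, dtype=float))
        names.append("sex")
    return np.column_stack(columns), names


def disease_effect(expression, X):
    """Least-squares fit for one protein; returns the coefficient of the disease term."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(expression, dtype=float), rcond=None)
    return beta[1]
```

In limma the same design matrix would be fitted to all proteins at once, with the disease coefficient reported as the log fold change.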

This approach accounts for relevant biological variables while addressing specific issues related to correlations in the data and sample characteristics. In addition, the number of control samples was matched to the number of cases, with controls selected by sex and age, to ensure a balanced comparison and reduce potential biases in the analysis.
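The exact matching procedure is not described here. As one hypothetical way to select an equal number of sex- and age-matched controls, the sketch below performs greedy 1:1 nearest-age matching within each sex, without replacement. The tuple layout `(sample_id, sex, age)` and the function name are assumptions.

```python
def match_controls(cases, controls):
    """Greedy 1:1 matching of controls to cases (hypothetical procedure).

    cases, controls: lists of (sample_id, sex, age) tuples.
    For each case (youngest first), pick the unused control of the same sex
    whose age is closest. Returns a list of (case_id, control_id) pairs.
    """
    available = list(controls)
    pairs = []
    for case_id, sex, age in sorted(cases, key=lambda c: c[2]):
        same_sex = [c for c in available if c[1] == sex]
        if not same_sex:
            continue  # no eligible control left for this case
        best = min(same_sex, key=lambda c: abs(c[2] - age))
        available.remove(best)  # sample without replacement
        pairs.append((case_id, best[0]))
    return pairs
```

Greedy matching is simple but not globally optimal; an optimal assignment method would minimize the total age difference across all pairs.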

The up- and down-regulated proteins in each disease are summarized in volcano plots shown in the section for each disease, highlighting the most significantly differentially expressed proteins. The results across all patients for each protein target are presented on the individual gene pages.
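To illustrate how proteins end up labeled as up- or down-regulated in a volcano plot, the sketch below applies a Benjamini-Hochberg false discovery rate adjustment to per-protein p-values and combines it with a fold-change cutoff. The thresholds (`alpha=0.05`, an absolute log2 fold change of 0.5) are illustrative assumptions, not the cutoffs used in the atlas.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity: walk from the largest p-value down, keeping running minima
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def volcano_labels(log2_fc, pvalues, fc_cutoff=0.5, alpha=0.05):
    """Label each protein as 'up', 'down' or 'not significant' (assumed cutoffs)."""
    adjusted = benjamini_hochberg(pvalues)
    fc = np.asarray(log2_fc, dtype=float)
    labels = np.full(fc.size, "not significant", dtype=object)
    labels[(adjusted < alpha) & (fc >= fc_cutoff)] = "up"
    labels[(adjusted < alpha) & (fc <= -fc_cutoff)] = "down"
    return labels, adjusted
```

The x-axis of a volcano plot is the fold change and the y-axis is typically the negative log10 of the (adjusted) p-value, so the "up" and "down" proteins appear in the upper right and upper left corners.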

How was the Targeted Proteomics data generated?

Targeted proteomics is a bottom-up approach in which proteases, most commonly trypsin, are used to digest proteins into peptides that can be measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS). This strategy enables measurements with high reproducibility and precision, making it well suited for quantifying proteins in cells, tissues and blood.

Unlike the widely used data-dependent acquisition (DDA), also known as shotgun proteomics, targeted proteomics works with a predefined set of peptides and builds on prior knowledge about the analytes. Peptide quantification can be either relative or absolute. Relative quantification describes the amount of an analyte in proportion to another measurement of the same analyte, either across several biological samples or between two groups, as in case-control studies. Absolute concentrations, on the other hand, can be obtained by spiking samples with known amounts of heavy isotope-labeled standards during the sample preparation workflow. Using isotope-labeled peptide or protein standards can also considerably increase consistency and precision, and this can be done at a large scale.
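Relative quantification, as described above, yields a unitless ratio rather than a concentration. A minimal sketch, assuming the input values are signal intensities (for example peak areas) of the same peptide measured in two groups; the function name is hypothetical.

```python
import math
from statistics import mean


def relative_log2_ratio(group_a, group_b):
    """Relative quantification of one peptide between two groups.

    Returns log2(mean signal in group A / mean signal in group B), e.g.
    cases vs. controls. The result has no units: it only says how much
    more (or less) of the analyte is present in A than in B.
    """
    mean_b = mean(group_b)
    if mean_b <= 0:
        raise ValueError("reference group must have a positive mean signal")
    return math.log2(mean(group_a) / mean_b)


# Cases show four times the signal of controls -> log2 ratio of 2.0
print(relative_log2_ratio([200.0, 200.0], [50.0, 50.0]))
```

Absolute quantification additionally needs the known amount of a spiked heavy standard to convert signal ratios into concentrations.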

A quantitative strategy based on heavy isotope-labeled PrESTs (protein epitope signature tags) was originally developed as a collaborative effort between Professor Matthias Mann and Professor Mathias Uhlén (Zeiler M et al. (2012)). They introduced the multiplex PrEST-SILAC quantitative approach. This workflow was based on shotgun proteomics and was relatively simple to execute and straightforward to work with. Stable isotope-labeled (SIS) PrESTs combined with a mass spectrometry readout can be used in almost any MS setup and analysis mode, including both targeted (SRM, MRM, PRM, DIA) and untargeted (DDA) modes of operation. The standards are added to the sample at the start of the proteomics workflow and therefore account for potential digestion biases, as they generate the same proteotypic peptides (Figure 3) and mimic the exact amino acid sequence of the endogenous protein. Digestion bias arises when standards are not digested together with the endogenous proteins of the sample; it is a common source of error that affects almost every LC-MS/MS sample preparation workflow and can be very hard to control for.
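Because the standard is a protein fragment that is digested together with the sample, it releases the same tryptic peptides as the endogenous protein wherever their sequences overlap. The sketch below shows this with an in-silico trypsin digestion using the common rule "cleave after K or R, unless followed by P". Both sequences and the N-terminal tag are made-up placeholders, not a real PrEST or the actual purification tag, and the minimum peptide length of 6 is an assumption.

```python
def tryptic_peptides(sequence, min_len=6):
    """In-silico trypsin digestion: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    # very short peptides are rarely useful for quantification
    return [p for p in peptides if len(p) >= min_len]


def shared_peptides(standard, endogenous, min_len=6):
    """Peptides released by both the standard and the endogenous protein."""
    return set(tryptic_peptides(standard, min_len)) & set(tryptic_peptides(endogenous, min_len))


ENDOGENOUS = "MAGTKLVEELSGRPQDAFTKSTEAYLNGRDDPIVKWASTHE"  # hypothetical target protein
STANDARD = "MGSSHHHHHH" + "QDAFTKSTEAYLNGRDDPIVKWA"   # placeholder tag + fragment of the target

print(sorted(shared_peptides(STANDARD, ENDOGENOUS)))  # ['DDPIVK', 'STEAYLNGR']
```

The first peptide of the standard contains tag residues and therefore has no endogenous counterpart; only peptides lying fully inside the shared sequence can serve as quantitative peptides.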

Figure 3. The standard's N-terminal sequence enables affinity purification and measurement. The C-terminal portion contains 50–150 human amino acids. Each standard contains numerous tryptic peptides that can be used to measure an unknown sample's target protein.

Each SIS-PrEST standard is fully labeled with 13C- and 15N-enriched arginine and lysine, and the protein sequence used for quantification spans a shorter stretch of amino acids (50–150 aa) representative of the target protein of interest (Figure 4).
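Since every arginine and lysine in the standard is heavy, each tryptic peptide ending in K or R carries at least one label and appears in the mass spectrum at a fixed offset from its endogenous (light) counterpart. A small sketch of that offset, using the monoisotopic mass differences of 13C6 15N2-lysine (Lys8) and 13C6 15N4-arginine (Arg10); the function name is hypothetical.

```python
# Monoisotopic mass difference between heavy and light residues, in Da
LYS8_SHIFT = 8.014199    # 13C6 15N2-lysine
ARG10_SHIFT = 10.008269  # 13C6 15N4-arginine


def heavy_light_shift(peptide, charge=1):
    """m/z difference between a fully labeled (heavy) peptide and its light form."""
    delta_mass = peptide.count("K") * LYS8_SHIFT + peptide.count("R") * ARG10_SHIFT
    return delta_mass / charge


print(heavy_light_shift("STEAYLNGR", charge=2))  # about 5.004 m/z units
```

The light and heavy forms co-elute from the chromatography column, so the mass spectrometer can measure both within the same scan window.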

In the Disease Blood Atlas, 273 SIS-PrESTs were spiked at known concentrations directly into undepleted human blood plasma from 1,469 cancer patients. The spiked amounts were tuned to be as close as possible to a 1:1 ratio with the endogenous proteins, which increases the analytical precision of the one-point calibration used to quantify the endogenous proteins. Quantitative peptides were selected based on the lowest coefficient of variation and the highest frequency of detection, and only the single best-performing peptide per protein was used.
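The peptide selection criteria can be sketched as a ranking over candidate peptides of one protein. How the two criteria were combined is not stated; this sketch assumes detection frequency is ranked first and coefficient of variation (CV) breaks ties, that the input is repeated measurements per peptide, and that `None` marks a non-detection. All names are hypothetical.

```python
from statistics import mean, stdev


def detection_frequency(values):
    """Fraction of measurements in which the peptide was detected."""
    return sum(v is not None for v in values) / len(values)


def coefficient_of_variation(values):
    """CV (standard deviation / mean) over detected measurements only."""
    detected = [v for v in values if v is not None]
    if len(detected) < 2:
        return float("inf")  # not enough data to estimate variability
    return stdev(detected) / mean(detected)


def best_peptide(peptide_values):
    """Pick one peptide per protein: most frequently detected, then lowest CV.

    peptide_values maps peptide sequence -> list of measurements (None = missing).
    """
    return max(
        peptide_values,
        key=lambda pep: (detection_frequency(peptide_values[pep]),
                         -coefficient_of_variation(peptide_values[pep])),
    )


candidates = {
    "PEPA": [1.0, 1.1, None, 0.9],
    "PEPB": [1.0, 2.0, 1.5, 0.5],
    "PEPC": [1.0, 1.05, 0.95, 1.0],
}
print(best_peptide(candidates))  # PEPC
```

A different ordering, for instance requiring a minimum detection frequency and then choosing the lowest CV, would be an equally plausible reading of the criteria.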

Figure 4. Targeted Proteomics workflow using SIS-PrESTs. Production of Standards: PrESTs from the Human Protein Atlas are labeled in high throughput with heavy arginine (Arg10) and lysine (Lys8) residues. Each PrEST fragment can be individually quantified via the common Q-Tag sequence (also used for purification). Assay Generation: Heavy peptides originating from the PrEST sequence are used to establish targeted assays. The quantitative range is defined, and the protein level in healthy plasma is determined in a pool of healthy volunteers. Targeted Proteomics: SIS-PrESTs are spiked directly into non-depleted human plasma collected from cancer patients and act as internal standards throughout the workflow. Quantitative Mass Spectrometry: Endogenous peptides from each patient are measured together with the spiked internal standard. The known amount of spiked standard is used to calculate the absolute concentration of each protein analyte.
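The last step of the workflow, a one-point calibration, amounts to scaling the known spiked concentration by the ratio of endogenous (light) to standard (heavy) signal. A minimal sketch, assuming peak areas as the signal and a spiked concentration in arbitrary units; the function names and example numbers are illustrative.

```python
def endogenous_concentration(light_area, heavy_area, spiked_concentration):
    """One-point calibration: light/heavy signal ratio times the spiked amount.

    The result has the same units as spiked_concentration.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard signal must be positive")
    return light_area / heavy_area * spiked_concentration


def quantify_patients(measurements, spiked_concentration):
    """measurements maps patient id -> (light_area, heavy_area) for one peptide."""
    return {
        patient: endogenous_concentration(light, heavy, spiked_concentration)
        for patient, (light, heavy) in measurements.items()
    }


# Same standard spiked into two patients' plasma at 50 (arbitrary units)
print(quantify_patients({"p1": (1.0e6, 1.0e6), "p2": (5.0e5, 1.0e6)}, 50.0))
# {'p1': 50.0, 'p2': 25.0}
```

Because only a single calibration point is used, the estimate is most precise when the light/heavy ratio is close to 1, which is why the spiked amounts were tuned toward a 1:1 ratio.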

What is presented in the section?

The protein levels for all cancer patients for each protein target, together with information on whether the target is upregulated in any of the diseases and/or is included in any disease prediction model, are presented on the individual gene summary pages in the Human Protein Atlas.