The lung adenocarcinoma proteome

Lung cancer is the most prevalent cancer in the world and the leading cause of cancer-related deaths. Smoking is accepted as the major risk factor, responsible for 70-90% of all lung cancer cases, although the etiology of lung cancer appears multifactorial with both environmental and genetic factors playing a role. Lung cancer patients have a poor outcome with a 5-year survival rate of 13.6% among men and 19.4% among women across all stages. The poor prognosis is partly explained by late diagnosis, but also by lack of effective treatments.

Based on histology, lung cancer is primarily divided into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). SCLC originates from neuroendocrine cells and accounts for approximately 15% of all primary lung cancers. This very rapidly proliferating cancer is generally treated with chemotherapy. The initial response is usually good, but in most cases it is followed by resistance to treatment and poor survival.

NSCLC is thought to originate from bronchogenic or alveolar cells. It is the most common form of primary lung cancer, representing approximately 80-85% of all lung cancer cases. Based on histology, NSCLC can be further divided into subtypes, of which adenocarcinoma and squamous cell carcinoma are the most common. Treatment for NSCLC is mainly based on tumor extent. In principle, limited-stage tumors are treated surgically, sometimes with the addition of chemotherapy and radiotherapy, whereas advanced-stage tumors are treated palliatively with a combination of cytotoxic drugs and recently developed targeted drugs. Unfortunately, the treatment effect is limited and most patients experience only modest survival prolongation.

Here, we explore the lung adenocarcinoma proteome using TCGA transcriptomics data and antibody-based protein data. Based on transcriptomics data from 497 patients, 2162 genes are suggested to be prognostic: 1738 genes are associated with an unfavorable prognosis and 425 genes with a favorable prognosis.

TCGA data analysis

In this metadata study, we used data from TCGA, where transcriptomics data were available for 497 patients with lung adenocarcinoma. The dataset included 269 females and 228 males. Most of the patients (317) were still alive at the time of data collection. The stage distribution was: stage I, 267 patients; stage II, 118 patients; stage III, 80 patients; stage IV, 25 patients; and 7 patients with missing stage information.

Unfavorable prognostic genes in lung adenocarcinoma

For unfavorable genes, higher relative expression levels at diagnosis are associated with significantly lower overall survival. There are 1738 genes associated with an unfavorable prognosis in lung adenocarcinoma; among these potential prognostic genes, 245 were validated in a separate study. Table 1 lists the 20 most significant genes related to an unfavorable prognosis.

S100A16 is a gene associated with an unfavorable prognosis in lung adenocarcinoma. The best separation is achieved by an expression cutoff at 261 TPM which divides the patients into two groups with 31% 5-year survival for patients with high expression versus 46% for patients with low expression, p-value: 2.17e-7. The protein S100A16 is encoded by S100A16 and is a calcium-binding protein that enhances adipogenesis and reduces insulin-stimulated glucose uptake. Immunohistochemical staining using an antibody targeting S100A16 (HPA045841) shows a differential expression pattern in lung adenocarcinoma samples.
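The 5-year survival figures above come from splitting patients at an expression cutoff and estimating survival separately in each group. A minimal sketch of that step, assuming the Kaplan-Meier estimator, follow-up time in years, and that a patient counts as "high" when expression is at or above the cutoff (all assumptions; the function names and the toy cohort are hypothetical):

```python
from typing import List, Sequence, Tuple


def kaplan_meier_at(times: Sequence[float], events: Sequence[bool], t: float) -> float:
    """Kaplan-Meier estimate of the survival probability S(t).

    times:  follow-up time per patient (assumed to be in years)
    events: True if the patient died at that time, False if censored (alive)
    """
    # Distinct times of death up to and including t, in increasing order
    death_times = sorted({ti for ti, ev in zip(times, events) if ev and ti <= t})
    survival = 1.0
    for dt in death_times:
        at_risk = sum(1 for ti in times if ti >= dt)  # patients still followed at dt
        deaths = sum(1 for ti, ev in zip(times, events) if ev and ti == dt)
        survival *= 1.0 - deaths / at_risk
    return survival


def five_year_survival_by_group(
    expression: Sequence[float],
    times: Sequence[float],
    events: Sequence[bool],
    cutoff: float,
) -> Tuple[float, float]:
    """Return (5-year survival of the high group, 5-year survival of the low group)."""
    high = [i for i, x in enumerate(expression) if x >= cutoff]
    low = [i for i, x in enumerate(expression) if x < cutoff]

    def km(idx: List[int]) -> float:
        return kaplan_meier_at([times[i] for i in idx], [events[i] for i in idx], 5.0)

    return km(high), km(low)


# Hypothetical toy cohort: expression (TPM), follow-up (years), death observed?
expr = [300.0, 280.0, 400.0, 100.0, 150.0, 90.0]
years = [1.0, 2.0, 6.0, 3.0, 7.0, 8.0]
died = [True, True, False, True, False, False]
s_high, s_low = five_year_survival_by_group(expr, years, died, cutoff=261.0)
print(f"high: {s_high:.2f}, low: {s_low:.2f}")  # high: 0.33, low: 0.67
```

Censored patients (still alive) leave the risk set without lowering the estimate, which is why the estimator is used rather than a simple fraction of survivors.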

[Figure: S100A16 survival analysis (p<0.001); panels: S100A16 high expression, S100A16 low expression]

ANLN is another gene associated with an unfavorable prognosis in lung adenocarcinoma, in two independent cohorts. The best separation is achieved by an expression cutoff at 13 TPM, which divides the patients into two groups with 29% 5-year survival for patients with high expression versus 46% for patients with low expression, p-value: 9.34e-8. The TCGA result was validated in a separate study (p-value: 1.34e-4). The protein Anillin is encoded by ANLN and is required for cytokinesis. Anillin may also play a role in bleb assembly during metaphase and anaphase of mitosis. Immunohistochemical staining using an antibody targeting ANLN (CAB062547) shows a differential expression pattern in lung adenocarcinoma samples.
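The "best separation" cutoff can be found by scanning candidate cutoffs and keeping the one with the smallest survival-test p-value. The sketch below assumes a two-group log-rank test and restricts the scan to cutoffs between roughly the 20th and 80th percentile of expression, so that neither group becomes very small; both choices are assumptions for illustration, and the function names and toy data are hypothetical. Because the minimum p-value is taken over many cutoffs, it is optimistic, which is one reason validation in a separate cohort, as for ANLN, is valuable.

```python
import math
from typing import Optional, Sequence, Tuple


def logrank_p(times: Sequence[float], events: Sequence[bool], in_a: Sequence[bool]) -> float:
    """Two-group log-rank test; returns the p-value (chi-square, 1 df).

    in_a: True if the patient belongs to group A (e.g. high expression)
    """
    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0
    for t in sorted({ti for ti, ev in zip(times, events) if ev}):
        n = sum(1 for ti in times if ti >= t)
        n_a = sum(1 for ti, a in zip(times, in_a) if ti >= t and a)
        d = sum(1 for ti, ev in zip(times, events) if ev and ti == t)
        d_a = sum(1 for ti, ev, a in zip(times, events, in_a) if ev and ti == t and a)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    # P(chi-square with 1 df > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def best_cutoff(
    expression: Sequence[float],
    times: Sequence[float],
    events: Sequence[bool],
    low_q: float = 0.2,
    high_q: float = 0.8,
) -> Optional[Tuple[float, float]]:
    """Return (cutoff, p) with the smallest log-rank p-value, or None."""
    values = sorted(expression)
    lo, hi = int(low_q * len(values)), int(high_q * len(values))
    best: Optional[Tuple[float, float]] = None
    for cutoff in sorted(set(values[lo:hi + 1])):
        in_high = [x >= cutoff for x in expression]
        if all(in_high) or not any(in_high):
            continue  # both groups must be non-empty
        p = logrank_p(times, events, in_high)
        if best is None or p < best[1]:
            best = (cutoff, p)
    return best


# Hypothetical toy cohort: high expressers die early, low expressers are censored
expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
years = [10.0] * 5 + [1.0, 2.0, 3.0, 4.0, 5.0]
died = [False] * 5 + [True] * 5
print(best_cutoff(expr, years, died))  # expected cutoff 6.0: a perfect split
```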

[Figure: ANLN survival analysis (p<0.001); panels: ANLN high expression, ANLN low expression]

Table 1. The 20 genes most significantly associated with an unfavorable prognosis in lung adenocarcinoma.

Gene | Description | Predicted location | mRNA (cancer) | p-value | Prognostic
DKK1 | Dickkopf WNT signaling pathway inhibitor 1 | Secreted | 10.5 | 4.83e-10 | validated
SLC2A1 | Solute carrier family 2 member 1 | Membrane, Intracellular | 16.5 | 4.02e-9 | validated
EIF5A | Eukaryotic translation initiation factor 5A | Intracellular | 195.7 | 8.62e-9 | potential
TACC3 | Transforming acidic coiled-coil containing protein 3 | Intracellular | 20.6 | 3.67e-8 | validated
MCM5 | Minichromosome maintenance complex component 5 | Intracellular | 32.1 | 1.29e-7 | potential

Favorable prognostic genes in lung adenocarcinoma

For favorable genes, higher relative expression levels at diagnosis are associated with significantly higher overall survival. There are 425 genes associated with a favorable prognosis in lung adenocarcinoma; among these potential prognostic genes, 5 were validated in a separate study. Table 2 lists the 20 most significant genes related to a favorable prognosis.
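The favorable and unfavorable definitions reduce to two checks: is the survival difference significant, and which expression group survives longer? A minimal sketch, assuming a significance threshold of p < 0.001 (matching the "p<0.001" label on the survival plots) and using the 5-year survival of the high- and low-expression groups as a simplified stand-in for the direction of the effect; the function name is hypothetical:

```python
from typing import Optional

P_THRESHOLD = 0.001  # assumed cutoff for calling a gene prognostic


def prognostic_label(surv_high: float, surv_low: float, p_value: float,
                     threshold: float = P_THRESHOLD) -> Optional[str]:
    """Classify a gene from its survival split.

    surv_high / surv_low: 5-year survival in the high / low expression group
    p_value: p-value of the survival comparison at the best cutoff
    """
    if p_value >= threshold or surv_high == surv_low:
        return None  # not prognostic
    # Higher expression with lower survival -> unfavorable, otherwise favorable
    return "unfavorable" if surv_high < surv_low else "favorable"


# Values quoted in this section for S100A16 and CLIC6
print(prognostic_label(0.31, 0.46, 2.17e-7))  # unfavorable
print(prognostic_label(0.44, 0.31, 5.50e-5))  # favorable
```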

CLIC6 is a gene associated with a favorable prognosis in lung adenocarcinoma. The best separation is achieved by an expression cutoff at 12 TPM which divides the patients into two groups with 44% 5-year survival for patients with high expression versus 31% for patients with low expression, p-value: 5.50e-5. Immunohistochemical staining using an antibody targeting CLIC6 (HPA065285) shows a differential expression pattern in lung adenocarcinoma samples.

[Figure: CLIC6 survival analysis (p<0.001); panels: CLIC6 high expression, CLIC6 low expression]

SFTPB is another gene associated with a favorable prognosis in lung adenocarcinoma. The best separation is achieved by an expression cutoff at 846 TPM which divides the patients into two groups with 46% 5-year survival for patients with high expression versus 30% for patients with low expression, p-value: 1.14e-4. Immunohistochemical staining using an antibody targeting SFTPB (CAB002440) shows a differential expression pattern in lung adenocarcinoma samples.

[Figure: SFTPB survival analysis (p<0.001); panels: SFTPB high expression, SFTPB low expression]

Table 2. The 20 genes most significantly associated with a favorable prognosis in lung adenocarcinoma.

Gene | Description | Predicted location | mRNA (cancer) | p-value | Prognostic
GNG7 | G protein subunit gamma 7 | Intracellular | 2.3 | 7.66e-6 | potential
EPHX1 | Epoxide hydrolase 1 | Intracellular | 301.5 | 1.22e-5 | potential
PPP1R13B | Protein phosphatase 1 regulatory subunit 13B | Intracellular | 8.4 | 1.68e-5 | potential
GIMAP7 | GTPase, IMAP family member 7 | Intracellular | 13.6 | 1.96e-5 | potential
FUT1 | Fucosyltransferase 1 (H blood group) | Membrane | 3.9 | 3.35e-5 | potential

CPTAC relative protein expression data

Proteins that are significantly down- or upregulated in lung adenocarcinoma compared to normal tissue are illustrated in a volcano plot, using tandem mass tag (TMT) mass spectrometry data from the CPTAC dataset based on 111 tumor samples and 102 normal samples.

In lung adenocarcinoma, 2837 genes are downregulated (blue) and 3040 genes are upregulated (red) compared to normal tissue. Table 3 lists the 20 most significant genes.

Figure 1. Proteins highlighted in blue are significantly downregulated in cancer tissue, while those in red are significantly upregulated compared to normal tissue. Gray points represent non-significant proteins. Fold change is given as log2(fold change); p-values are from a Wilcoxon rank test, adjusted for multiple testing.
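The coloring in a volcano plot like this can be sketched per protein: compare tumor and normal samples with a Wilcoxon rank-sum test, adjust the p-values across all proteins, compute a log2 fold change, and label the protein by both. This sketch assumes the abundances are already on a log2 scale (so the fold change is a difference of medians), uses the normal approximation to the rank-sum test without continuity correction, and uses the Benjamini-Hochberg procedure for the adjustment; the significance level, minimum fold change, function names and toy data are hypothetical, not the settings used for Figure 1.

```python
import math
from statistics import median
from typing import List, Sequence


def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0
    n = n1 + n2
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = (i + j) / 2 + 1  # average 1-based rank of ties
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for x
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 1e-12:
        return 1.0  # all values tied: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def regulation(tumor: Sequence[float], normal: Sequence[float], adj_p: float,
               alpha: float = 0.05, min_abs_log2fc: float = 1.0) -> str:
    """Label a protein as in the volcano plot (values assumed log2-scaled)."""
    log2fc = median(tumor) - median(normal)
    if adj_p >= alpha or abs(log2fc) < min_abs_log2fc:
        return "Not significant"  # gray
    return "Upregulated" if log2fc > 0 else "Downregulated"  # red / blue


# Hypothetical log2 abundances for two proteins (tumor vs normal samples)
tumor = {"P1": [5.1, 5.3, 5.0, 5.4], "P2": [2.0, 2.1, 1.9, 2.2]}
normal = {"P1": [3.0, 3.2, 2.9, 3.1], "P2": [2.0, 2.2, 1.8, 2.1]}
names = list(tumor)
adj = benjamini_hochberg([rank_sum_p(tumor[g], normal[g]) for g in names])
for g, q in zip(names, adj):
    print(g, regulation(tumor[g], normal[g], q))
```

With thousands of proteins tested at once, the adjustment step matters: without it, a nominal 5% threshold would flag many proteins by chance alone.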

Table 3. The 20 genes most significantly associated with downregulated or upregulated protein expression in lung adenocarcinoma compared to normal tissue.

Gene | Description | Predicted location | Log2 fold change | p-value | Regulation
SHANK3 | SH3 and multiple ankyrin repeat domains 3 | Intracellular | -1.24 | 4.52e-36 | Downregulated
EHD2 | EH domain containing 2 | Intracellular | -1.93 | 4.65e-36 | Downregulated
CAVIN1 | Caveolae associated protein 1 | Intracellular | -1.79 | 4.65e-36 | Downregulated
HSPA12B | Heat shock protein family A (Hsp70) member 12B | Intracellular | -1.72 | 4.65e-36 | Downregulated
EHD4 | EH domain containing 4 | Intracellular | -1.1 | 4.65e-36 | Downregulated

The lung adenocarcinoma transcriptome

The transcriptome analysis shows that 69% (n=13990) of all human genes (n=20162) are expressed in lung adenocarcinoma. All genes were classified into one of five categories according to their lung adenocarcinoma-specific expression, based on the ratio between the mRNA level in lung adenocarcinoma and the mRNA levels in the other 16 analyzed cancer tissues.
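The five-way classification can be sketched as a small decision function over one gene's mRNA levels across cancers. The category names follow the text; the rules and thresholds (a detection limit of 1, a fold-change factor of 4, groups of 2 to 5 cancers) are assumptions modeled on the Human Protein Atlas style of classification and may differ from the exact criteria used here. The function name and cancer labels are hypothetical.

```python
from statistics import mean
from typing import Dict


def specificity_category(levels: Dict[str, float], target: str,
                         fold: float = 4.0, detect: float = 1.0,
                         max_group: int = 5) -> str:
    """Assign one gene to one of five categories relative to a target cancer.

    levels: mRNA level of the gene in each cancer (at least two cancers)
    target: the cancer of interest, e.g. "LUAD"
    """
    if max(levels.values()) < detect:
        return "not detected"
    x = levels[target]
    others = [v for k, v in levels.items() if k != target]
    # Enriched: fold-times higher than every other cancer
    if x >= detect and x >= fold * max(others):
        return "cancer enriched"
    # Group enriched: a top-ranked group of 2..max_group detected cancers that
    # includes the target, with a mean fold-times higher than every other cancer
    ranked = sorted(levels.items(), key=lambda kv: kv[1], reverse=True)
    for size in range(2, max_group + 1):
        group, rest = ranked[:size], ranked[size:]
        if not rest:
            break
        if (any(k == target for k, _ in group)
                and all(v >= detect for _, v in group)
                and mean(v for _, v in group) >= fold * max(v for _, v in rest)):
            return "group enriched"
    # Enhanced: fold-times higher than the average of the other cancers
    if x >= detect and x >= fold * mean(others):
        return "cancer enhanced"
    return "low cancer specificity"


print(specificity_category({"LUAD": 100.0, "BRCA": 5.0, "COAD": 4.0}, "LUAD"))
```

The first three categories together correspond to the "elevated" genes; everything else is either detected without specificity or not detected at all.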

Figure 2. The distribution of all genes across the five categories based on transcript abundance in lung adenocarcinoma as well as in all other cancer tissues.

165 genes show some level of elevated expression in lung adenocarcinoma compared to other cancers (Figure 2). The elevated category is further subdivided into three categories, as shown in Table 4.

Table 4. The number of genes in the subdivided categories of elevated expression in lung adenocarcinoma.

Distribution in the 31 cancers
Specificity | Detected in single | Detected in some | Detected in many | Detected in all | Total
Cancer enriched | 4 | 3 | 10 | 0 | 17
Group enriched | 0 | 23 | 24 | 5 | 52
Cancer enhanced | 8 | 33 | 50 | 5 | 96
Total | 12 | 59 | 84 | 10 | 165
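The column headings of Table 4 describe in how many of the cancers a gene is detected. A minimal sketch, assuming "some" means more than one but fewer than a third of the cancers and "many" means at least a third but not all (assumed definitions; the function name is hypothetical), together with a check that the row and column totals of Table 4 add up:

```python
def detection_distribution(n_detected: int, n_cancers: int = 31) -> str:
    """Map the number of cancers in which a gene is detected to a Table 4 column."""
    if not 1 <= n_detected <= n_cancers:
        raise ValueError("n_detected must be between 1 and n_cancers")
    if n_detected == 1:
        return "Detected in single"
    if n_detected == n_cancers:
        return "Detected in all"
    if n_detected < n_cancers / 3:
        return "Detected in some"
    return "Detected in many"


# Counts from Table 4: single, some, many, all
counts = {
    "Cancer enriched": [4, 3, 10, 0],
    "Group enriched": [0, 23, 24, 5],
    "Cancer enhanced": [8, 33, 50, 5],
}
row_totals = {k: sum(v) for k, v in counts.items()}          # 17, 52, 96
column_totals = [sum(col) for col in zip(*counts.values())]  # 12, 59, 84, 10
print(row_totals, column_totals, sum(row_totals.values()))
```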

Additional information

The histological classification of NSCLC is important for the choice of treatment. The most common subtype of NSCLC is adenocarcinoma, comprising around 40% of all lung cancers. Adenocarcinoma is characterized by glandular formation, mucin production and expression of thyroid transcription factor-1 (TTF-1). It is the predominant histological type among younger men, women of all ages, and former and never smokers.

Squamous cell carcinoma, the second most common subtype of NSCLC, is thought to originate from metaplastic squamous epithelia in the bronchial tree. It is defined by a variable degree of squamous differentiation, such as keratinization or intercellular bridging. This subtype is strongly associated with cigarette smoking. Large cell carcinoma accounts for 5-10% of all lung cancers and is a heterogeneous group with no evidence of squamous or adenocarcinoma differentiation.

In addition to these three main subtypes of NSCLC, less common subtypes such as adenosquamous carcinoma and sarcomatoid carcinoma make up the remaining NSCLC cases.

Relevant links and publications

Uhlen M et al., A pathology atlas of the human cancer transcriptome. Science. (2017)
PubMed: 28818916 DOI: 10.1126/science.aan2507

Cancer Genome Atlas Research Network et al., The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. (2013)
PubMed: 24071849 DOI: 10.1038/ng.2764

Uhlén M et al., Tissue-based map of the human proteome. Science. (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Lindskog C et al., The lung-specific proteome defined by integration of transcriptomics and antibody-based profiling. FASEB J. (2014)
PubMed: 25169055 DOI: 10.1096/fj.14-254862

Histology dictionary - Lung cancer