The colon adenocarcinoma proteome

Colorectal cancer is the third most common cancer in the world and the fifth leading cause of cancer-related mortality. Environmental factors, including meat consumption, have been identified as important risk factors. The overall mortality is approximately 50%. The surgical stage at diagnosis is the most important factor for predicting prognosis and the survival rate varies greatly depending on the stage. The 5-year survival rate is more than 90% for stage I and less than 10% for stage IV. Most colorectal cancer cases are detected at an advanced stage. Bleeding and hematochezia are two of the most common symptoms associated with rectal lesions.

Colorectal cancer is considered to originate from normal colon epithelium that develops into precursor lesions termed adenomas that subsequently may progress to invasive colorectal adenocarcinomas with metastatic potential. Colorectal cancer is divided into two subtypes, colon adenocarcinomas (COAD) and rectum adenocarcinomas (READ), depending on the site of the tumor.

Here, we explore the colon adenocarcinoma proteome using TCGA transcriptomics data and antibody-based protein data. 606 genes are suggested as prognostic based on transcriptomics data from 254 patients; 287 genes are associated with unfavorable prognosis and 321 genes are associated with favorable prognosis.

TCGA data analysis

In this metadata study, we used TCGA transcriptomics data, which were available for 254 patients in total: 111 females and 143 males. A majority of the patients (193 patients) were still alive at the time of data collection. The stage distribution was: stage I, 39 patients; stage II, 98 patients; stage III, 72 patients; stage IV, 35 patients; and 10 patients with missing stage information.

Unfavorable prognostic genes in colon adenocarcinoma

For unfavorable genes, higher relative expression levels at diagnosis are associated with significantly lower overall survival for the patients. There are 287 genes associated with an unfavorable prognosis in colon adenocarcinoma; among these potential prognostic genes, 2 were validated in a separate study. In Table 1, the top 20 most significant genes related to an unfavorable prognosis are listed.

ARHGAP4 is a gene associated with unfavorable prognosis in colon adenocarcinoma. The best separation is achieved by an expression cutoff at 23 TPM which divides the patients into two groups with 41% 5-year survival for patients with high expression versus 78% for patients with low expression, p-value: 1.15e-4. Immunohistochemical staining using an antibody targeting ARHGAP4 (HPA001012) shows a differential expression pattern in colon adenocarcinoma samples.

[Figure: ARHGAP4 - survival analysis (p<0.001); panels: ARHGAP4 - high expression, ARHGAP4 - low expression]
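The idea of a "best separation" cutoff can be illustrated with a small sketch. It assumes that separation is measured with a two-group log-rank test and that every observed expression value is tried as a candidate cutoff; neither detail is stated in the text above. The function names and the `min_group` parameter are hypothetical.

```python
import math

def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    groups : 1 for high expression, 0 for low expression
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one death occurred.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n = len(at_risk)          # patients still at risk at time t
        n1 = sum(at_risk)         # of those, patients in the high group
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance

def chi2_1df_pvalue(chi2):
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def best_cutoff(expression, times, events, min_group=2):
    """Return (cutoff, p-value) with the lowest log-rank p-value, or None.

    min_group is a hypothetical guard that skips cutoffs leaving fewer
    than min_group patients in either group.
    """
    best = None
    for cutoff in sorted(set(expression)):
        groups = [1 if x >= cutoff else 0 for x in expression]
        n_high = sum(groups)
        if n_high < min_group or len(groups) - n_high < min_group:
            continue
        p = chi2_1df_pvalue(logrank_chi2(times, events, groups))
        if best is None or p < best[1]:
            best = (cutoff, p)
    return best
```

Because the cutoff is chosen to minimize the p-value, the resulting p-value is optimistic. This is one reason why confirmation in a separate cohort, as reported for the validated genes, is informative.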

DPP7 is another gene associated with an unfavorable prognosis in colon adenocarcinoma, in two independent cohorts. The best separation is achieved by an expression cutoff at 64 TPM which divides the patients into two groups with 44% 5-year survival for patients with high expression versus 73% for patients with low expression, p-value: 4.71e-5. The TCGA data analysis was validated in a separate study with the p-value: 2.34e-4. Immunohistochemical staining using an antibody targeting DPP7 (HPA021282) shows a differential expression pattern in colon adenocarcinoma samples.

[Figure: DPP7 - survival analysis (p<0.001); panels: DPP7 - high expression, DPP7 - low expression]

Table 1. The 20 genes with highest significance associated with an unfavorable prognosis in colon adenocarcinoma.

Gene | Description | Predicted location | mRNA (cancer) | p-value | Prognostic
DPP7 | Dipeptidyl peptidase 7 | Secreted | 60.0 | 4.71e-5 | validated
NUDT1 | Nudix hydrolase 1 | Intracellular | 48.9 | 6.81e-5 | potential
PCBP3 | Poly(rC) binding protein 3 | Intracellular | 1.8 | 3.44e-4 | potential
LZTS1 | Leucine zipper tumor suppressor 1 | Intracellular | 1.5 | 5.08e-4 | validated
AIMP2 | Aminoacyl tRNA synthetase complex interacting multifunctional protein 2 | Intracellular | 45.9 | 6.75e-4 | potential

Favorable prognostic genes in colon adenocarcinoma

For favorable genes, higher relative expression levels at diagnosis are associated with significantly higher overall survival for the patients. There are 321 genes associated with a favorable prognosis in colon adenocarcinoma; among these potential prognostic genes, 3 were validated in a separate study. In Table 2, the top 20 most significant genes related to a favorable prognosis are listed.

ABCD3 is a gene associated with a favorable prognosis in colon adenocarcinoma. The best separation is achieved by an expression cutoff at 15 TPM which divides the patients into two groups with 74% 5-year survival for patients with high expression versus 28% for patients with low expression, p-value: 6.83e-9. Immunohistochemical staining using an antibody targeting ABCD3 (HPA032026) shows a differential expression pattern in colon adenocarcinoma samples.

[Figure: ABCD3 - survival analysis (p<0.001); panels: ABCD3 - high expression, ABCD3 - low expression]
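The 5-year survival figures quoted for the high and low expression groups can be understood through a short sketch. It assumes a Kaplan-Meier estimate with follow-up time measured in years, which the text does not state explicitly. The function names and the toy data are hypothetical.

```python
def kaplan_meier_survival(times, events, horizon):
    """Kaplan-Meier estimate of the probability of surviving past `horizon`.

    times  : follow-up time per patient (same unit as horizon)
    events : 1 if death was observed, 0 if censored
    """
    survival = 1.0
    # Multiply the conditional survival at every death time up to the horizon.
    for t in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        at_risk = sum(1 for tt in times if tt >= t)
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e)
        survival *= 1.0 - deaths / at_risk
    return survival

def five_year_survival_by_group(expression, times_years, events, cutoff):
    """Split patients at `cutoff` and estimate 5-year survival per group."""
    result = {}
    for name, keep in (("high", lambda x: x >= cutoff),
                       ("low", lambda x: x < cutoff)):
        rows = [(t, e) for x, t, e in zip(expression, times_years, events)
                if keep(x)]
        result[name] = kaplan_meier_survival(
            [t for t, _ in rows], [e for _, e in rows], 5.0)
    return result

# Toy cohort (not TCGA data): expression in TPM, follow-up in years.
expression = [20, 18, 16, 30, 5, 8, 10, 12]
times_years = [6, 7, 2, 8, 1, 3, 4, 6]
events = [0, 0, 1, 0, 1, 1, 0, 1]
print(five_year_survival_by_group(expression, times_years, events, cutoff=15))
```

Censored patients, who were still alive at last follow-up, remain in the at-risk count until their follow-up ends. This is why the estimate differs from a simple fraction of surviving patients.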

DSC2 is another gene associated with a favorable prognosis in colon adenocarcinoma, in two independent cohorts. The best separation is achieved by an expression cutoff at 12 TPM which divides the patients into two groups with 68% 5-year survival for patients with high expression versus 45% for patients with low expression, p-value: 8.02e-4. The TCGA data analysis was validated in a separate study with the p-value: 2.92e-4. Immunohistochemical staining using an antibody targeting DSC2 (HPA011911) shows a differential expression pattern in colon adenocarcinoma samples.

[Figure: DSC2 - survival analysis (p<0.001); panels: DSC2 - high expression, DSC2 - low expression]

Table 2. The 20 genes with highest significance associated with a favorable prognosis in colon adenocarcinoma.

Gene | Description | Predicted location | mRNA (cancer) | p-value | Prognostic
SORT1 | Sortilin 1 | Membrane, Intracellular | 22.1 | 9.78e-6 | potential
ARFIP1 | ADP ribosylation factor interacting protein 1 | Intracellular | 20.8 | 1.74e-5 | potential
USP53 | Ubiquitin specific peptidase 53 | Intracellular | 7.7 | 2.41e-5 | potential
SHQ1 | SHQ1, H/ACA ribonucleoprotein assembly factor | Intracellular | 9.3 | 3.68e-5 | potential
ZNF207 | Zinc finger protein 207 | Intracellular | 78.7 | 6.06e-5 | potential

CPTAC relative protein expression data

Proteins that are significantly down- or upregulated in colon adenocarcinoma compared to normal tissue are illustrated in a volcano plot using tandem mass tag (TMT) mass spectrometry data from the CPTAC dataset, based on the analysis of 97 tumor samples and 100 normal samples.

In colon adenocarcinoma, 1923 and 1887 genes are down- (blue) and upregulated (red) compared to normal tissue, respectively. In Table 3, the top 20 most significant genes are listed.

Figure 1. Proteins highlighted in blue are significantly downregulated in cancer tissue, while those in red are significantly upregulated, compared to normal tissue. Gray points represent proteins that are not significant based on the log2 fold change. Significance was assessed with a Wilcoxon rank test with adjusted p-values.
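The coloring of points in a volcano plot can be sketched as follows. The adjustment method and thresholds are not given here, so the sketch assumes a Benjamini-Hochberg adjustment, a significance level of 0.05 and an optional minimum absolute log2 fold change. The function names, `alpha` and `min_abs_log2_fc` are hypothetical, and the raw p-values are taken as given rather than computed from the rank test.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def classify_proteins(log2_fc, pvalues, alpha=0.05, min_abs_log2_fc=0.0):
    """Label each protein as it would be colored in the volcano plot."""
    labels = []
    for fc, p in zip(log2_fc, benjamini_hochberg(pvalues)):
        if p < alpha and fc > min_abs_log2_fc:
            labels.append("Upregulated")        # red
        elif p < alpha and fc < -min_abs_log2_fc:
            labels.append("Downregulated")      # blue
        else:
            labels.append("Not significant")    # gray
    return labels

# Toy values (not CPTAC data): tumor-versus-normal log2 fold changes
# and raw p-values for four proteins.
print(classify_proteins([-2.0, 0.8, -1.2, 0.1], [0.001, 0.002, 0.03, 0.5]))
```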

Table 3. The 20 genes with the highest significance associated with a downregulated or upregulated protein expression in colon adenocarcinoma compared to normal tissue.

Gene | Description | Predicted location | Log2 fold change | p-value | Regulation
NCAM1 | Neural cell adhesion molecule 1 | Secreted, Membrane, Intracellular | -2.08 | 8.09e-34 | Downregulated
SCGN | Secretagogin, EF-hand calcium binding protein | Intracellular | -2.34 | 8.60e-34 | Downregulated
CADM3 | Cell adhesion molecule 3 | Membrane, Intracellular | -1.78 | 8.60e-34 | Downregulated
ANK2 | Ankyrin 2 | Intracellular | -1.2 | 8.60e-34 | Downregulated
DDX18 | DEAD-box helicase 18 | Intracellular | 0.81 | 8.87e-34 | Upregulated

The colon adenocarcinoma transcriptome

The transcriptome analysis shows that 67% (n=13454) of all human genes (n=20162) are expressed in colon adenocarcinoma. All genes were classified according to the colon adenocarcinoma-specific expression into one of five different categories, based on the ratio between mRNA levels in colon adenocarcinoma compared to the mRNA levels in the other 16 analyzed cancer tissues.
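A ratio-based classification of this kind can be sketched as follows. The exact rules are not given here, so the sketch assumes a four-fold threshold, a detection cutoff of 1 and groups of up to five cancers. The names of the two non-elevated categories ("Low specificity" and "Not detected") are also assumptions, as are the function name and all parameters.

```python
def classify_specificity(levels, target, fold=4.0, detection=1.0, max_group=5):
    """Assign one gene to a specificity category for the `target` cancer.

    levels : dict mapping cancer type -> mRNA level of the gene
    fold, detection, max_group : assumed thresholds, not taken from the text
    """
    if max(levels.values()) < detection:
        return "Not detected"
    target_level = levels[target]
    if target_level < detection:
        return "Low specificity"
    others = [v for c, v in levels.items() if c != target]

    # Enriched: the target is at least `fold` times any other cancer.
    if target_level >= fold * max(others):
        return "Cancer enriched"

    # Group enriched: the target belongs to a small group of top cancers
    # whose mean is at least `fold` times the highest level outside it.
    ranked = sorted(levels, key=levels.get, reverse=True)
    for size in range(2, max_group + 1):
        group, rest = ranked[:size], ranked[size:]
        if target in group and rest:
            group_mean = sum(levels[c] for c in group) / size
            if group_mean >= fold * levels[rest[0]]:
                return "Group enriched"

    # Enhanced: the target is at least `fold` times the mean of the others.
    if target_level >= fold * (sum(others) / len(others)):
        return "Cancer enhanced"
    return "Low specificity"

# Toy levels (not atlas data) for one gene across four cancer types.
print(classify_specificity(
    {"COAD": 40, "READ": 36, "other_1": 2, "other_2": 1}, "COAD"))
```

The checks run from the strictest category to the weakest, so a gene is assigned to the most specific category whose condition it meets.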

Figure 2. The distribution of all genes across the five categories based on transcript abundance in colon adenocarcinoma as well as in all other cancer tissues.

In total, 191 genes show some level of elevated expression in colon adenocarcinoma compared to other cancers (Figure 2). The elevated category is further subdivided into three categories, as shown in Table 4.

Table 4. The number of genes in the subdivided categories of elevated expression in colon adenocarcinoma.

Distribution in the 31 cancers
Specificity | Detected in single | Detected in some | Detected in many | Detected in all | Total
Cancer enriched | 0 | 0 | 0 | 0 | 0
Group enriched | 0 | 42 | 53 | 1 | 96
Cancer enhanced | 2 | 34 | 54 | 5 | 95
Total | 2 | 76 | 107 | 6 | 191

Additional information

Appropriate diagnosis and staging are crucial for determining the best choice of treatment. The surgical stage represents a classification system based on the extent and depth of tumor growth. Stage I colorectal cancer shows invasive growth into the anatomical layers of the large intestine, but the tumor has not spread beyond the tissue of origin. Stage II colorectal cancer shows extended growth through the outer layer of the large intestine (peritoneum) and may have extended into nearby organs, but has not spread to any lymph node. Stage III colorectal cancer has spread to nearby lymph nodes but not yet metastasized to distant sites in the body. Finally, in Stage IV colorectal cancer the tumor has spread to distant organs such as the liver, lungs, or other sites. The Dukes classification is an older and less complicated staging system that predates the TNM system, and translates so that Dukes A = Stage I, Dukes B = Stage II, Dukes C = Stage III and Dukes D = Stage IV.

Early colorectal cancer, where tumor spread is restricted to the large intestine, is treated surgically, and chemotherapy is used for more advanced stages where the tumor has spread to other organs. Anti-EGFR treatment is one recently introduced therapy. Epidermal growth factor receptor (EGFR) is commonly expressed in colorectal tumors, and monoclonal antibodies inhibiting EGFR demonstrate clinical efficacy in patients with tumors that do not harbor downstream activating KRAS mutations. Today, KRAS mutation status is analyzed routinely before starting anti-EGFR treatment.

The vast majority of colorectal cancers are adenocarcinomas, with less than 10% of the cancers being distinguished by an abundant secretion of mucin. The tumors are classified according to the degree of morphological differentiation into well, moderately and poorly differentiated. About 80% are well or moderately differentiated, with a growth pattern consisting of tumor cells that form irregular glandular structures present at different layers of the bowel wall. Poorly differentiated colorectal cancers show no, or only slight, glandular formation. Generally, poor differentiation is associated with poor prognosis; however, there is no firmly established system for measuring the grade of differentiation. Therefore, treatment decisions are based on the surgical stage and not morphological features. Apart from adenocarcinomas, endocrine tumors can also arise within the colorectal mucosa. Squamous and adenosquamous tumors are exceedingly rare.

In addition to microscopical examination of biopsies, immunohistochemistry can be used to determine the colorectal origin of a metastasis or to visualize the spread of tumor cells in surrounding tissues. Tumors of colorectal origin are immunoreactive toward cytokeratin 20, CDX-2, SATB2 and cadherin-17. Chromogranin-A antibodies can be used to distinguish endocrine tumors in the bowel from adenocarcinomas.

Relevant links and publications

Uhlen M et al., A pathology atlas of the human cancer transcriptome. Science. (2017)
PubMed: 28818916 DOI: 10.1126/science.aan2507

Cancer Genome Atlas Research Network et al., The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. (2013)
PubMed: 24071849 DOI: 10.1038/ng.2764

Uhlén M et al., Tissue-based map of the human proteome. Science. (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Gremel G et al., The human gastrointestinal tract-specific transcriptome and proteome as defined by RNA sequencing and antibody-based profiling. J Gastroenterol. (2015)
PubMed: 24789573 DOI: 10.1007/s00535-014-0958-7

Histology dictionary - Colorectal cancer