The breast invasive carcinoma proteome

Breast cancer is the most common invasive cancer in women worldwide and the leading cause of cancer-related mortality in women. The global age-adjusted incidence rate for breast cancer is 124 per 100,000 women per year. Male breast cancer is exceedingly rare and accounts for around 1% of cases. Although the rate of breast cancer diagnosis increased during the 1990s, it has decreased since 2000, and the overall breast cancer death rate has dropped steadily in the Western world. The majority of breast cancers develop sporadically, but 5-10% of patients have a hereditary component. The best-known genes associated with increased breast cancer risk are BRCA1 and BRCA2; women carrying mutations in BRCA1 or BRCA2 have up to a 60% risk of developing breast cancer by the age of 90. Other risk factors include early menarche and late menopause. Pregnancy has been reported to decrease risk, probably owing to pregnancy-induced changes in breast tissue.

Breast cancer forms in tissues of the breast, usually the ducts (tubes that carry milk to the nipple) and lobules (glands where the milk is produced). Based on the presumed site of origin and morphology, breast cancers are broadly classified as ductal or lobular carcinomas.

Here, we explore the breast invasive carcinoma proteome using TCGA transcriptomics data and antibody-based protein data. Based on transcriptomics data from 1022 patients, 994 genes are suggested as prognostic: 562 genes are associated with an unfavorable prognosis and 432 genes with a favorable prognosis.

TCGA data analysis

In this metadata study, we used TCGA data, with transcriptomics data available from 1022 patients in total: 1010 females and 12 males. Most of the patients (881) were still alive at the time of data collection. The stage distribution was as follows: stage I, 175 patients; stage II, 574 patients; stage III, 233 patients; stage IV, 19 patients; and missing stage information, 20 patients.

Unfavorable prognostic genes in breast invasive carcinoma

For unfavorable genes, higher relative expression levels at diagnosis are associated with significantly lower overall survival. There are 562 genes associated with an unfavorable prognosis in breast invasive carcinoma; of these potential prognostic genes, 12 were validated in a separate study. Table 1 lists the 20 genes most significantly associated with an unfavorable prognosis.

PTGES3 is associated with an unfavorable prognosis in breast invasive carcinoma in two independent cohorts. The best separation is achieved with an expression cutoff of 288 TPM (transcripts per million), which divides the patients into two groups: 77% 5-year survival for patients with high expression versus 85% for patients with low expression (p-value: 1.32e-4). The TCGA data analysis was validated in a separate study (p-value: 7.01e-4). Immunohistochemical staining with an antibody targeting PTGES3 (HPA038673) shows differential expression in breast invasive carcinoma samples.

Figure: PTGES3 - survival analysis (p<0.001), with example images of PTGES3 high expression and low expression.
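The 5-year survival figures above come from splitting patients at an expression cutoff and estimating survival separately in each group. A minimal sketch in Python, assuming a Kaplan-Meier estimator and a rule that patients at or above the cutoff form the high-expression group; the function names and the toy cohort are hypothetical and are not TCGA data:

```python
from typing import Sequence, Tuple


def km_survival_at(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of the probability of surviving past `horizon`.

    times  -- follow-up time in years for each patient
    events -- 1 if the patient died at that time, 0 if censored (alive at last follow-up)
    """
    survival = 1.0
    # Distinct death times up to and including the horizon, in chronological order.
    death_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for dt in death_times:
        at_risk = sum(1 for t in times if t >= dt)  # still under follow-up just before dt
        deaths = sum(1 for t, e in zip(times, events) if t == dt and e == 1)
        survival *= 1.0 - deaths / at_risk
    return survival


def split_at_cutoff(expr: Sequence[float], times: Sequence[float], events: Sequence[int],
                    cutoff: float, horizon: float = 5.0) -> Tuple[float, float]:
    """Return (survival_high, survival_low) at `horizon` years for an expression cutoff.

    Assumption (hypothetical rule): patients with expression >= cutoff form the high group.
    """
    high = [i for i, x in enumerate(expr) if x >= cutoff]
    low = [i for i, x in enumerate(expr) if x < cutoff]
    s_high = km_survival_at([times[i] for i in high], [events[i] for i in high], horizon)
    s_low = km_survival_at([times[i] for i in low], [events[i] for i in low], horizon)
    return s_high, s_low


# Hypothetical toy cohort (not TCGA data): TPM values, follow-up in years, death flags.
expr = [300, 350, 400, 500, 100, 150, 200, 250]
times = [1.0, 2.0, 6.0, 7.0, 3.0, 8.0, 9.0, 10.0]
events = [1, 0, 1, 0, 0, 0, 0, 1]
print(split_at_cutoff(expr, times, events, cutoff=288))  # (0.75, 1.0)
```

Censored patients (events = 0) leave the risk set without counting as deaths, which is why a patient alive at last follow-up still contributes information up to that time.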

PHAX is associated with an unfavorable prognosis in breast invasive carcinoma in two independent cohorts. The best separation is achieved with an expression cutoff of 13 TPM, which divides the patients into two groups: 79% 5-year survival for patients with high expression versus 85% for patients with low expression (p-value: 1.54e-4). The TCGA data analysis was validated in a separate study (p-value: 4.56e-4). Immunohistochemical staining with an antibody targeting PHAX (HPA070326) shows differential expression in breast invasive carcinoma samples.

Figure: PHAX - survival analysis (p<0.001), with example images of PHAX high expression and low expression.
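The text does not spell out how the "best separation" cutoff is chosen. One common approach, and the one assumed here, is to try every candidate cutoff, compare the two resulting groups with a log-rank test, and keep the cutoff with the smallest p-value. The restriction to cutoffs that leave 20-80% of patients in the high group, the function names and the toy data are assumptions for illustration:

```python
import math
from typing import Optional, Tuple


def logrank_p(times1, events1, times2, events2) -> float:
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    death_times = sorted({t for t, e in zip(times1, events1) if e}
                         | {t for t, e in zip(times2, events2) if e})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in death_times:
        n1 = sum(1 for x in times1 if x >= t)
        n2 = sum(1 for x in times2 if x >= t)
        d1 = sum(1 for x, e in zip(times1, events1) if x == t and e)
        d2 = sum(1 for x, e in zip(times2, events2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def best_cutoff(expr, times, events, min_frac=0.2, max_frac=0.8) -> Optional[Tuple[float, float]]:
    """Scan candidate cutoffs and return (cutoff, p) with the smallest log-rank p.

    Assumption: only cutoffs leaving min_frac..max_frac of patients in the high group are tried.
    """
    n = len(expr)
    best = None
    for c in sorted(set(expr)):
        high = [i for i in range(n) if expr[i] >= c]
        if not min_frac <= len(high) / n <= max_frac:
            continue
        low = [i for i in range(n) if expr[i] < c]
        p = logrank_p([times[i] for i in high], [events[i] for i in high],
                      [times[i] for i in low], [events[i] for i in low])
        if best is None or p < best[1]:
            best = (c, p)
    return best


# Hypothetical toy cohort: high expressers die early, low expressers are censored at 20 years.
expr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [20, 20, 20, 20, 20, 1, 2, 3, 4, 5]
events = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cutoff, p = best_cutoff(expr, times, events)
print(cutoff)  # 6
```

Because the cutoff is picked by minimizing the p-value over many candidates, the resulting p-value is optimistic; this is one reason independent validation, as reported for PTGES3 and PHAX, carries weight.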

Table 1. The 20 genes with highest significance associated with an unfavorable prognosis in breast invasive carcinoma.

Gene | Description | Predicted location | mRNA (cancer) | p-value | Prognostic
SLC35A2 | Solute carrier family 35 member A2 | Membrane, Intracellular | 46.5 | 4.64e-7 | potential
LRP11 | LDL receptor related protein 11 | Membrane, Intracellular | 22.5 | 6.75e-7 | potential
AIMP1 | Aminoacyl tRNA synthetase complex interacting multifunctional protein 1 | Secreted, Intracellular | 44.7 | 8.31e-7 | potential
PCMT1 | Protein-L-isoaspartate (D-aspartate) O-methyltransferase | Membrane, Intracellular | 67.8 | 1.98e-6 | validated
MCTS1 | MCTS1 re-initiation and release factor | Intracellular | 5.7 | 3.33e-6 | potential

Favorable prognostic genes in breast invasive carcinoma

For favorable genes, higher relative expression levels at diagnosis are associated with significantly higher overall survival. There are 432 genes associated with a favorable prognosis in breast invasive carcinoma; of these potential prognostic genes, 3 were validated in a separate study. Table 2 lists the 20 genes most significantly associated with a favorable prognosis.

KYAT3 is associated with a favorable prognosis in breast invasive carcinoma. The best separation is achieved with an expression cutoff of 16 TPM, which divides the patients into two groups: 87% 5-year survival for patients with high expression versus 73% for patients with low expression (p-value: 5.21e-5). Immunohistochemical staining with an antibody targeting KYAT3 (HPA027168) shows differential expression in breast invasive carcinoma samples.

Figure: KYAT3 - survival analysis (p<0.001), with example images of KYAT3 high expression and low expression.
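The favorable and unfavorable labels follow from the direction of the survival difference between the two expression groups. A minimal sketch, assuming a significance threshold of p<0.001 (the level shown on the survival plots) and judging direction by comparing 5-year survival of the two groups; `prognostic_class` is a hypothetical helper, and the example values are the ones quoted for PTGES3 and KYAT3 in this section:

```python
from typing import Optional


def prognostic_class(p_value: float, surv_high: float, surv_low: float,
                     alpha: float = 0.001) -> Optional[str]:
    """Label a gene from its best-split p-value and the 5-year survival of each group.

    Assumptions: threshold alpha = 0.001; direction taken from 5-year survival.
    """
    if p_value >= alpha:
        return None  # not considered prognostic
    if surv_high < surv_low:
        return "unfavorable"  # high expression goes with lower survival
    if surv_high > surv_low:
        return "favorable"  # high expression goes with higher survival
    return None  # no direction to report


print(prognostic_class(1.32e-4, 0.77, 0.85))  # PTGES3 -> unfavorable
print(prognostic_class(5.21e-5, 0.87, 0.73))  # KYAT3 -> favorable
```

Comparing survival at a single time point is a simplification; a hazard-based comparison over the whole follow-up would be a natural alternative.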

Table 2. The 20 genes with highest significance associated with a favorable prognosis in breast invasive carcinoma.

Gene | Description | Predicted location | mRNA (cancer) | p-value | Prognostic
CXCL2 | C-X-C motif chemokine ligand 2 | Secreted | 2.9 | 8.78e-5 | validated
RPL27A | Ribosomal protein L27a | Intracellular | 2445.2 | 1.97e-4 | potential
SAV1 | Salvador family WW domain containing protein 1 | Intracellular | 20.1 | 5.12e-4 | potential
NTRK2 | Neurotrophic receptor tyrosine kinase 2 | Membrane, Intracellular | 12.4 | 6.64e-4 | potential
SOCS3 | Suppressor of cytokine signaling 3 | Intracellular | 31.8 | 6.73e-4 | potential

The breast invasive carcinoma transcriptome

The transcriptome analysis shows that 72% (n=14456) of all human genes (n=20162) are expressed in breast invasive carcinoma. All genes were classified into one of five categories according to their breast invasive carcinoma-specific expression, based on the ratio between mRNA levels in breast invasive carcinoma and those in the other 16 analyzed cancer tissues.
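The categories rest on fold-change rules applied to each gene's mRNA level across cancers. A sketch of one possible implementation, assuming thresholds of the kind used in the Human Protein Atlas (a detection limit of 1 TPM and a four-fold change, with groups of 2-5 cancers); the exact rules, the function and the labels "low cancer specificity" and "not detected" for the two categories not named in this section are assumptions, not the atlas's code:

```python
from statistics import mean
from typing import Dict, List, Tuple


def specificity_category(levels: Dict[str, float], detect: float = 1.0,
                         fold: float = 4.0, max_group: int = 5) -> Tuple[str, List[str]]:
    """Classify one gene from its mRNA level (e.g. TPM) in each cancer type.

    Assumed rules: detection at `detect` TPM and a `fold`-change criterion.
    """
    if len(levels) < 2:
        raise ValueError("need levels for at least two cancers")
    ranked = sorted(levels.items(), key=lambda kv: kv[1], reverse=True)
    values = [v for _, v in ranked]
    if values[0] < detect:
        return "not detected", []
    # Cancer enriched: top cancer at least `fold` x every other cancer.
    if values[0] >= fold * values[1]:
        return "cancer enriched", [ranked[0][0]]
    # Group enriched: mean of the top k cancers at least `fold` x the best remaining one.
    for k in range(2, min(max_group, len(values) - 1) + 1):
        if mean(values[:k]) >= fold * values[k]:
            return "group enriched", [name for name, _ in ranked[:k]]
    # Cancer enhanced: detected and at least `fold` x the mean of all other cancers.
    total = sum(values)
    enhanced = [name for name, v in ranked
                if v >= detect and v >= fold * (total - v) / (len(values) - 1)]
    if enhanced:
        return "cancer enhanced", enhanced
    return "low cancer specificity", []


# "BRCA" is the TCGA label for breast invasive carcinoma; "c1", "c2" are hypothetical cancers.
print(specificity_category({"BRCA": 40.0, "c1": 2.0, "c2": 1.0}))  # ('cancer enriched', ['BRCA'])
```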

Figure 1. The distribution of all genes across the five categories based on transcript abundance in breast invasive carcinoma as well as in all other cancer tissues.

A total of 226 genes show some level of elevated expression in breast invasive carcinoma compared with other cancers (Figure 1). The elevated category is further subdivided into three categories, as shown in Table 3.

Table 3. The number of genes in the subdivided categories of elevated expression in breast invasive carcinoma.

Distribution in the 17 cancers

Specificity | Detected in single | Detected in some | Detected in many | Detected in all | Total
Cancer enriched | 14 | 14 | 10 | 3 | 41
Group enriched | 0 | 37 | 30 | 3 | 70
Cancer enhanced | 8 | 32 | 53 | 22 | 115
Total | 22 | 83 | 93 | 28 | 226
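The column headings describe how widely a gene is detected across cancers. A minimal sketch, assuming a detection limit of 1 TPM and the hypothetical rules "some" = more than one but fewer than a third of the cancers and "many" = at least a third but not all; for example, if 17 cancers are analyzed (breast invasive carcinoma plus the 16 others mentioned earlier), detection in 2 to 5 cancers would count as "some" and in 6 to 16 as "many":

```python
from typing import Sequence


def distribution_category(levels: Sequence[float], detect: float = 1.0) -> str:
    """Place a gene in a detection-breadth category across all analyzed cancers.

    Assumed rules: 'some' = more than one but fewer than a third of cancers,
    'many' = at least a third but not all.
    """
    n = len(levels)
    k = sum(1 for v in levels if v >= detect)  # number of cancers where the gene is detected
    if k == 0:
        return "not detected"
    if k == 1:
        return "detected in single"
    if k == n:
        return "detected in all"
    if k < n / 3:
        return "detected in some"
    return "detected in many"


# Hypothetical gene detected in 6 of 17 cancers.
print(distribution_category([5.0] * 6 + [0.0] * 11))  # detected in many
```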

Additional information

The staging of breast cancer is based on the presence of local and/or distant spread. Localized disease (Stage I) accounts for approximately 60% of cases, while in about 5% the cancer has spread to distant organs such as the liver and bone (Stage IV). Approximately 35% are Stage II or III, indicating tumor spread to regional lymph nodes.

All breast cancers can be graded histologically into three grades using the Nottingham Grading System (NGS), also termed the Elston-Ellis grading system, which evaluates three tumor parameters: (i) extent of tubular differentiation, (ii) nuclear pleomorphism and (iii) mitotic activity, assessed by counting mitotic figures in ten high-power fields. Each parameter is given a score of 1 to 3, and the three scores are added to give a final score between 3 (1+1+1) and 9 (3+3+3). Final scores of 3, 4 and 5 represent well-differentiated tumors (Grade I) associated with better survival, scores of 6 and 7 represent moderately differentiated tumors (Grade II), and scores of 8 and 9 reflect poorly differentiated tumors (Grade III) associated with poor overall survival. In a large proportion of breast cancers, precursor lesions such as intraductal carcinoma are present adjacent to the invasive component of the tumor. Such regions of non-invasive cancer are termed carcinoma in situ and are important to recognize during diagnosis.
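The Nottingham score arithmetic can be written out as a small function. This sketch only encodes the summation and the standard cut points (3-5 Grade I, 6-7 Grade II, 8-9 Grade III); how each component is scored under the microscope is outside its scope, and the function name is hypothetical:

```python
from typing import Tuple


def nottingham_grade(tubules: int, pleomorphism: int, mitoses: int) -> Tuple[int, int]:
    """Return (total score, histological grade) under the Nottingham system."""
    for score in (tubules, pleomorphism, mitoses):
        if score not in (1, 2, 3):
            raise ValueError("each component must be scored 1, 2 or 3")
    total = tubules + pleomorphism + mitoses
    if total <= 5:
        grade = 1  # well differentiated
    elif total <= 7:
        grade = 2  # moderately differentiated
    else:
        grade = 3  # poorly differentiated
    return total, grade


print(nottingham_grade(1, 1, 1))  # (3, 1)
print(nottingham_grade(3, 3, 3))  # (9, 3)
```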

Immunohistochemistry is used routinely on all breast cancers to gain important prognostic information and to predict response to specific anticancer therapies. The most commonly used antibodies detect estrogen receptor alpha (ER, ESR1), progesterone receptor (PR, PGR), HER-2 (ERBB2) and the proliferation marker Ki-67 (MKI67). The tumor stage and grade, together with the immunohistochemistry results, are used to personalize treatment options.

Relevant links and publications

Uhlén M et al., A pathology atlas of the human cancer transcriptome. Science (2017)
PubMed: 28818916 DOI: 10.1126/science.aan2507

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Tao Z et al., Breast Cancer: Epidemiology and Etiology. Cell Biochem Biophys. (2015)
PubMed: 25543329 DOI: 10.1007/s12013-014-0459-6

Key TJ et al., Epidemiology of breast cancer. Lancet Oncol. (2001)
PubMed: 11902563 DOI: 10.1016/S1470-2045(00)00254-0
