Help

Help texts are available for all data pages via the "i" in the table headlines.

If you do not find the answer to your question in the available help text or the FAQ below, do not hesitate to contact us: contact@proteinatlas.org

Search

Specific information, with examples, on how to use the search function can be found on the search help page.

Programmatic data access

If you want to access data programmatically, more information can be found on the following page.

FAQ

Data questions

General questions

Submission questions




How can I download your data?

There are numerous ways you can download data:

  • Most of the data (complete or partial) is available as CSV/TSV, RDF, or XML files in the downloadable data section.
  • A subset of the data, corresponding to the result of a search, can be downloaded as XML, RDF, or TAB files. Use the links located at the far right in the table header of the search result.
  • Fetch a single entry of data/images as a TSV, RDF, or XML file for one gene by URL: for instance https://www.proteinatlas.org/ENSG00000121410.xml

See the downloadable data section for more details.
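
The per-gene download by URL can be scripted. The following is a minimal Python sketch, assuming that the TSV and RDF variants follow the same URL pattern as the XML example above (only the .xml form is shown on this page). The function names entry_url and fetch_entry are hypothetical helpers, not part of any official client.

```python
from urllib.request import urlopen

BASE_URL = "https://www.proteinatlas.org"

# Formats named in the FAQ; only ".xml" is confirmed by the example URL,
# the ".tsv" and ".rdf" suffixes are an assumption based on the same pattern.
FORMATS = {"xml", "tsv", "rdf"}


def entry_url(ensembl_id: str, fmt: str = "xml") -> str:
    """Build the download URL for one gene, e.g. .../ENSG00000121410.xml."""
    fmt = fmt.lower()
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/{ensembl_id}.{fmt}"


def fetch_entry(ensembl_id: str, fmt: str = "xml", timeout: float = 30.0) -> bytes:
    """Download the entry and return the raw bytes (requires network access)."""
    with urlopen(entry_url(ensembl_id, fmt), timeout=timeout) as response:
        return response.read()
```

Calling fetch_entry("ENSG00000121410") would request the XML entry used as the example above. When fetching many genes, the complete files in the downloadable data section are likely a better starting point.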



Is the primary data based on manual evaluation of antibody staining available for download?

The protein scores of the specific cell types are available in the XML files.



Is it possible to save the result of a search?

The best way to save or download a specific search result from the Human Protein Atlas is to use the TAB-file option located at the far right in the table header of the search result list.



Is it possible to extract a list of a certain protein class such as all secreted proteins?

A list of the genes in a specific protein class, for instance predicted secreted proteins, can be found by using the protein atlas search: protein_class:Predicted+secreted+proteins. From the search result page you can then download the list of genes as tab-separated values by clicking "TAB", located at the far right in the table header.
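
Once the TAB file has been saved, the gene list can be read with a few lines of Python. This is only a sketch: the column name "Gene" and the sample rows are assumptions made for illustration, so check the header line of the file you actually downloaded.

```python
import csv
import io


def gene_names(tab_text: str, column: str = "Gene") -> list:
    """Return the values of one column from tab-separated text.

    The default column name "Gene" is an assumption; pass the real
    header name found in the downloaded file.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return [row[column] for row in reader if row.get(column)]


# Hypothetical two-row sample standing in for a downloaded TAB file.
SAMPLE = "Gene\tEnsembl\nINS\tENSG00000254647\nA1BG\tENSG00000121410\n"

if __name__ == "__main__":
    print(gene_names(SAMPLE))
```

For a file on disk, read it with open(path, encoding="utf-8").read() and pass the text to gene_names.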



Is there an efficient way to download the subcellular images for a set of proteins?

There are several ways to download the images (.jpg only), and all of them require you to write a script or set up a tool that parses the XML files we provide on the site.

First, obtain an XML file in any of the ways described in the downloadable data section. The image links for subcellular images are located within the element for each antibody, in the subAssay/data/cellLine and subAssay/data/assayImage/imageUrl elements. The image file is named after the channels that are toggled on, which may include nucleus (blue), microtubules (red), antibody (green) and endoplasmic reticulum (yellow), in this order. Channels that are toggled off are left out of the filename.

Example:
<imageUrl size="large">
   http://www.proteinatlas.org/images/21616/193_D2_1_blue_red_green.jpg
</imageUrl>
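
As an illustration of the parsing step, here is a minimal Python sketch that collects every imageUrl element from an XML document and decodes the channel names from the filename. The SAMPLE document is a hypothetical, heavily reduced stand-in for a real entry, and the sketch assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Colour-to-channel mapping as described in the answer above.
CHANNELS = {
    "blue": "nucleus",
    "red": "microtubules",
    "green": "antibody",
    "yellow": "endoplasmic reticulum",
}

# Hypothetical minimal document; a real entry contains many more elements.
SAMPLE = """
<subAssay><data><assayImage>
<imageUrl size="large">
   http://www.proteinatlas.org/images/21616/193_D2_1_blue_red_green.jpg
</imageUrl>
</assayImage></data></subAssay>
"""


def image_urls(xml_text: str) -> list:
    """Return the stripped text of every <imageUrl> element in the document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("imageUrl") if el.text and el.text.strip()]


def channels_in(url: str) -> list:
    """Decode which channels are toggled on from the image filename."""
    stem = PurePosixPath(urlparse(url).path).stem
    return [CHANNELS[part] for part in stem.split("_") if part in CHANNELS]


if __name__ == "__main__":
    for url in image_urls(SAMPLE):
        print(url, channels_in(url))
```

For the complete XML download, which is large, ET.iterparse is probably a better fit than loading the whole file with ET.fromstring.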



Do you provide a parser script that we can use to convert the XML file into a more readable format?

No, unfortunately not. The easiest way to extract the information is probably to use XML transformation such as XSLT.



Can we look at your RNA-seq raw data?

TPM-tables of the RNA data can be found on our download page.
Due to GDPR legislation, we currently do not provide human tissue RNA-Seq data in raw format. We are working on making the raw RNA-Seq data available through a secure federated EGA repository where access can be managed. The timeline for this is, however, uncertain.



Is it possible to use your data for commercial purposes?

Yes, please see the licence & citation page for full details.



Is it possible to obtain more detailed patient data?

Unfortunately not. The available data (gender, age, diagnosis and grade, if known) is already listed on the homepage, and since all material is anonymized, it is not possible to find any information beyond what is already shown.



I cannot find antibody/images/data I previously saw in the Human Protein Atlas, why?

We update the Human Protein Atlas regularly based on careful curation of the data. Curation involves, among other things, removal of data/antibody/images which do not meet our quality criteria. When possible we replace data/antibody/images with more recent improved versions.

Please visit Release history to find out in which version the antibody you are interested in was removed or old images were replaced with more recent ones. You can access previous versions of the Human Protein Atlas by typing vX.proteinatlas.org in your web browser, replacing X with the relevant version number.
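
For scripts that need to point at an archived version, the vX.proteinatlas.org pattern can be applied to an existing link. The sketch below is hypothetical: it assumes that archived versions are served over https and use the same page paths as the current site, which may not hold for every version or page.

```python
from urllib.parse import urlsplit


def archived_url(url: str, version: int) -> str:
    """Rewrite a www.proteinatlas.org URL to the vX.proteinatlas.org host.

    Assumes the page path is unchanged between versions.
    """
    if version < 1:
        raise ValueError("version must be a positive integer")
    parts = urlsplit(url)
    return parts._replace(netloc=f"v{version}.proteinatlas.org").geturl()
```

For example, archived_url("https://www.proteinatlas.org/ENSG00000121410.xml", 20) yields the same path on v20.proteinatlas.org.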



I am interested in buying one of the antibodies you have tested, where can I find it?

Start by looking at the antibodies listed under the "antibody/antigen" tag, where the provider and product name are given. Then follow the provider link and search for the product name (HPA antibodies are linked directly).



Can I get the protocol used for staining the antibody?

Yes, you can download the protocol for both IHC and IF.



How are the reliability scores set?

The guidelines for the reliability scores are specific to each sub-atlas. You can read more about the guidelines for the Tissue Atlas here and the Cell Atlas here. The reliability score is based on consistency with internal RNA-seq data, support from external sources (UniProtKB/Swiss-Prot), and staining pattern similarity between independent antibodies.



How is the protein expression scored?

The protein expression score is based on immunohistochemical data manually scored with regard to staining intensity (negative, weak, moderate or strong) and fraction of stained cells (<25%, 25-75% or >75%). Each combination of intensity and fraction is automatically converted into a protein expression level score as follows:

  • negative: not detected
  • weak, <25%: not detected
  • weak, 25-75% or >75%: low
  • moderate, <25%: low
  • moderate, 25-75% or >75%: medium
  • strong, <25%: medium
  • strong, 25-75% or >75%: high

In addition to this, protein expression values are manually adjusted as necessary when evaluated by our expert annotators. More information about knowledge-based annotation is available in the Assays & annotation section.

A comprehensive summary of immunohistochemical primary data is available for each gene on the “Primary data page”, e.g. https://www.proteinatlas.org/ENSG00000254647-INS/tissue/primary+data
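
The automatic conversion from intensity and fraction to expression level described in this answer can be written as a small lookup. The Python sketch below only restates that rule; the function name and the string labels are illustrative choices, and it does not model the manual adjustments made by the annotators.

```python
from typing import Optional

# (intensity, fraction of stained cells) -> expression level,
# transcribed from the conversion rule in the answer above.
LEVEL_TABLE = {
    ("weak", "<25%"): "not detected",
    ("weak", "25-75%"): "low",
    ("weak", ">75%"): "low",
    ("moderate", "<25%"): "low",
    ("moderate", "25-75%"): "medium",
    ("moderate", ">75%"): "medium",
    ("strong", "<25%"): "medium",
    ("strong", "25-75%"): "high",
    ("strong", ">75%"): "high",
}


def expression_level(intensity: str, fraction: Optional[str] = None) -> str:
    """Return the automatic expression level for one intensity/fraction pair."""
    if intensity == "negative":
        # Negative staining is "not detected" regardless of fraction.
        return "not detected"
    try:
        return LEVEL_TABLE[(intensity, fraction)]
    except KeyError:
        raise ValueError(f"unknown combination: {intensity!r}, {fraction!r}") from None


if __name__ == "__main__":
    print(expression_level("moderate", "25-75%"))
```

An explicit table is used here rather than arithmetic on ranks so that each row can be checked directly against the rule as written.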



The stained image is brown, but the expression score says "not detected", why?

The images on the atlas are the raw data that show the immunohistochemical staining of each antibody. The expression score describes a knowledge-based best estimate of the true protein expression. This means that an expert has manually analyzed all images and assessed if the staining is likely to be the true protein expression or not. If the expert determined that the staining seen is not the true protein expression, the score will be set to "not detected".

More information about annotation of antibody staining is available in the assays & annotation section Immunohistochemistry - tissues: Annotation.

More information about knowledge-based annotation of protein expression is available in the assays & annotation section Immunohistochemistry - tissues: Knowledge-based annotation.



We would like to include images (results) from the Human Protein Atlas in our publication, is that possible?

Yes, the use of data and images from the Human Protein Atlas in publications and presentations is permitted as long as the licence & citation conditions are met.



How do I cite the Human Protein Atlas?

Please cite one of our publications as stated in the Data usage policy.



Are you able to provide tissue sample for testing?

No, we are not.



The gene that I am interested in is still missing protein expression data, when can that data be expected to be included in the database?

We usually release new data 1-2 times every year and are doing our best to continue exploring the uncharacterized targets. All genes have general gene information and RNA expression data. If you have questions regarding a specific gene, please email contact@proteinatlas.org and we might be able to tell you if there are any unpublished results.



In the Tissue Atlas, some tissues are denoted 1 and 2, e.g. stomach 1 and stomach 2. What do the numbers mean?

  • Stomach 1 & stomach 2, endometrium 1 & endometrium 2, soft tissue 1 and 2:
  • For these tissues, 1 and 2 simply mean that instead of the usual three samples, there are six samples (2x3). For stomach and endometrium, six samples allow us to map variation in different parts of the stomach, and in endometrium samples from different stages of the menstrual cycle, to a greater extent than with only three samples. There are two sets of soft tissue since the cell types annotated here are often difficult to sample, and having six cores instead of three increases the chance of including as many of the cell types as possible.

  • Skin 1 and skin 2:
  • There is a difference between skin 1 and skin 2: skin 1 contains skin that has been exposed to the sun, while skin 2 contains skin samples from the vulva or anal area. We analyze three samples for skin 1 and three samples for skin 2.



Are the tissue samples in the Tissue Atlas from healthy individuals?

The original tissue samples are not from healthy individuals; they are from a biobank that contains patient material. The parts of the samples used in our analyses are mainly from normal tissue surrounding the pathological tissue that has been removed from e.g. cancer patients. After we collected these samples from the biobank, they were scrutinized by a certified pathologist to make sure that they are classified as normal based on histology. This means that the images shown are histologically normal, but they are in many cases sampled close to a diseased area.



How do I interpret the Kaplan-Meier plots in the Cancer Atlas?

The Kaplan-Meier plot shows the probability of survival at different time points, e.g. the probability of surviving 5 years while having the disease. At the start, i.e. at time point zero, 100% of patients are alive. Over time, with each death, the number of patients alive decreases, which leads to a drop in the curve. The curve therefore never rises: the estimated probability of surviving 10 years is lower than or equal to the probability of surviving 5 years. To see whether the expression of a certain gene affects the survival rate, the patients have been divided into two groups, with high or low expression. If a gene is unfavourable, the group with high expression will have a lower survival rate, i.e. its curve will drop earlier. For more information about Kaplan-Meier plots, including how the probability is calculated, we recommend this article.
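
To make the calculation behind such a curve concrete, the following Python sketch implements the standard Kaplan-Meier (product-limit) estimator: at each time with at least one death, the running survival probability is multiplied by (1 - deaths / patients still at risk). It is a generic textbook illustration with hypothetical follow-up data, not necessarily the exact procedure used for the plots on the site.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] for each time with a death.

    times  -- follow-up time for each patient
    events -- True if the patient died at that time, False if censored
              (still alive when last observed)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times in years; False marks a censored patient.
    years = [1, 2, 3, 4, 5]
    died = [True, False, True, True, False]
    for t, s in kaplan_meier(years, died):
        print(f"year {t}: {s:.3f}")
```

Running the estimator separately on the high-expression and low-expression groups gives the two curves that are compared in the plot.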



My company is interested in submitting antibodies, how should we proceed?

Please read the Antibody submission section for details on how to submit antibodies. If you have any additional questions, please email contact@proteinatlas.org



Can researchers submit their antibodies for validation?

Yes, researchers and research teams are welcome to submit their in-house antibodies for testing. Academic collaborators can also ask for delayed release of the data and images on the public Human Protein Atlas portal if a publication based on the results is being prepared. The decision will be made by the Human Protein Atlas priority committee, and a delay is normally not granted for longer than one year.