Structure - Methods summary

Summary

The Structure section contains information about the three-dimensional structure of human proteins. Interactive 3D protein structures, based on predictions generated using the AlphaFold source code, are shown for most human proteins and their related isoforms. The Protein Browser tool can be used to select among the different isoforms and to display protein-related features such as known antigen sequences, transmembrane regions and InterPro domains on the structures. The amino acid positions of population variants and variants with known clinical relevance in the Ensembl Variation database, as well as benign and pathogenic missense variants predicted by AlphaMissense, can also be displayed.

Key publications

Jumper J et al. (2021) "Highly accurate protein structure prediction with AlphaFold" Nature 596(7873):583-589.

Varadi M et al. (2022) "AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models" Nucleic Acids Research 50(D1):D439-D444.

Cheng J et al. (2023) "Accurate proteome-wide missense variant effect prediction with AlphaMissense" Science 381(6664):eadg7492.

What can you learn from the Structure section?

Learn about:

  • the predicted 3D structure of proteins and their related isoforms
  • the antigen structure for the majority of the antibodies
  • the predicted structure of selected protein features
  • the known and predicted missense variants with clinical significance
  • the known population variants and predicted benign missense variants

How has the data been generated?

The predicted 3D protein structures have been generated in-house using the AlphaFold source code developed by DeepMind. AlphaFold is a machine learning approach in which the primary amino acid sequence and aligned sequences of homologues, together with physical and biological knowledge about protein structures, are incorporated into the design of a deep learning algorithm that directly predicts the 3D structure of a protein. Structure predictions have been made for all protein isoforms that contain exclusively standard amino acids and have a length between 6 aa and 4000 aa (the script is available here).
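The eligibility rule stated above (standard amino acids only, length between 6 aa and 4000 aa) can be sketched as a simple sequence filter. This is an illustration only, not the in-house script: the function name and constants are hypothetical, the 20 one-letter codes are taken as the "standard" set, and the length bounds are assumed to be inclusive.

```python
# Hypothetical sketch of the isoform eligibility rule described above.
# Assumptions: "standard amino acids" means the 20 canonical one-letter codes,
# and the 6-4000 aa length range is inclusive at both ends.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 6
MAX_LENGTH = 4000


def is_eligible_for_prediction(sequence: str) -> bool:
    """Return True if the isoform sequence satisfies the stated criteria."""
    seq = sequence.strip().upper()
    if not (MIN_LENGTH <= len(seq) <= MAX_LENGTH):
        return False
    # Reject sequences containing non-standard residues such as U (selenocysteine) or X.
    return set(seq) <= STANDARD_AMINO_ACIDS
```

Under these assumptions, an isoform containing selenocysteine (U) or an unknown residue (X) would be excluded, as would any sequence shorter than 6 or longer than 4000 residues.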

Missense variant data are included from two sources. Known population and clinical variant data are incorporated from the Ensembl Variation database, with clinical relevance based on the clinical significance terms "pathogenic" and "likely pathogenic". Predicted benign and pathogenic missense variants are integrated from AlphaMissense, with "likely benign" variants defined as having a pathogenicity score <0.05 and "likely pathogenic" variants as having a pathogenicity score >0.95.
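The two selection rules in this paragraph can be expressed as small helper functions. The sketch below is illustrative: the function names are hypothetical, the score cut-offs are the ones quoted above, and the case-insensitive comparison of clinical significance terms is an assumption rather than documented behavior.

```python
from typing import Iterable, Optional

# Cut-offs quoted in the text above (strict inequalities).
BENIGN_MAX_SCORE = 0.05
PATHOGENIC_MIN_SCORE = 0.95

# Clinical significance terms that mark a known variant as clinically relevant.
CLINICAL_TERMS = {"pathogenic", "likely pathogenic"}


def classify_alphamissense(score: float) -> Optional[str]:
    """Map an AlphaMissense pathogenicity score to a displayed class, or None."""
    if score < BENIGN_MAX_SCORE:
        return "likely benign"
    if score > PATHOGENIC_MIN_SCORE:
        return "likely pathogenic"
    # Scores between the two cut-offs fall into neither displayed class.
    return None


def is_clinically_relevant(clinical_significance: Iterable[str]) -> bool:
    """Return True if any term is 'pathogenic' or 'likely pathogenic'.

    Assumption: terms are compared case-insensitively after trimming whitespace.
    """
    return any(term.strip().lower() in CLINICAL_TERMS for term in clinical_significance)
```

With these thresholds, only variants at the extremes of the score range are shown; a variant scoring 0.5, for example, would belong to neither predicted class.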

All structures are displayed using the NGL Viewer.

What is presented in the section?

In the Structure part of the gene summary page, predicted 3D protein structures can be displayed and explored. The structure of each protein isoform can be selected and displayed by clicking on the transcript names at the top of the Protein Browser tool, next to which the schematic structures of the different splice variants, including exons, introns and UTRs, are displayed. Clicking on features such as antigens, InterPro domains and membrane regions in the schematic protein view of the Protein Browser highlights them in the 3D structure. The sliders beside the structure allow highlighting of the positions of different kinds of missense variants, as well as coloring of the structure according to the AlphaFold confidence measure or residue index.
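For readers who want to inspect the confidence measure outside the viewer, the sketch below reads per-residue confidence (pLDDT) from an AlphaFold model in PDB format. It relies on the AlphaFold convention of storing pLDDT in the B-factor column. The function names are hypothetical, and the confidence bands follow the commonly used AlphaFold Database scheme (90, 70, 50); the exact color scheme used by the viewer on this page is not stated here, so treat the bands as an assumption.

```python
from typing import Dict


def plddt_per_residue(pdb_text: str) -> Dict[int, float]:
    """Collect one pLDDT value per residue from the C-alpha ATOM records.

    Uses the fixed-column PDB layout: atom name in columns 13-16,
    residue number in columns 23-26, B-factor (pLDDT) in columns 61-66.
    """
    scores: Dict[int, float] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Assumed banding, following the AlphaFold Database convention."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"
```

Mapping each residue's value through `confidence_band` gives a per-residue label comparable to a confidence-based coloring of the structure.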