Inside UniProt

#UsingUniProt - DisCanVis: interpreting genomic variation data

Norbert Deutsch, Mátyás Pajkos, Gábor Erdős and Zsuzsanna Dosztányi

Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Hungary.

In recent years a wealth of information has become available about genetic variations that underlie various diseases, especially cancer. However, interpreting these variations is far from trivial. Many of the observed mutations do not directly contribute to disease development, or have no known function associated with them. Protein-level information often proves essential for guiding us towards biologically relevant cases.

Currently, genome-level and protein-level information are available through distinct resources with very limited overlap. Databases such as COSMIC or cBioPortal serve as entry points for accessing genetic variations (Tate et al. 2019; Gao et al. 2013). By contrast, the central resource for protein-level information is the UniProtKB database, which provides a rich source of annotations about the structural and functional properties of proteins. To help interpret genetic variations, researchers from the Dosztányi lab at Eötvös Loránd University in Budapest developed a novel web-based visualization tool, called DisCanVis (http://discanvis.elte.hu), that brings these two worlds together (Deutsch et al. 2023).

When we consider a genetic variation, one of the basic questions we can ask is whether the variation affects a known structure. This is important for modeling the impact of mutations and for finding drug molecules that can compensate for their effects. However, we now know that a significant portion of the human genome encodes proteins whose native state cannot be characterized by a single well-defined structure (Dyson and Wright 2005). These so-called intrinsically disordered proteins and protein regions (IDPs/IDRs) can only be represented by an ensemble of different conformations, which cannot be accurately captured even by the recent AlphaFold method, a significant breakthrough in predicting the structures of globular proteins (Jumper et al. 2021; Ruff and Pappu 2021).

IDRs carry out important functions in many regulatory signaling processes. The key to this is their dynamic nature, which enables them to form interactions that can be quickly switched on or off in response to cellular cues. Such interactions are also critical for many proteins known to be involved in cancer, such as p53. However, the direct role of IDRs in cancer is much less well understood.

One of the main aims of DisCanVis is to help study the role of IDRs in cancer and to help answer questions such as: What is the role of disordered regions in cancer proteins? In which cases do cancer mutations specifically target intrinsically disordered regions? What kinds of annotations can be found for the mutated regions?

DisCanVis covers more than 18,000 human proteins that are shared between the COSMIC and UniProt databases. The server combines cancer and other disease variations with protein-level functional and structural annotations collected through the UniProt database. A key subset of the collected features relates to protein disorder: annotations of experimentally verified disordered regions, predictions from state-of-the-art methods that reliably assess protein disorder, and the locations of known functional regions within IDPs, such as short linear motif sites, disordered binding regions, and post-translational modifications. Altogether, more than 30 different features are collected and projected along the sequence for visualization. Entries can be searched by UniProt accession number, protein name, or gene name.
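How DisCanVis dispatches a search query internally is not described here. As a minimal, hypothetical sketch, a search box that accepts both accessions and names could first check whether the query has the shape of a UniProtKB accession, using the accession regular expression published in the UniProt documentation, and otherwise treat it as a protein or gene name. The helper name `classify_query` is an illustrative assumption, not part of DisCanVis.

```python
import re

# UniProtKB accession pattern as published in the UniProt documentation.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def classify_query(query: str) -> str:
    """Return 'accession' if the query looks like a UniProtKB accession, else 'name'.

    Hypothetical helper: queries classified as 'name' (protein or gene names)
    would then need a separate text search.
    """
    q = query.strip().upper()
    return "accession" if ACCESSION_RE.fullmatch(q) else "name"


if __name__ == "__main__":
    for query in ["P35222", "CTNNB1", "beta-catenin"]:
        print(query, "->", classify_query(query))
```

With this check, "P35222" (β-catenin) is routed to an accession lookup, while "CTNNB1" and "beta-catenin" fall through to a name search.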

Figure 1 (The concept behind DisCanVis)

We demonstrate the use of DisCanVis with the example of β-catenin (P35222). This protein is a key regulator of cell growth and survival through the Wnt signaling pathway. β-catenin is frequently mutated in various types of cancer, including colorectal, liver, breast, and lung cancers. Mutations are enriched in a short region within a disordered segment at the N-terminal part of the protein. This region corresponds to a β-TrCP binding motif, which, under normal conditions, is recognized by the β-TrCP E3 ligase that regulates the degradation of β-catenin. Mutations in the binding motif interfere with the proper degradation of β-catenin, resulting in its pathological accumulation in the cell, which can lead to the activation of genes that drive uncontrolled cell growth and tumor formation (Bugter, Fenderico, and Maurice 2021).

Figure 2 (DisCanVis visualization for β-catenin)

The visualization is divided into three sections. The first is the header, which shows the full-length protein and gives an overview of the mutations and the structural state of the protein along the sequence. Below it, the genome-level annotations are presented, showing mutations collected from various cancer samples as well as other disease mutations. These are followed by the protein-level annotations, including known structures, domains, and disorder-specific annotations. Annotated functional sites, including known short linear motifs, are also indicated. The yellow box marks mutation hotspots, which in this case highlight the β-TrCP binding motif and the phosphorylation sites that are critical for cancer development.
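The exact procedure DisCanVis uses to call mutation hotspots is not described in this post. Purely to illustrate the idea of a region "enriched in mutations", the minimal sketch below slides a fixed-size window along a sequence and reports windows whose mutation count is both above a minimum and several-fold above what a uniform spread of mutations would give. The functions `find_hotspots` and `merge_windows`, the window size, the thresholds, and the toy positions are all hypothetical assumptions, not DisCanVis parameters or real β-catenin data.

```python
from collections import Counter


def find_hotspots(mutation_positions, seq_length, window=10, fold=5.0, min_count=3):
    """Return (start, end, count) for windows enriched in mutations.

    Hypothetical illustration, not the DisCanVis algorithm: a window is
    reported when its mutation count is at least `min_count` and exceeds
    `fold` times the count expected if mutations were spread uniformly
    along the sequence. Positions are 1-based residue indices.
    """
    counts = Counter(mutation_positions)
    expected = len(mutation_positions) * window / seq_length
    hotspots = []
    for start in range(1, seq_length - window + 2):
        end = start + window - 1
        observed = sum(counts[p] for p in range(start, end + 1))
        if observed >= min_count and observed > fold * expected:
            hotspots.append((start, end, observed))
    return hotspots


def merge_windows(hotspots):
    """Merge overlapping enriched windows into contiguous (start, end) regions."""
    merged = []
    for start, end, _ in hotspots:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Toy data: a cluster of mutations near the N-terminus plus scattered singletons.
    positions = [30, 32, 32, 33, 33, 33, 37, 41, 41, 45, 120, 300, 510, 700]
    print(merge_windows(find_hotspots(positions, seq_length=800)))
```

On these toy positions the clustered N-terminal mutations merge into a single region, while the isolated singletons are ignored. Note that window-based merging extends the reported region slightly beyond the outermost mutations.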

In addition to inspecting individual proteins, users can carry out analyses focusing on specific subsets of proteins through pre-compiled tables. These can be sorted and filtered, enabling users to collect examples with existing annotations of protein disorder and associated functions, or to discover currently uncharacterized examples with likely disease relevance. For example, it is possible to browse tables of known cancer drivers, experimentally verified disordered proteins, and known linear motif sites, or to explore proteins with a given Gene Ontology term. Users can also find the proteins carrying a given type of short linear motif with the largest number of disease mutations, or find mutation-enriched regions in proteins not yet classified as cancer drivers.
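The tables are browsed in the DisCanVis web interface. To make the filter-then-sort logic concrete, the minimal sketch below applies it to rows held in memory. The `MotifRow` layout, the column names, the motif class labels, the threshold, and all protein names and counts are hypothetical placeholders, not the real DisCanVis table schema or data.

```python
from dataclasses import dataclass


@dataclass
class MotifRow:
    # Hypothetical row layout; the real DisCanVis tables may use other columns.
    protein: str
    motif_class: str
    disease_mutations: int
    known_driver: bool


def top_proteins_for_motif(rows, motif_class, n=3):
    """Filter rows to one motif class, then sort by mutation count (descending)."""
    hits = [r for r in rows if r.motif_class == motif_class]
    hits.sort(key=lambda r: r.disease_mutations, reverse=True)
    return [r.protein for r in hits[:n]]


def candidate_unclassified(rows, min_mutations=10):
    """Proteins with mutation-rich motif sites that are not labelled as cancer drivers."""
    return sorted(
        {r.protein for r in rows if not r.known_driver and r.disease_mutations >= min_mutations}
    )


# Toy rows with placeholder names and counts.
rows = [
    MotifRow("PROT_A", "DEGRON_X", 42, True),
    MotifRow("PROT_B", "DEGRON_X", 3, False),
    MotifRow("PROT_C", "DEGRON_X", 17, False),
    MotifRow("PROT_D", "BINDING_Y", 25, False),
    MotifRow("PROT_E", "BINDING_Y", 1, True),
]

if __name__ == "__main__":
    print(top_proteins_for_motif(rows, "DEGRON_X", n=2))
    print(candidate_unclassified(rows))
```

The first query mirrors "proteins with a given motif type and the most disease mutations". The second mirrors the search for mutation-rich regions outside known drivers.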

DisCanVis combines the wealth of information on genetic variations with the highly valuable, largely expert-curated annotations in the UniProt database. We hope that it will thereby prove to be a valuable tool for advancing our understanding of intrinsically disordered proteins and for gaining important insights into their roles in cancer.

Further information about its usage and functionality can be found on the web server or in the publication:

Protein Science, Volume 32, Issue 1
https://doi.org/10.1002/pro.4522

REFERENCES

Bugter, Jeroen M., Nicola Fenderico, and Madelon M. Maurice. 2021. “Mutations and Mechanisms of WNT Pathway Tumour Suppressors in Cancer.” Nature Reviews. Cancer 21 (1): 5–21.

Deutsch, Norbert, Mátyás Pajkos, Gábor Erdős, and Zsuzsanna Dosztányi. 2023. “DisCanVis: Visualizing Integrated Structural and Functional Annotations to Better Understand the Effect of Cancer Mutations Located within Disordered Proteins.” Protein Science: A Publication of the Protein Society 32 (1): e4522.

Dyson, H. Jane, and Peter E. Wright. 2005. “Intrinsically Unstructured Proteins and Their Functions.” Nature Reviews. Molecular Cell Biology 6 (3): 197–208.

Gao, Jianjiong, Bülent Arman Aksoy, Ugur Dogrusoz, Gideon Dresdner, Benjamin Gross, S. Onur Sumer, Yichao Sun, et al. 2013. “Integrative Analysis of Complex Cancer Genomics and Clinical Profiles Using the cBioPortal.” Science Signaling 6 (269): l1.

Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89.

Ruff, Kiersten M., and Rohit V. Pappu. 2021. “AlphaFold and Implications for Intrinsically Disordered Proteins.” Journal of Molecular Biology 433 (20): 167208.

Tate, John G., Sally Bamford, Harry C. Jubb, Zbyslaw Sondka, David M. Beare, Nidhi Bindal, Harry Boutselakis, et al. 2019. “COSMIC: The Catalogue Of Somatic Mutations In Cancer.” Nucleic Acids Research 47 (D1): D941–47.

Wednesday, May 17, 2023