Wednesday, October 15, 2014

Introducing Annotation Scores in UniProt

We are pleased to introduce annotation scores on the UniProt website! We have recently started providing annotation scores for all UniProtKB entries. The annotation score is a five-point heuristic score: a score of 5 points is associated with the best-annotated entries, and a 1-point score denotes an entry with rather basic annotation. A 5-point annotation score looks like this:



Annotation scores can help you quickly gauge the annotation content of a protein entry. For example, you can see which protein in a family is the best annotated. We hope the scores will help you narrow your results down to the entries you are interested in.

You can view annotation scores in the ‘Status’ line on all UniProtKB protein entry pages, as shown below.



You can also add annotation scores to your search results table through the ‘Columns’ button.


How are they used?

There are several contexts in which annotation scores can be used:
  • UniProtKB
    The annotation scores give you a quick idea of the relative level of annotation of the entries in your search results. Please note that search results are not ranked by annotation score. They are ranked by a query score that takes into account the annotation scores of the entries that match your query, but also how often (and where) your query term(s) appear in each matching entry and across the whole database, and how important each term is given the total number of terms. For this reason, the best-ranked entries are not necessarily those with the highest annotation scores.
  • UniRef
    We will be using annotation scores to select the representative member of a UniRef cluster.
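To illustrate why a top-ranked result need not be the best-annotated one, here is a toy sketch in Python. It assumes a hypothetical query score made of length-normalized TF-IDF (how often a term appears in an entry, weighted by how rare it is across the database) plus a small annotation-score boost; the actual UniProt query score is not specified in this post, and the entries, texts and the `annotation_weight` parameter are made up. It also includes a hypothetical `pick_representative` helper that treats the same toy entries as a cluster and picks the member with the highest annotation score, breaking ties by accession; the post does not describe the exact UniRef selection rule.

```python
import math

# Made-up entries: (accession, annotation score 1-5, entry text).
ENTRIES = [
    ("ENTRY_A", 5, "kinase receptor membrane signaling"),
    ("ENTRY_B", 2, "kinase kinase kinase domain"),
    ("ENTRY_C", 3, "transporter membrane"),
]


def idf(term, docs):
    """Inverse document frequency: terms that are rare across the database weigh more."""
    n = sum(1 for _, _, text in docs if term in text.split())
    return math.log((1 + len(docs)) / (1 + n)) + 1.0


def query_score(query, entry, docs, annotation_weight=0.1):
    """Hypothetical query score: length-normalized TF-IDF plus a small annotation boost."""
    _, annotation_score, text = entry
    words = text.split()
    tfidf = sum(words.count(t) / len(words) * idf(t, docs) for t in query.split())
    return tfidf + annotation_weight * annotation_score


def rank(query, docs):
    """Sort entries by query score, best first."""
    return sorted(docs, key=lambda e: query_score(query, e, docs), reverse=True)


def pick_representative(cluster):
    """Hypothetical rule: highest annotation score, ties broken by smallest accession."""
    return min(cluster, key=lambda e: (-e[1], e[0]))


print([acc for acc, _, _ in rank("kinase", ENTRIES)])  # ['ENTRY_B', 'ENTRY_A', 'ENTRY_C']
print(pick_representative(ENTRIES)[0])                  # ENTRY_A
```

In this toy setup ENTRY_B ranks first for "kinase" because the term dominates its text, even though ENTRY_A has the higher annotation score and would be chosen as the cluster representative.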

How are they computed?

  • Different UniProtKB annotation types, such as protein names, gene names, functional annotations (comments), sequence annotations (features), GO annotations and cross-references, are scored either by their presence or by their number of occurrences. Annotations with experimental evidence score higher than equivalent predicted or inferred annotations, which favors expert, literature-based curation over automatic annotation.
  • The score of an individual entry is the sum of the scores of its annotations.
  • The score of a proteome is the sum of the scores of the entries that are part of the proteome.
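The three rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the point values in `PRESENCE_POINTS` and `PER_OCCURRENCE_POINTS`, the `EXPERIMENTAL_MULTIPLIER`, and the annotation type names are all hypothetical, since this post does not publish the actual weights. The sketch computes the raw summed score only; how that sum is mapped onto the displayed 1 to 5 scale is not described here, so no mapping is attempted.

```python
from dataclasses import dataclass, field

# Hypothetical point values for illustration only; not UniProt's real weights.
PRESENCE_POINTS = {"protein_name": 2, "gene_name": 2}  # scored once if present
PER_OCCURRENCE_POINTS = {  # scored for every occurrence
    "function_comment": 2,
    "feature": 1,
    "go_term": 1,
    "cross_reference": 1,
}
EXPERIMENTAL_MULTIPLIER = 2  # hypothetical bonus for experimental evidence


@dataclass
class Annotation:
    kind: str           # annotation type, e.g. "protein_name" or "go_term"
    experimental: bool  # True if supported by experimental evidence


@dataclass
class Entry:
    accession: str
    annotations: list[Annotation] = field(default_factory=list)


def evidence_weight(experimental: bool) -> int:
    """Experimental evidence scores higher than predicted/inferred evidence."""
    return EXPERIMENTAL_MULTIPLIER if experimental else 1


def entry_score(entry: Entry) -> int:
    """Entry score = sum of the scores of its annotations."""
    score = 0
    # Presence-scored types count once, at the best evidence level found.
    for kind, points in PRESENCE_POINTS.items():
        matches = [a for a in entry.annotations if a.kind == kind]
        if matches:
            score += points * evidence_weight(any(a.experimental for a in matches))
    # Occurrence-scored types count every instance.
    for a in entry.annotations:
        if a.kind in PER_OCCURRENCE_POINTS:
            score += PER_OCCURRENCE_POINTS[a.kind] * evidence_weight(a.experimental)
    return score


def proteome_score(entries: list[Entry]) -> int:
    """Proteome score = sum of the scores of its member entries."""
    return sum(entry_score(e) for e in entries)


curated = Entry("ENTRY_A", [
    Annotation("protein_name", True),
    Annotation("gene_name", False),
    Annotation("function_comment", True),
    Annotation("go_term", True),
    Annotation("go_term", False),
])
predicted = Entry("ENTRY_B", [Annotation("protein_name", False), Annotation("go_term", False)])

print(entry_score(curated))                  # 13
print(entry_score(predicted))                # 3
print(proteome_score([curated, predicted]))  # 16
```

With these made-up weights, the experimentally supported annotations roughly double the curated entry's contribution, which mirrors the stated preference for literature-based curation over automatic annotation.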

Next time you’re looking at a UniProt protein, look out for annotation scores. We welcome your feedback. Would you apply these scores in your work? Would you like to see them in your UniProtKB search results by default? Write in and let us know!