E-pRSA

Embeddings improve the prediction of residue Relative Solvent Accessibility in protein sequence

Helper

From the Home page of the website, the user can submit one input sequence in FASTA format by pasting the text in the panel and pressing the button "Submit".
To submit a job, the input must contain an ID line starting with a '>' character, followed by a protein sequence of 50 to 5000 valid residues (ARNDCQEGHILKMFPSTWYVUXZB).
If the input is not valid, the input box turns red and a message indicates what needs to be changed; once the box turns green, the "Submit" button can be pressed.
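The input rules above can also be checked locally before pasting a sequence. The following is only a minimal sketch of those rules, not the code used by the web server; the function name `check_fasta` is hypothetical, and treating lower-case letters as invalid is an assumption, since the page does not say how the server handles them.

```python
# Residue alphabet and length limits as stated on this page.
VALID_RESIDUES = set("ARNDCQEGHILKMFPSTWYVUXZB")
MIN_LEN, MAX_LEN = 50, 5000


def check_fasta(text):
    """Return a list of problems found in a single-sequence FASTA string.

    An empty list means the input satisfies the rules described above.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return ["missing ID line starting with '>'"]

    problems = []
    if any(ln.startswith(">") for ln in lines[1:]):
        problems.append("more than one ID line found")

    # Join all sequence lines; the check is case-sensitive (assumption).
    sequence = "".join(ln for ln in lines[1:] if not ln.startswith(">"))
    if not MIN_LEN <= len(sequence) <= MAX_LEN:
        problems.append(
            f"sequence has {len(sequence)} residues, expected {MIN_LEN} to {MAX_LEN}"
        )

    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        problems.append("invalid residue characters: " + "".join(invalid))
    return problems
```

An empty list returned by `check_fasta` corresponds to the situation in which the input box would turn green.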

To test the predictor, the "Load example FASTA" button allows the submission of a test sequence.
Alternatively, a FASTA file can be uploaded. Please note that the same requirements will apply and the "Submit" button will not be available if there is something wrong with the input.

When "Batch Predictions" is activated, a multiFASTA file containing up to 1,000 sequences (each no longer than 5,000 residues) can be uploaded.
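Before uploading a large multiFASTA file, the batch limits can be verified with a short script. This is a sketch under stated assumptions: the names `parse_multifasta` and `check_batch` are hypothetical, and only the two limits mentioned above (number of sequences and maximum length) are checked, since the page does not state whether other rules apply to batch jobs.

```python
# Batch limits as stated on this page.
MAX_SEQUENCES = 1000
MAX_RESIDUES = 5000


def parse_multifasta(text):
    """Return a list of (header, sequence) pairs from multiFASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new ID line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_batch(text):
    """Return a list of problems with respect to the batch limits."""
    records = parse_multifasta(text)
    problems = []
    if not records:
        problems.append("no sequences found")
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences, limit is {MAX_SEQUENCES}")
    for header, sequence in records:
        if len(sequence) > MAX_RESIDUES:
            problems.append(
                f"{header}: {len(sequence)} residues, limit is {MAX_RESIDUES}"
            )
    return problems
```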

Upon submission, a new page opens. It refreshes automatically every 10 seconds and displays the status of your job (either queued or running).
The user can bookmark the page (the results will be available at the same URL once they are ready) and save the Universally Unique Identifier of the job using the "Copy" button.

If you did not bookmark the page, you can use the Job ID from the Results page to search for your job at any time. As on the Home page, the search can only be started with a valid ID, and you will be notified if the ID is not present in our database.
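Since the Job ID is a Universally Unique Identifier, a saved ID can be sanity-checked with Python's standard `uuid` module before searching. This is a sketch only: the helper name `is_valid_job_id` is hypothetical, and the assumption that IDs use the canonical hyphenated form is not confirmed by this page. A well-formed ID may still be absent from the database.

```python
import uuid


def is_valid_job_id(job_id):
    """Return True if job_id is a UUID in canonical hyphenated form."""
    candidate = job_id.strip()
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return False
    # uuid.UUID also accepts other spellings (e.g. without hyphens);
    # comparing with the canonical string rejects those (assumption).
    return str(parsed) == candidate.lower()
```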

Reading the results

Once the results are ready, they are shown on the Results page. This page has three main sections:

  • Job information
    At the top of the page, general information about the job is shown, including the Job ID, the dates of submission and completion, the protein ID, the protein length, the counts and percentages of exposed vs buried predictions, and the count and percentage of predicted interaction sites obtained with ISPRED-SEQ (a previously developed method).

    In case of a Batch Job, the number of proteins and total residues submitted are also shown, together with a button to download the results. The file will be in a tab-separated format including 6 columns: i) protein ID, ii) numbering of residues, iii) residue type, iv) binary classification, v) putative RSA, vi) putative Interaction Site (for exposed residues only).

  • Feature Viewer
    Results are visualized with the neXtProt feature viewer. The first line displays the residues of the sequence; the second and third lines show the output of E-pRSA, respectively the putative RSA and the binary classification (Exposed or Buried). The last line shows the predictions made with ISPRED-SEQ on putative exposed residues, highlighting residues that are likely Protein-Protein Interaction Sites.

    Hovering the mouse over the graphs highlights the position of the residues. Selecting a rectangular area zooms in on that part of the sequence. A right click on the graph resets the zoom. Alternatively, the buttons at the top right of the viewer can be used to zoom in, zoom out, move along the protein sequence or take a screenshot of the selected area.

  • Data Tables
    At the bottom of the page, a table reports the information relative to the job. On the left side of the table there is a list of filters that can be selected and combined for statistics. Pressing "Download TSV" at the top right of the table downloads the results displayed in the table as a Tab-Separated-Value file, with an additional column on the left containing the protein ID. This can be useful when combining multiple files.

The Feature Viewer and the Data Tables are only shown for single sequence predictions.
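The six-column tab-separated file described for Batch Jobs can be read with Python's standard `csv` module. The sketch below rests on several assumptions that this page does not confirm: the column names in `COLUMNS` are hypothetical labels for the six columns listed above, the presence of a header row is unknown (hence the `has_header` switch), and the last column is assumed to be possibly empty for buried residues.

```python
import csv
from collections import Counter, defaultdict

# Hypothetical names for the six columns, in the order listed on this page.
COLUMNS = [
    "protein_id",
    "position",
    "residue",
    "classification",
    "rsa",
    "interaction_site",
]


def read_batch_results(handle, has_header=False):
    """Read a batch result file into a list of dicts keyed by COLUMNS."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    rows = []
    for fields in reader:
        if not fields:
            continue
        # Pad short rows so a missing last column becomes an empty string.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def classification_counts(rows):
    """Count the binary classification labels for each protein ID."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["protein_id"]][row["classification"]] += 1
    return counts
```

With a downloaded file, `read_batch_results(open(path, newline=""))` followed by `classification_counts(rows)` gives, per protein, the same kind of exposed vs buried counts reported in the Job information section.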