E-pRSA - About

Datasets

Download training set
Download blind test sets used for benchmark
Download cross-validation split

Both the training set and the blind test sets are available as TSV files containing six columns:

UniProt: The UniProt ID of the protein
PDB: The PDB chain that was used to compute RSA values
Pos: A progressive numbering of the residues
Res: The residue type
Class: Classification of the residue. 0: buried residue (RSA < 20%). 1: exposed residue (RSA >= 20%). -1: residue missing from the PDB file, so no RSA could be computed. -2: neighbour of a residue in the -1 class, so its RSA cannot be estimated correctly.
RSA: Computed RSA. Either a real number from 0 (completely buried) to 1 (maximally exposed), or a negative value (-1 or -2) with the same meaning as in the Class column
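
As an illustration of the column layout above, the following minimal Python sketch parses rows in this format and re-derives the class from the RSA value using the 20% threshold (0.20 on the 0 to 1 scale). It rests on several assumptions not stated on this page: that the files start with a header line, that the columns appear in the order listed, and that negative RSA values are exactly -1 or -2. The sample rows, including the accession and chain identifiers, are made up for the example; `read_rsa_table` and `class_from_rsa` are hypothetical helper names.

```python
import csv
import io
from collections import Counter

# Column names in the order the description lists them (assumed file order).
COLUMNS = ["UniProt", "PDB", "Pos", "Res", "Class", "RSA"]


def read_rsa_table(handle, has_header=True):
    """Yield one dict per residue from a tab-separated file handle."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: the first line is a header
    for row in reader:
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        rec["Pos"] = int(rec["Pos"])
        rec["Class"] = int(rec["Class"])
        rec["RSA"] = float(rec["RSA"])
        yield rec


def class_from_rsa(rsa, threshold=0.20):
    """Map an RSA value to its class; negative codes pass through unchanged."""
    if rsa < 0:
        return int(rsa)  # -1 (missing) or -2 (neighbour of missing)
    return 1 if rsa >= threshold else 0


# Made-up rows, for illustration only.
SAMPLE = (
    "UniProt\tPDB\tPos\tRes\tClass\tRSA\n"
    "P00000\t1abcA\t1\tM\t-1\t-1\n"
    "P00000\t1abcA\t2\tK\t-2\t-2\n"
    "P00000\t1abcA\t3\tL\t0\t0.05\n"
    "P00000\t1abcA\t4\tE\t1\t0.63\n"
)

records = list(read_rsa_table(io.StringIO(SAMPLE)))
counts = Counter(r["Class"] for r in records)

# Residues with class -1 or -2 carry no usable label and are usually excluded.
labelled = [r for r in records if r["Class"] >= 0]
```

To read a downloaded file instead of the in-memory sample, the same function can be given an open file handle, for example `open(path, newline="")`, where `path` points to wherever the TSV was saved.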

The Blind_Test_Sets folder contains all the blind test sets used for benchmarking (MM165, MM23, CASP12, CASP14).

In the cross_validation folder, 11 text files (split0-split9 and test.txt) are available, each containing the protein IDs belonging to the corresponding subset used in cross-validation.
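
One way to use the split files is sketched below in Python: for each fold, the held-out subset serves as validation data and the remaining subsets are merged for training. This is a sketch under assumptions, not the procedure used by the authors: it assumes each file lists one protein ID per line, and `load_ids` and `cv_partitions` are hypothetical helper names. The IDs in the demo are made up.

```python
from pathlib import Path


def load_ids(path):
    """Read protein IDs from a text file, assuming one ID per line."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def cv_partitions(folds):
    """Yield (fold_index, train_ids, validation_ids) for every fold.

    The fold at fold_index is held out for validation; all other folds
    are concatenated, in order, to form the training set.
    """
    for i, held_out in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield i, train, list(held_out)


# In-memory demo with made-up IDs and three folds instead of ten.
demo_folds = [["A1", "A2"], ["B1"], ["C1", "C2"]]
partitions = list(cv_partitions(demo_folds))
```

With the downloaded folder, the ten subsets could be loaded with something like `[load_ids(p) for p in split_paths]`, where `split_paths` lists the files for split0 to split9 in order. The exact file names and extensions are not spelled out here, so check the folder contents first. The IDs in test.txt would be kept aside and not passed to `cv_partitions`.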