E-pRSA

Embeddings improve the prediction of residue Relative Solvent Accessibility in protein sequence

About this method

E-pRSA is a method for predicting the Relative Solvent Accessibility (RSA) of residues in a protein chain without requiring prior knowledge of its three-dimensional structure.
The target sequence is first processed by two different and complementary protein language models (PLMs), ProtT5 and ESM2, whose per-residue embeddings of 1024 and 1280 features, respectively, are concatenated into a vector of 1024+1280=2304 features for each residue. A sliding window of 31 residues is then processed by the network, which consists of a convolutional layer with 2304 filters followed by a stack of 3 dense layers. The output is a single value between 0 and 1, representing the predicted RSA of the residue. A threshold of 20% is also applied to classify residues as Buried or Exposed.
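The input preparation and the final Buried/Exposed call can be illustrated with a minimal sketch. It is not the E-pRSA implementation: the function names are hypothetical, random arrays stand in for real embeddings, and it assumes that the 1024-feature part comes from ProtT5 and the 1280-feature part from ESM2. Zero padding at the chain termini and the rule that an RSA equal to the threshold counts as Exposed are further assumptions not stated above.

```python
import numpy as np

WINDOW = 31            # window length stated in the text
HALF = WINDOW // 2     # 15 residues on each side of the central residue
RSA_THRESHOLD = 0.20   # 20% threshold stated in the text


def concat_embeddings(prott5: np.ndarray, esm2: np.ndarray) -> np.ndarray:
    """Join per-residue embeddings of shape (L, 1024) and (L, 1280) into (L, 2304)."""
    if prott5.shape[0] != esm2.shape[0]:
        raise ValueError("both embeddings must cover the same residues")
    return np.concatenate([prott5, esm2], axis=1)


def sliding_windows(features: np.ndarray) -> np.ndarray:
    """Return one (WINDOW, n_features) window per residue.

    Assumption: positions beyond the chain termini are filled with zeros.
    """
    padded = np.pad(features, ((HALF, HALF), (0, 0)), mode="constant")
    return np.stack([padded[i:i + WINDOW] for i in range(features.shape[0])])


def classify(rsa: np.ndarray) -> list[str]:
    """Label residues; assumption: RSA >= threshold means Exposed."""
    return ["Exposed" if value >= RSA_THRESHOLD else "Buried" for value in rsa]


# Random arrays stand in for real ProtT5 and ESM2 embeddings of a 50-residue chain.
rng = np.random.default_rng(0)
prott5 = rng.normal(size=(50, 1024))
esm2 = rng.normal(size=(50, 1280))

features = concat_embeddings(prott5, esm2)
windows = sliding_windows(features)
print(features.shape, windows.shape)
print(classify(np.array([0.05, 0.20, 0.63])))
```

Each row of `windows` is centred on one residue, so a network consuming it would produce one RSA value per residue of the chain.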
The training dataset, comprising 6,552 proteins, was split into 10 equally sized subsets for 10-fold cross-validation. Proteins assigned to different subsets share less than 25% sequence identity over a minimum of 40% coverage. The blind test sets were filtered to reduce structural similarity to the training datasets of all methods included in the benchmark.
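One common way to obtain cross-validation subsets that respect a sequence-identity constraint is to cluster the proteins first and then keep every cluster inside a single subset. The sketch below illustrates that idea only; it is not necessarily the procedure used for E-pRSA. The cluster labels are assumed to come from an external clustering step at the stated identity and coverage thresholds, and the function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict


def assign_folds(cluster_of: dict[str, str], n_folds: int = 10) -> dict[str, int]:
    """Map protein id -> fold index, keeping each cluster inside one fold."""
    clusters = defaultdict(list)
    for protein, cluster in cluster_of.items():
        clusters[cluster].append(protein)

    fold_sizes = [0] * n_folds
    fold_of = {}
    # Place the largest clusters first, each into the currently smallest fold,
    # so that the folds end up with similar sizes.
    for members in sorted(clusters.values(), key=len, reverse=True):
        target = fold_sizes.index(min(fold_sizes))
        for protein in members:
            fold_of[protein] = target
        fold_sizes[target] += len(members)
    return fold_of


# Toy input: 40 hypothetical proteins grouped into 25 hypothetical clusters.
cluster_of = {f"prot{i}": f"clu{i % 25}" for i in range(40)}
fold_of = assign_folds(cluster_of)
print(sorted(set(fold_of.values())))
```

Because whole clusters are moved together, two proteins from the same cluster never end up in different subsets; fold sizes are only approximately equal when cluster sizes vary.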

How to cite

E-pRSA: Embeddings Improve the Prediction of Residue Relative Solvent Accessibility in Protein Sequence
Matteo Manfredi, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio
https://doi.org/10.1016/j.jmb.2024.168494