ISPRED-SEQ
A deep-learning-based method for the prediction of Interaction Sites starting from protein sequence.
Part of the Bioinformatics Sweeties collection.
About this method
ISPRED-SEQ is a method for predicting the occurrence of Interaction Sites on a protein starting from its sequence.
The input sequence is first embedded using ProtTrans T5 and ESM-1v to produce a vector representation of 2304 features for each residue. This entirely replaces traditional hand-crafted features such as the physico-chemical properties of the residues or sequence profiles derived from multiple sequence alignments.
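As a rough illustration of how the 2304 per-residue features could arise, the sketch below concatenates two per-residue embeddings of 1024 and 1280 dimensions, which are the published embedding sizes of ProtT5-XL and ESM-1v. The split between the two models is an assumption based on those sizes. The helper `fake_embedding` is a hypothetical stand-in that returns random numbers in place of a real language-model encoder.

```python
import random

PROTT5_DIM = 1024  # assumed per-residue size of the ProtTrans T5 embedding
ESM1V_DIM = 1280   # assumed per-residue size of the ESM-1v embedding


def fake_embedding(length, dim, seed):
    """Hypothetical stand-in for a real encoder: random values, one row per residue."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]


def concatenate_embeddings(emb_a, emb_b):
    """Join two per-residue embeddings residue by residue."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must cover the same number of residues")
    return [row_a + row_b for row_a, row_b in zip(emb_a, emb_b)]


sequence = "MKTAYIAKQR"  # made-up 10-residue example
features = concatenate_embeddings(
    fake_embedding(len(sequence), PROTT5_DIM, seed=0),
    fake_embedding(len(sequence), ESM1V_DIM, seed=1),
)
print(len(features), len(features[0]))  # 10 2304
```

In a real pipeline the two calls to `fake_embedding` would be replaced by the actual encoders; only the residue-wise concatenation is the point here.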
For each prediction, our model considers a window of 31 residues centered on the residue of interest. The architecture is composed of a 1-dimensional convolutional layer that reduces each feature across the window to a single value, followed by three cascading dense layers with 128, 32, and 1 output neurons, respectively. The output of the last layer is then used to perform the prediction and to derive the Reliability Index.
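The data flow described above can be sketched in plain Python as follows. This is not the trained ISPRED-SEQ model: the weights are random, and several details are assumptions made for illustration, namely zero-padding at the sequence termini, one independent 31-weight kernel per feature, ReLU activations in the hidden layers, and a sigmoid output. All function names are hypothetical, and the demo uses 8 features per residue instead of 2304 to keep it fast.

```python
import math
import random

WINDOW = 31          # residues per window, as stated in the text
HALF = WINDOW // 2   # 15 residues on each side of the center


def extract_window(features, center):
    """Return the 31 rows centered on `center`; positions past the termini
    are filled with zeros (padding scheme is an assumption)."""
    dim = len(features[0])
    return [
        features[pos] if 0 <= pos < len(features) else [0.0] * dim
        for pos in range(center - HALF, center + HALF + 1)
    ]


def reduce_window(window, kernels):
    """Collapse the window axis: one weighted sum per feature."""
    dim = len(window[0])
    return [
        sum(kernels[f][i] * window[i][f] for i in range(WINDOW))
        for f in range(dim)
    ]


def dense(inputs, weights, biases, activation):
    """Fully connected layer: one output per weight row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def build_model(dim, rng):
    """Random, untrained weights with the layer sizes from the text: 128, 32, 1."""
    sizes = ((dim, 128), (128, 32), (32, 1))
    return {
        "kernels": random_matrix(dim, WINDOW, rng),
        "layers": [(random_matrix(n_out, n_in, rng), [0.0] * n_out)
                   for n_in, n_out in sizes],
    }


def predict(model, features, center):
    """Score one residue: window -> per-feature reduction -> three dense layers."""
    x = reduce_window(extract_window(features, center), model["kernels"])
    for (weights, biases), act in zip(model["layers"], (relu, relu, sigmoid)):
        x = dense(x, weights, biases, act)
    return x[0]


rng = random.Random(0)
DEMO_DIM = 8  # demo size only; the text states 2304 features per residue
features = random_matrix(40, DEMO_DIM, rng)  # made-up 40-residue protein
model = build_model(DEMO_DIM, rng)
scores = [predict(model, features, i) for i in range(len(features))]
print(len(scores))  # 40, one score per residue
```

Each residue gets exactly one score regardless of its position, because the padding keeps every window at 31 rows.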
We selected the hyper-parameters of the best model using 10-fold cross-validation and then tested its performance on a blind test set, where it achieved a Matthews Correlation Coefficient of 0.34 and outperformed similar methods.
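For readers unfamiliar with the metric, the Matthews Correlation Coefficient for a binary prediction is (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 to 1, with 0 corresponding to random prediction. The minimal sketch below computes it on made-up labels, not on ISPRED-SEQ output. Returning 0 when the denominator vanishes is a common convention assumed here.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = interaction site, 0 = other residue)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels for illustration only.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
mcc = matthews_corrcoef(y_true, y_pred)
print(round(mcc, 4))  # 0.4667
```

Unlike accuracy, this score stays informative when interaction sites are a small minority of residues, which is the usual situation for this kind of prediction task.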
How to cite
ISPRED-SEQ: Deep neural networks and embeddings for predicting interaction sites in protein sequences
Matteo Manfredi, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio
https://doi.org/10.1016/j.jmb.2023.167963