Sequence-based Predictor of Protein-Protein Interactions

How to cite PPI-Detect:

S. Romero‐Molina, Y. B. Ruiz‐Blanco, M. Harms, J. Münch, E. Sanchez‐Garcia. PPI‐Detect: A support vector machine model for sequence‐based prediction of protein–protein interactions. J. Comput. Chem. 2019, 1‐10. DOI: 10.1002/jcc.25780

The input of the PPI-Detect web server:

To run PPI-Detect, at least two sequences (FASTA format) must be provided separately, as two input sets.

For example:

Sequences A

You can either "Enter a sequence(s)" or "Upload a file" with the lines:

Sequences B

You must provide a file with the sequences to combine, here PB and PC:

Then, the interaction likelihood will be computed for all the combinatorial pairs between the two sets of sequences:
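The pairing step can be sketched as a Cartesian product of the two sets. This is only a minimal illustration, not the server's code: the `parse_fasta` helper, the name PA, and the toy sequences are hypothetical placeholders.

```python
from itertools import product


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped; join them into one string.
            records[name] += line
    return records


# Toy inputs; names and sequences are placeholders, not real server data.
set_a = parse_fasta(">PA\nMKTAYIAK\nQRQISF\n")
set_b = parse_fasta(">PB\nGSHMLEDP\n>PC\nMSTNPKPQ\n")

# Every sequence in set A is paired with every sequence in set B.
pairs = list(product(set_a, set_b))
print(pairs)
```

With one sequence in set A and two in set B, two pairs are formed; in general the number of pairs is the product of the two set sizes.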

The output of the PPI-Detect web server:

A table with the following information for each protein-protein pair:

  • Instance: Name of the instance (protein-protein pair)
  • Prediction: The predicted class (Interaction or Not interaction)
  • Score: The predicted probability of an interaction
  • Analysis of the projection of the predicted case into the applicability domain (AD) of the PPI model:
  • AD 1st-99th: The case is Out of the AD when at least one descriptor value is outside the range defined by the 1st and 99th percentiles of the training data.
  • AD 100th: The case is Out of the AD when at least one descriptor value is outside the full range of the training data for the PPI model.
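The two AD checks can be illustrated with a short sketch. It makes several assumptions: descriptors are plain numeric tuples, percentiles use linear interpolation (the server's percentile method is not stated here), "AD 100th" is read as the 0th to 100th percentile range (minimum to maximum), and the label "In" for a case inside the AD is a placeholder. The training data are made up.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def ad_status(training_rows, case, low_q, high_q):
    """Return "Out" if any descriptor of `case` falls outside the
    [low_q, high_q] percentile range of the training data, else "In"."""
    for j, value in enumerate(case):
        column = [row[j] for row in training_rows]
        low = percentile(column, low_q)
        high = percentile(column, high_q)
        if not low <= value <= high:
            return "Out"
    return "In"


# Hypothetical training data: 101 cases with 2 descriptors each.
training = [(float(i), float(2 * i)) for i in range(101)]

case = (99.5, 100.0)
print("AD 1st-99th:", ad_status(training, case, 1, 99))
print("AD 100th:", ad_status(training, case, 0, 100))
```

In this toy case the first descriptor (99.5) lies above the 99th percentile (99.0) but below the training maximum (100.0), so the case is outside the stricter AD and inside the wider one.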

Example files:



Download example files

Example output:

The server shows the following table, which summarizes all the information provided in the output files, plus a link to download them:

#  Instance        Prediction       Score  AD 1st & 99th  AD 100th
0  PF00189PF00163  Interaction      0.578  Out            Out
1  PF01248PF01599  Not interaction  0.158  Out            Out
2  PF01248PF01246  Not interaction  0.183  Out            Out
3  PF00163PF00281  Not interaction  0.181  Out            Out
4  PF00163PF01479  Not interaction  0.379  Out            Out


  • The 5th and 6th columns indicate whether the protein pair is within the applicability domain (AD) of the model.

PPI-Detect was built with a nonredundant benchmarking dataset of PPIs gathered from three comprehensive, curated, and publicly available databases. These databases contain information about pairs of protein domains with proven interactions (3did and iPfam) and domain pairs with very little chance of being involved in an interaction (Negatome 2.0).

We split the dataset into training and test sets. The interacting domains are the positive cases and the noninteracting domains are the negative cases.

Training: This subset includes 3491 pairs (1613 positive and 1878 negative). download

Testing: This subset includes 836 pairs of domains (309 positive and 527 negative).

To estimate the performance of the final model, we grouped the test data by degrees of difficulty:

  • Very hard subset. It gathers pairs of individual domains not present in the training data. It contains 103 domain pairs (57 positive and 46 negative). download
  • Mid-hard subset. It comprises domain pairs where only one of the domains is present in the training data. It contains 307 domain pairs (102 positive and 205 negative). download
  • Easy subset. It comprises pairs where both domains are present in the training data. It includes 426 domain pairs (150 positive and 276 negative). download
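The grouping rule above can be expressed as a count of how many domains of a test pair also occur in the training data. The following is a minimal sketch of that rule, not the authors' script; the domain identifiers D1, D2, and so on are hypothetical.

```python
def difficulty(pair, training_domains):
    """Classify a test pair by how many of its domains appear in training."""
    seen = sum(domain in training_domains for domain in pair)
    return {0: "very hard", 1: "mid-hard", 2: "easy"}[seen]


# Hypothetical training pairs; collect every domain that occurs in them.
training_pairs = [("D1", "D2"), ("D2", "D3")]
training_domains = {domain for pair in training_pairs for domain in pair}

test_pairs = [("D1", "D3"), ("D1", "D9"), ("D8", "D9")]
for pair in test_pairs:
    print(pair, difficulty(pair, training_domains))
```

Note that a pair counts as easy when both domains were seen in training, even if that exact pair was not.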

The files contain only the pairs of domains. To obtain the sequences, click here.