Data source

The database of co-crystallized protein-protein structures is built from the biological unit files of the PDB (Biounit files). More information about biounits and data selection can be obtained here.
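
As an illustration only, the following Python sketch counts residues per chain in a biounit file written in the legacy PDB format. It assumes the fixed-column ATOM layout of that format (chain ID in column 22, residue number plus insertion code in columns 23-27). Chains are keyed by (model, chain ID) because PDB-format biounit files may list symmetry-generated copies as separate MODEL blocks that reuse chain IDs. The function name `residues_per_chain` is hypothetical and not part of any tool mentioned on this page.

```python
from collections import defaultdict


def residues_per_chain(pdb_text):
    """Count distinct residues per (model, chain) in PDB-format text.

    Only ATOM records are counted, so waters and ligands (HETATM) are ignored.
    """
    model = 1  # files without MODEL records are treated as a single model
    seen = defaultdict(set)
    for line in pdb_text.splitlines():
        record = line[:6]
        if record == "MODEL ":
            model = int(line.split()[1])
        elif record == "ATOM  ":
            chain = line[21]          # column 22: chain identifier
            residue_id = line[22:27]  # columns 23-27: residue number + insertion code
            seen[(model, chain)].add(residue_id)
    return {key: len(ids) for key, ids in seen.items()}


sample = "\n".join([
    "MODEL        1",
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000  1.00  0.00           C",
    "ATOM      2  CA  GLY A   2       3.800   0.000   0.000  1.00  0.00           C",
    "ENDMDL",
])
print(residues_per_chain(sample))  # {(1, 'A'): 2}
```

Per-chain residue counts of this kind are what a minimum-chain-length criterion would be checked against.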

Identifying associated unbound structures

Unbound structures are separately crystallized structures of the proteins that are also co-crystallized in a complex. If no unbound structure of a protein is found in the PDB, one is simulated from the bound structure using Langevin dynamics (Kirys et al. (2015), BMC Bioinformatics, 16:243).
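
The idea of a Langevin dynamics simulation can be illustrated with a deliberately simplified toy. The Python sketch below is not the protocol of Kirys et al. (2015). It treats a backbone as a chain of unit-mass beads joined by harmonic springs whose rest lengths come from the bound coordinates, and integrates the underdamped Langevin equation (force, friction, and a random kick scaled by the fluctuation-dissipation relation) with a simple first-order scheme in arbitrary reduced units. The function names, parameters, and default values are all hypothetical.

```python
import math
import random


def langevin_perturb(coords, n_steps=1000, dt=0.01, gamma=1.0, kT=0.5,
                     k_bond=10.0, seed=0):
    """Toy underdamped Langevin dynamics on a bead chain (unit masses).

    Consecutive beads are joined by harmonic springs whose rest lengths are
    taken from the starting (bound) coordinates, so the chain fluctuates
    around its initial geometry instead of falling apart.
    """
    rng = random.Random(seed)
    x = [list(p) for p in coords]
    v = [[0.0, 0.0, 0.0] for _ in coords]
    rest = [math.dist(coords[i], coords[i + 1]) for i in range(len(coords) - 1)]
    # Fluctuation-dissipation: per-step noise amplitude for unit mass.
    noise = math.sqrt(2.0 * gamma * kT * dt)
    for _ in range(n_steps):
        # Spring forces from U = 0.5 * k_bond * (r - r0)**2 for each bond.
        f = [[0.0, 0.0, 0.0] for _ in x]
        for i, r0 in enumerate(rest):
            d = [x[i + 1][a] - x[i][a] for a in range(3)]
            r = math.sqrt(sum(c * c for c in d)) or 1e-12
            scale = -k_bond * (r - r0) / r
            for a in range(3):
                f[i + 1][a] += scale * d[a]
                f[i][a] -= scale * d[a]
        # Velocity update (force, friction, random kick), then position update.
        for i in range(len(x)):
            for a in range(3):
                v[i][a] += (f[i][a] - gamma * v[i][a]) * dt + noise * rng.gauss(0.0, 1.0)
                x[i][a] += v[i][a] * dt
    return [tuple(p) for p in x]


def rmsd(a, b):
    """Root-mean-square deviation without superposition."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


bound = [(3.8 * i, 0.0, 0.0) for i in range(10)]  # straight 10-bead chain
simulated = langevin_perturb(bound)
print(round(rmsd(bound, simulated), 2))
```

With a fixed `seed` the trajectory is reproducible. Because no superposition is performed, the RMSD printed here also includes rigid-body drift of the whole chain; a real comparison of bound and unbound structures would superimpose them first.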

Algorithm

  • Only complex structures determined by X-ray diffraction are included.
  • Obsolete PDB entries are excluded.
  • Each chain contains at least 20 residues.
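
The three criteria above act as simple filters on PDB entries. The Python sketch below assumes a hypothetical `PdbEntry` record holding the experimental method, an obsolete flag, and residue counts per chain; the method string "X-RAY DIFFRACTION" follows the wording used in PDB entries, but the record layout and helper names are illustrative only. The text does not say whether a short chain excludes the whole entry or only that chain, so the sketch assumes the stricter reading.

```python
from dataclasses import dataclass, field

MIN_CHAIN_RESIDUES = 20


@dataclass
class PdbEntry:
    """Hypothetical summary of one PDB entry used for filtering."""
    pdb_id: str
    method: str      # experimental method, e.g. "X-RAY DIFFRACTION"
    obsolete: bool
    chain_lengths: dict = field(default_factory=dict)  # chain ID -> residue count


def passes_filters(entry: PdbEntry) -> bool:
    """Apply the X-ray, non-obsolete, and minimum-chain-length criteria.

    Assumption: every chain must reach the minimum length; entries with no
    chain information are rejected.
    """
    return (
        entry.method == "X-RAY DIFFRACTION"
        and not entry.obsolete
        and bool(entry.chain_lengths)
        and all(n >= MIN_CHAIN_RESIDUES for n in entry.chain_lengths.values())
    )


# Made-up example entries (not real PDB IDs).
entries = [
    PdbEntry("XRAY", "X-RAY DIFFRACTION", False, {"A": 120, "B": 95}),
    PdbEntry("NMR1", "SOLUTION NMR", False, {"A": 80, "B": 60}),
    PdbEntry("OLD1", "X-RAY DIFFRACTION", True, {"A": 150, "B": 140}),
    PdbEntry("PEPT", "X-RAY DIFFRACTION", False, {"A": 200, "B": 12}),
]
print([e.pdb_id for e in entries if passes_filters(e)])  # ['XRAY']
```
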

Limitations

A major problem in compiling representative databases of protein-protein complexes is the lack of credible criteria for distinguishing complexes that exist in vivo from crystal packing artifacts. Complexes that exist in vivo must be stable enough to form at biological monomer concentrations without the help of the crystal lattice. However, experimental data reflecting these properties are often unavailable. In addition, it is not clear that existing computational methods for estimating binding energy can systematically separate "strong" (biologically relevant) complexes from "weak" ones (crystallization artifacts). Functional considerations, including evolutionary factors, may nevertheless help discriminate crystal packing complexes.

Constraints for generating representative datasets in the easy mode

  • Sequence identity between complexes < 30%
  • Homomultimers (sequence identity between chains > 70%) are removed
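
The two easy-mode constraints can be sketched as a homomultimer filter followed by greedy redundancy removal. The Python code below is an assumption-laden illustration, not the procedure used to build the datasets. It treats each complex as a pair of chain sequences, measures sequence identity with a plain Needleman-Wunsch global alignment (match +1, mismatch -1, gap -2) normalized by the length of the shorter sequence, and calls two complexes redundant when any chain of one is at least 30% identical to any chain of the other. All names and scoring choices are hypothetical.

```python
HOMOMULTIMER_CUTOFF = 0.70  # chains more similar than this -> homomultimer
REDUNDANCY_CUTOFF = 0.30    # kept complexes must stay below this identity


def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Fraction of identical aligned residues in a Needleman-Wunsch alignment,
    divided by the length of the shorter sequence."""
    n, m = len(a), len(b)
    if min(n, m) == 0:
        return 0.0
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal alignment and count identical aligned pairs.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / min(n, m)


def is_homomultimer(complex_):
    chain1, chain2 = complex_
    return global_identity(chain1, chain2) > HOMOMULTIMER_CUTOFF


def redundant(c1, c2):
    return any(global_identity(x, y) >= REDUNDANCY_CUTOFF for x in c1 for y in c2)


def easy_mode(complexes):
    """Drop homomultimers, then greedily keep complexes not redundant with kept ones."""
    kept = []
    for c in complexes:
        if is_homomultimer(c) or any(redundant(c, k) for k in kept):
            continue
        kept.append(c)
    return kept


c1 = ("ACDEFGHIKL", "MNPQRSTVWY")
c2 = ("ACDEFGHIKL", "ACDEFGHIKL")  # identical chains: homomultimer
c3 = ("ACDEFGHIKM", "WYWYWYWYWY")  # first chain 90% identical to c1's first chain
c4 = ("WWWWWWWWWW", "YYYYYYYYYY")
print(easy_mode([c1, c2, c3, c4]))
```

Because the selection is greedy, the result depends on input order; sorting complexes by a quality measure such as resolution before filtering is one option, again an assumption here.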