The database of protein-protein co-crystallized structures is built from the biological unit files of the PDB.
More information about biounit and data selection can be obtained
Identifying associated unbound structures
Unbound structures are structures of the individual proteins, solved separately, for proteins that are also co-crystallized in a complex.
If no unbound structure is found in the PDB, it is simulated from the bound one
using Langevin dynamics (Kirys et al. (2015), BMC Bioinformatics, 16:243).
- Only complexes whose structures were resolved by X-ray diffraction are included.
- Obsolete PDB entries are excluded.
- Chains contain at least 20 residues.
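The three selection criteria above can be expressed as a simple filter. The following is a minimal sketch, not the database's actual pipeline: the `Entry` record, its field names, and the PDB-style IDs in the usage note are hypothetical, and it assumes that an entry is rejected as a whole if any of its chains is shorter than 20 residues and that a complex needs at least two chains.

```python
from dataclasses import dataclass
from typing import List

MIN_CHAIN_LENGTH = 20  # residues, from the criteria above


@dataclass
class Entry:
    """Hypothetical summary of one biological unit file."""
    pdb_id: str
    method: str               # experimental method, e.g. "X-RAY DIFFRACTION"
    obsolete: bool            # True if the PDB entry has been obsoleted
    chain_lengths: List[int]  # number of residues in each protein chain


def passes_selection(entry: Entry) -> bool:
    """Return True if the entry satisfies all three selection criteria."""
    if entry.method.strip().upper() != "X-RAY DIFFRACTION":
        return False
    if entry.obsolete:
        return False
    # Assumption: a complex needs at least two chains, and one short
    # chain is enough to reject the whole entry.
    if len(entry.chain_lengths) < 2:
        return False
    return all(n >= MIN_CHAIN_LENGTH for n in entry.chain_lengths)


def select(entries: List[Entry]) -> List[Entry]:
    """Keep only the entries that pass the selection criteria."""
    return [e for e in entries if passes_selection(e)]
```

For example, given one X-ray entry with chains of 120 and 85 residues, one NMR entry, one obsolete entry, and one entry with a 12-residue chain, `select` keeps only the first.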
A major problem in compiling representative databases of protein-protein complexes is the lack
of credible criteria for distinguishing complexes existing in vivo
from crystal packing artifacts. In vivo complexes have to be strong enough to
form at the biological concentration of monomers without the help of the crystal lattice. However,
the experimental data reflecting these properties are not available in many
cases. In addition, the practical applicability of existing binding energy-estimating computational
procedures to systematic separation of "strong" (biologically relevant) and "weak" (artifacts of crystallization)
complexes is not obvious. However, functional considerations, including evolutionary
factors, may provide additional help in discriminating crystal packing complexes.
Constraints for generating representative datasets in the easy mode
- Sequence identity between complexes < 30%
- Homomultimers (sequence identity between chains > 70%) deleted
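The two constraints above can be sketched as a greedy redundancy filter. This is an illustration under stated assumptions, not the procedure used to build the database: the `Complex` tuple layout is hypothetical, `toy_identity` is a position-by-position stand-in for a real alignment-based sequence identity, and two complexes are assumed to be redundant when both chains of one match the chains of the other at 30% identity or more, in either pairing.

```python
from typing import Callable, List, Tuple

# Hypothetical record: (complex id, sequence of chain 1, sequence of chain 2)
Complex = Tuple[str, str, str]
Identity = Callable[[str, str], float]

COMPLEX_CUTOFF = 30.0  # percent; complexes must be below this to both stay
HOMO_CUTOFF = 70.0     # percent; chains above this make a homomultimer


def toy_identity(a: str, b: str) -> float:
    """Percent of identical positions, without alignment (toy stand-in)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def is_homomultimer(c: Complex, identity: Identity) -> bool:
    """True if the two chains of the complex are more than 70% identical."""
    return identity(c[1], c[2]) > HOMO_CUTOFF


def are_redundant(c1: Complex, c2: Complex, identity: Identity) -> bool:
    """Assumed definition: both chains match, in direct or swapped order."""
    direct = min(identity(c1[1], c2[1]), identity(c1[2], c2[2]))
    swapped = min(identity(c1[1], c2[2]), identity(c1[2], c2[1]))
    return max(direct, swapped) >= COMPLEX_CUTOFF


def representative_set(complexes: List[Complex],
                       identity: Identity = toy_identity) -> List[Complex]:
    """Greedily keep complexes that are neither homomultimers nor redundant."""
    kept: List[Complex] = []
    for c in complexes:
        if is_homomultimer(c, identity):
            continue
        if any(are_redundant(c, k, identity) for k in kept):
            continue
        kept.append(c)
    return kept
```

The result of a greedy pass depends on the input order, so in practice the complexes would be sorted first by whatever quality measure is preferred; that ordering is left out of this sketch.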