Development and assessment of scoring functions for protein docking require a set of docking poses, some of which are close to the native structure, while the rest are false positives (decoys). Dockground provides such sets of docking decoys (scoring benchmarks) for experimentally determined and modeled structures. The first set of experimentally determined structures consists of 100 non-native matches and at least one near-native match (ligand RMSD to the native structure < 5 Å) per complex, generated by GRAMM-X for 61 unbound complexes in the Dockground docking benchmark 2.
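The near-native criterion above can be illustrated with a minimal sketch. It assumes that the predicted and native complexes have already been superimposed on the receptor, and that the ligand atoms are given as matched lists of (x, y, z) coordinates in Å. The function names `ligand_rmsd` and `is_near_native` are hypothetical and are not part of Dockground or GRAMM-X.

```python
import math

NEAR_NATIVE_CUTOFF = 5.0  # Å, the ligand RMSD threshold quoted in the text


def ligand_rmsd(pred, native):
    """RMSD over matched ligand atoms.

    Assumes both coordinate lists are in the same receptor frame
    (receptors already superimposed) and in the same atom order.
    """
    if len(pred) != len(native) or not pred:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p[i] - n[i]) ** 2 for p, n in zip(pred, native) for i in range(3)
    )
    return math.sqrt(squared / len(pred))


def is_near_native(pred, native, cutoff=NEAR_NATIVE_CUTOFF):
    """True if the pose counts as near-native under the RMSD cutoff."""
    return ligand_rmsd(pred, native) < cutoff


# Toy example: a two-atom "ligand" rigidly shifted along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
close_pose = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]  # shifted by 3 Å
far_pose = [(0.0, 0.0, 6.0), (1.0, 0.0, 6.0)]    # shifted by 6 Å
print(is_near_native(close_pose, native), is_near_native(far_pose, native))
```

In practice the atom selection (for example, Cα atoms only) and the superposition step would have to match the conventions of the benchmark; the sketch leaves both to the caller.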
A larger set, decoy set 2, is derived from 396 unbound complexes in the Dockground docking benchmark 4. To eliminate bias in testing of scoring functions due to clustering and the distribution of energy values, we avoided clustering of the decoys and made sure their energies/scores were similar to those of the near-native matches. For each complex, we generated 300,000 low-resolution docking predictions with GRAMM. Since the scoring benchmark is meant to assess users' scoring functions, the matches were left unscored and unrefined, ranked by shape complementarity alone, as implemented in the GRAMM global scan. The unbound structures of 43 complexes did not yield near-native matches, and these complexes were excluded from the set. For each remaining complex, 99 incorrect docking matches, ranked similarly to the near-native one, were selected with a maximally spread spatial distribution by an iterative procedure based on the angles between the vectors connecting the centers of mass of the receptor and the ligand in the predicted poses. Incorrect models were selected from similarly ranked positions around the near-native match, provided that their vector formed an angle of more than 5° with the vector of every other selected match. If the full set of incorrect matches was not selected in the first iteration, then in the next iteration the sub-list around the near-native match was expanded by a further 50 positions and the minimum allowed angle between the vectors was halved. For 323 complexes, the full set of decoys was selected in the first iteration.
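The iterative selection can be sketched as follows. This is an illustration under stated assumptions, not the Dockground implementation: the initial window size (`window`), the interpretation of the 50-position expansion as applying on each side of the near-native match, the order in which candidates are visited (closest in rank first), the retention of matches chosen in earlier iterations, the use of the near-native vector as a reference, and the `max_iter` guard are all assumptions. The function names are hypothetical.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


def select_decoys(vectors, native_idx, n_decoys=99, window=100,
                  min_angle=5.0, step=50, max_iter=20):
    """Pick rank indices of spatially spread incorrect matches.

    vectors: rank-ordered receptor-to-ligand center-of-mass vectors,
             one per predicted pose.
    native_idx: rank index of the near-native match.
    window, max_iter: assumed values, not given in the text.
    """
    chosen = []
    for _ in range(max_iter):
        lo = max(0, native_idx - window)
        hi = min(len(vectors), native_idx + window + 1)
        # Visit candidates in order of rank distance from the near-native match.
        candidates = sorted(
            (i for i in range(lo, hi) if i != native_idx and i not in chosen),
            key=lambda i: abs(i - native_idx),
        )
        for i in candidates:
            references = [native_idx] + chosen
            if all(angle_deg(vectors[i], vectors[j]) > min_angle
                   for j in references):
                chosen.append(i)
                if len(chosen) == n_decoys:
                    return chosen
        # Not enough decoys yet: widen the sub-list and relax the angle.
        window += step
        min_angle /= 2.0
    return chosen  # may be shorter than n_decoys if the pool is too small


# Toy example: poses whose vectors lie 2 degrees apart in the xy-plane.
toy = [(math.cos(math.radians(2 * k)), math.sin(math.radians(2 * k)), 0.0)
       for k in range(60)]
print(select_decoys(toy, native_idx=0, n_decoys=3, window=30))
```

In the toy run, the first accepted pose is the one 6° away from the near-native vector, because the poses at 2° and 4° fall inside the 5° exclusion cone. The greedy acceptance shown here spreads the decoys but does not guarantee a globally maximal spread.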
Experimentally determined protein structures account for only a fraction of proteins. Most proteins in docking studies will themselves be models, with generally lower structural accuracy. The Dockground docking decoy set of protein models was generated for models with various degrees of structural accuracy. For each protein-protein complex from the Dockground models benchmark set 2.0, 99 incorrect docking matches and one near-native match (acceptable or better by the CAPRI criteria) were generated by GRAMM for each accuracy level of the individual protein structures.