Pocket identification
We utilised the Cryo-EM spike protein structure (PDB code: 6VSB), as a source for our studies to locate the presence of “druggable” pockets where small molecules could bind. The protein atomic coordinates were uploaded to the DoGSiteScorer server [11] to identify potential pockets within the structure. The server provides a druggability score that ranges from 0 to 1, 1 being the most druggable.
Docking methods
Three approaches to docking the pharmaceutical agents into the identified druggable pocket in the Covid-19 spike protein were employed to act to corroborate the output produced. These were Autodock vina [12], Smina [13] and Ledock [14] . The centre position of the pocket was calculated and a search grid space, defined by those coordinates ±12 Å in the x, y and z coordinate directions from that centre, was created and supplied to each of the docking methods. By default in each of the packages used, the drugs were free to explore their rotamer space to optimise their binding into the pocket. However, whilst it is customary to allow side chain flexibility in the residues lining the pocket when undertaking in silico docking studies, given that the resolution of the structure used here was only 3.5 Å, no flexible side chains were defined.
The number of poses considered by all three methods used was left as default because this was thought to be optimal for this study. For Ledock, a maximum of 20 docking poses was returned per conformer, with an additional pose root mean square deviation (RMSD) cutoff of 1.0 Å applied to reduce redundancy of returned poses. For Autodock vina and Smina, a maximum of 9 poses with a maximum energy range of 3 kcal/mol between best and worst were returned.
A 1049-member library of compounds was derived from the Drugs-lib dataset available from the virtual screening webserver MTIOpenScreen [17] from an initial study we undertook on the S protein model from I-Tasser. This dataset consists of approved drugs and research chemicals refined from the “drug” subset of the ChEMBL database [18], the “approved” subset of DrugBank version 5.0.10 [19], the DrugCentral online compendium [20] and the “approved” SuperDrug2 database version 2.0 [21].
Initially Autodock vina and Ledock were the methods of choice as these have been considered as the best non_commercial docking packages available [22]. However, after pairing the poses between the two methods according to their structural similarity when in the docking site (as in the RMSD Calculations below) the corresponding docking scores showed only a weak correlation (0.36, p < 0.0001)) and so Smina, which is reported as having a different scoring scheme from Autodock vina [13], was used to see if this improved the correlation of the docking scoring to Ledock. In fact, the scoring correlation between Smina and Autodock vina proved to be very high and so only Autodock vina and Ledock data were used in the docking scoring as a result.
RMSD calculations
For a given drug, poses were obtained that fit the pocket. To establish the best superposition of these different poses from each method, firstly, pairwise RMSDs were calculated using the following equation:
$$ RMSD=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({x}_i-{y}_i\right)}^2} $$
where xi and yi are comparable atoms from two of the methods, and n is the number of atoms in the given drug. The best overall superposition from the three methods was then calculated by taking the average of these pairwise RMSD superpositions. From these calculations, the poses from each of the methods with the best overall superposition, and hence, highest degree of similarity, would have the lowest average RMSD values (avRMSD).
Overall druggability analysis and ranking
An extensive amount of data was generated tabulating the output information from the docking packages of the druggable compounds with their structural characteristics. In generalised overview these correspond to the docking output scores from each of the methods, avRMSD values between the poses chosen for each compound, hydrogen bonding data, and the JOELib Native descriptor set, calculated using the ChemMine webserver [23], which include molar refractivity, polar surface area, and frequencies of atom and selected group types among other geometric and chemical properties. Drugs within the search group were also classified by chemical compound class and, where possible, by subclass through classification using the ClassyFire webserver [24].
A compound with a pose that scored highly in the two scoring regimes, and with good superposition agreement between docking methods, as determined from their avRMSD values, was considered more likely to be a meaningful data point. As these individual terms were on different scales, their significant feature was their explicit “order” rather than “value”. As a result each component, scoring value, and avRMSD, was ranked according to their percentile position within their individual scales. This therefore ordered the results by position rather than by value. Thus, the following combined score, ComboPC, was created to rank the compounds applying equal weight to the estimated docking scores from each docking method and to the agreement between the superpositions from each of the methods, as measured by avRMSD:
$$ ComboPC=\frac{1}{2}\left(\frac{\left( LedockPC+ vinaPC\right)}{2}+ RMSDPC\right) $$
Where LedockPC is the percentile rank of the docking score of the pose as docked by Ledock, vinaPC is the percentile rank of the docking score of the pose as docked by vina, and RMSDPC is the percentile rank of the avRMSD between the poses from each docking method. Only the scores from Ledock and vina, but not Smina, were chosen for inclusion in the ComboPC score as the scores from vina and Smina were found to be highly correlated (0.97, p < 0.001)).