Literature included in SPRD
Experimental details in the SPRD were assembled as 5500 database entries from a library of 5140 manuscripts containing SPR studies. The library was compiled by keyword searches in different search engines. We used the keyword “Biacore” in PubMed (https://pubmed.ncbi.nlm.nih.gov/). We also included manuscripts that were evaluated in five review articles on SPR [26,27,28,29,30]. These searches yielded ~ 8200 publications. Review articles and papers with theoretical data without experimental validation were excluded. Manuscripts that did not describe SPR experiments in enough detail were also excluded. Only manuscripts written in English were included. We were limited by the availability of full-text manuscripts through the subscription list of the Georgetown University library. Therefore, the final number of manuscripts we could include in the library was 5140 as of November 2020.
Data entry
Each manuscript in the library was read by one of the authors, and technical details about the SPR experiment were entered and managed in the Research Electronic Data Capture (REDCap) tool hosted at Georgetown University [31, 32]. If available, the following details were recorded for each manuscript: name of the ligand(s), name of the analyte(s), ligand’s protein tag, analyte’s protein tag, ligand class (protein, DNA, RNA, SM etc.), analyte class (protein, DNA, RNA, SM etc.), type of sensorchip, name of SPR instrument, ligand immobilization or capture method, immobilization or capture buffer, immobilization or capture running buffer, ligand immobilization or capture level, kinetics running buffer, kinetics regeneration solution, ka, kd, and KD. If ligands were captured using intermediate molecules that were first immobilized on the sensor surface, we also captured the name, immobilization method, immobilization buffer, immobilization running buffer, and immobilization level of the intermediate molecule. For each data entry, we provided the PubMed unique identifier number (PMID) and a hyperlink to it. If there were more than two entirely different interaction types in the same publication, we captured each as an independent entry. Therefore, the SPRD has 5500 entries from the library of 5140 different publications. If the publication reported different mutants of the same ligand or analyte, we only captured information related to the wild-type form.
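To make the shape of one entry concrete, the recorded details can be pictured as a single record type. The sketch below is illustrative only: the class name, field names, and the placeholder PMID are our assumptions and do not reflect the actual REDCap instrument or portal schema. Missing text fields default to “Undefined”, the label the SPRD uses when a detail was not reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SprdEntry:
    """One SPRD-style record. Field names are hypothetical, not the real schema."""
    pmid: str
    ligand_name: str = "Undefined"
    analyte_name: str = "Undefined"
    ligand_class: str = "Undefined"
    analyte_class: str = "Undefined"
    ligand_tag: Optional[str] = None
    analyte_tag: Optional[str] = None
    sensorchip: str = "Undefined"
    instrument: Optional[str] = None
    immobilization_method: str = "Undefined"
    kinetics_running_buffer: Optional[str] = None
    regeneration_solution: Optional[str] = None
    ka_per_M_per_s: Optional[float] = None   # association rate constant
    kd_per_s: Optional[float] = None         # dissociation rate constant
    KD_M: Optional[float] = None             # equilibrium dissociation constant

    @property
    def pubmed_url(self) -> str:
        # Hyperlink to the source publication, built from the PMID.
        return f"https://pubmed.ncbi.nlm.nih.gov/{self.pmid}/"


# Example values follow the PD-1/PD-L1 case discussed in the text;
# the PMID is a placeholder, not a real identifier.
entry = SprdEntry(
    pmid="00000000",
    ligand_name="PD-1",
    analyte_name="PD-L1",
    ligand_class="protein",
    analyte_class="protein",
    sensorchip="CM5",
    immobilization_method="amine coupling",
    KD_M=0.9e-6,
)
print(entry.pubmed_url)
```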
Utility and discussion
SPR experiments can be designed to give binary results that show whether or not a ligand binds directly to an analyte. They can also be designed to screen analyte libraries composed of SMs, peptides or oligos. A significant number of experiments seek to determine the kinetic parameters (ka, kd, KD) of a specific analyte-ligand interaction. Both the binary results and the kinetic parameters determined by an SPR experiment may vary significantly depending on buffer conditions, chip types, and ligand immobilization procedures. For example, the KD values of interactions between the wild-type Eap45 GLUE domain and ubiquitin were found to be ~ 411 μM and ~ 261 μM in phosphate and Tris buffers, respectively [19]. Likewise, the KD value for Factor H (fH) binding to C3b was about three times higher on a CM5 chip (~ 2.2 μM) than on a C1 chip (~ 0.7 μM), using the same ligand coupling chemistry and buffer conditions [22]. Unlike CM5, the C1 chip does not have a dextran matrix coated on the sensor surface [33]. It should be noted that the lower ligand immobilization level used for the C1 chip (140 RU) compared with the CM5 chip (384 RU) might also have contributed to the change in KD values in this study [22]. Moreover, when PD-1 was immobilized on a CM5 chip by amine coupling [23] or biotin-tagged PD-1 was captured on a streptavidin (SA) coated chip [24], the calculated KD value for binding to PD-L1 differed approximately 28-fold (~ 0.9 μM [23] vs. ~ 25 μM [24]). In this example, differences in the sources of the interacting partners (PD-1 and PD-L1) may also have contributed to the difference in KD values. Reported KD values for antigen-antibody interactions also varied when the antibody (ligand) was captured by a secondary antibody immobilized on the chip surface instead of being immobilized directly by standard amine coupling chemistry [34].
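The fold differences quoted above follow directly from the reported KD values. The short sketch below reproduces that arithmetic; the function name is ours, and the numbers are the approximate values cited in the text.

```python
def fold_change(kd_a_uM: float, kd_b_uM: float) -> float:
    """Return how many times larger the bigger KD is than the smaller one."""
    high, low = max(kd_a_uM, kd_b_uM), min(kd_a_uM, kd_b_uM)
    return high / low


# Approximate KD values (in micromolar) as quoted in the text.
reported = {
    "Eap45 GLUE-ubiquitin, phosphate vs. Tris buffer [19]": (411.0, 261.0),
    "fH-C3b, CM5 vs. C1 chip [22]": (2.2, 0.7),
    "PD-1/PD-L1, amine coupling vs. SA capture [23, 24]": (0.9, 25.0),
}

for label, (kd_a, kd_b) in reported.items():
    print(f"{label}: {fold_change(kd_a, kd_b):.1f}-fold")
# Eap45 GLUE-ubiquitin, phosphate vs. Tris buffer [19]: 1.6-fold
# fH-C3b, CM5 vs. C1 chip [22]: 3.1-fold
# PD-1/PD-L1, amine coupling vs. SA capture [23, 24]: 27.8-fold
```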
Thus, results for the same ligand-analyte interaction can differ substantially depending on simple experimental conditions. A user who wants to start a new SPR experiment for an analyte-ligand pair faces a number of choices for the chip, capture method, and buffer conditions. Testing all possible combinations of these factors is impractical and, in some cases, not feasible. We therefore generated the SPRD to help future users of SPR technology take full advantage of the collective knowledge in the SPR literature.
When it was clearly stated in the original publication, we recorded the following details related to SPR experiments in the SPRD:
Reference
We present the title of the publication, the PMID and the hyperlink to PubMed for each data entry. SPRD users can consult the original publication for the full details of the SPR experiments and relevant information on the source of key materials.
Ligand name
The name of the ligand was entered as it appeared in the original paper. If the ligand is a protein, its alternative names can be looked up through the GeneCards hyperlink (www.genecards.org) provided on the SPRD results page. If the ligand name was not provided in the original manuscript, the entry was listed under “Undefined”.
Other ligands used
If a publication had multiple ligands that were immobilized in the same experiments using a similar coupling chemistry, we captured their names in the same record. However, the SPRD does not have detailed kinetics information for each of these ligands. We recommend that users consult the reference for details.
Analyte name
This corresponds to the name of the analyte that was flowed over the ligand-immobilized surface. The remaining fields of each entry describe the binding of this single analyte to the ligand. If the analyte is a protein, its alternative names can be looked up through the GeneCards hyperlink (www.genecards.org) provided in the SPRD. If the analyte name was not provided in the original manuscript, the entry was listed under “Undefined”.
Other analytes used
If a publication presented multiple analytes that were also flowed over the same ligand-immobilized surface, we captured their names. The SPRD does not have detailed kinetics information for each analyte. We refer users to the original manuscript for details.
Ligand tag
Some ligands had tags for easier purification or immobilization. Commonly used protein purification tags are 6xHis, GST, FLAG and Myc. When tags were used, their names were included under this category.
Analyte tag
Analyte tags were captured in the same way as ligand tags.
Ligand class
Each ligand was assigned to a class of molecules, which included protein, DNA, RNA, SM, peptide, and antibody. If the class of the ligand did not fit any of these, it was entered as “other”. If the ligand class was not provided in the original manuscript, the entry was listed under “Undefined”.
Analyte class
Analyte classes were captured in the same way as ligand classes. If the analyte class was not provided in the original manuscript, the entry was listed under “Undefined”.
Sensorchip used
We recorded the type of sensorchip used in the SPR experiments as mentioned in each publication. The types of sensorchips included C1, CM3, CM4, CM5, CM7, HPA, L1, NTA, PEG, and SA chips. Since the majority of the experiments used Biacore instruments, the chip categories were matched to that manufacturer’s nomenclature. If the sensorchip was not one of the types mentioned above, we entered the chip type(s) as “other”. If the chip information was not provided in the original manuscript, the entry was listed under “Undefined”.
Instrument used
We recorded the name of the commercial biosensor instrument that was used to run the SPR experiment.
Immobilization or capture method
If the immobilization or capture method was mentioned in the publication, we recorded that information. These methods included immobilization using amine coupling chemistry (amine coupling), the capture of His-tagged ligands on a nickel-chelated NTA surface (Ni2+-NTA capture) or an anti-His antibody immobilized surface (anti-His capture), the capture of GST-tagged ligands on an anti-GST antibody immobilized surface (anti-GST capture), the capture of biotin-tagged ligands on a streptavidin- or neutravidin-coated surface (biotin-capture), immobilization using thiol coupling chemistry (thiol coupling), immobilization using maleimide chemistry (maleimide coupling), and the capture of ligands on a ligand-specific antibody immobilized surface (antibody capture). If the immobilization or capture method mentioned in the publication was not one of the above-mentioned methods, we recorded that information as “other”. If the capture method was not explained in the original manuscript, the entry was listed under “Undefined”.
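The method categories above pair naturally with ligand tags, which suggests a simple lookup when shortlisting options for a new experiment. The sketch below is a hypothetical helper, not a feature of the SPRD: the dictionary and function names are ours, and treating the remaining methods as tag-independent is a simplifying assumption.

```python
# Tag-specific capture methods, using the category labels from the text.
CAPTURE_METHODS_BY_TAG = {
    "His": ["Ni2+-NTA capture", "anti-His capture"],
    "GST": ["anti-GST capture"],
    "biotin": ["biotin-capture"],
}

# Methods that do not rely on one of the tags above (simplifying assumption).
TAG_INDEPENDENT_METHODS = [
    "amine coupling",
    "thiol coupling",
    "maleimide coupling",
    "antibody capture",
]


def candidate_methods(ligand_tag=None):
    """List method categories to consider for a ligand with the given tag."""
    tag_specific = CAPTURE_METHODS_BY_TAG.get(ligand_tag, [])
    return tag_specific + TAG_INDEPENDENT_METHODS


print(candidate_methods("His"))
print(candidate_methods())  # untagged ligand: tag-independent methods only
```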
Immobilization or capture buffer
We recorded the composition and pH of the buffer in which the ligand was diluted for the immobilization step.
Immobilization or capture running buffer
We recorded the composition and pH of the buffer that runs in the background during immobilization or capture of the ligand. In many experiments, this buffer was different from the immobilization or capture buffer in which the ligand was diluted directly.
Immobilization or capture level
We recorded the response amplitude obtained when ligands were immobilized or captured on the sensor surface. The unit of this amplitude varies depending on the SPR-based biosensor instrument. In Biacore instruments, which were the most commonly used, it is the response unit (RU).
Intermediate molecule
In some experiments, ligands were captured on a surface on which another molecule had already been immobilized. For example, in anti-His capture of a His-tagged ligand, the anti-His antibody was first immobilized on the chip surface and then the His-tagged ligand was flowed over it. Here the anti-His antibody was considered the intermediate molecule. If such a molecule was used in an experiment, and where the publication provided the information, we recorded its class, immobilization method, immobilization buffer and pH, immobilization running buffer and pH, and immobilization level, as we did for ligands.
Kinetics running buffer
This is the buffer used to dilute the analyte before injection into the SPR instrument; the same buffer runs in the background. We recorded the composition and pH of the kinetics running buffer.
Kinetics regeneration solution
Very strong analyte-ligand interactions have small kd values, which indicate very slow dissociation of the analyte from the ligand. Since the sensor surface has to be regenerated for the next cycle of analyte injection, a regeneration solution is used to dissociate the analyte-ligand complex without damaging the activity of the ligand on the chip surface [25]. We recorded the composition and pH of the regeneration solution.
ka, kd, and KD
We recorded the association rate constant (ka or kon), the dissociation rate constant (kd or koff), and the equilibrium dissociation constant (KD) of the interactions between the ligand and analyte.
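For a simple 1:1 binding model, the three constants are related by KD = kd / ka, so a reported KD can be cross-checked whenever both rate constants are available. The sketch below shows this check; the function name and the example rate constants are hypothetical and are not taken from any SPRD entry.

```python
import math


def equilibrium_dissociation_constant(ka_per_M_per_s: float, kd_per_s: float) -> float:
    """Return KD (in M) from ka (M^-1 s^-1) and kd (s^-1), assuming 1:1 binding."""
    if ka_per_M_per_s <= 0:
        raise ValueError("ka must be positive")
    return kd_per_s / ka_per_M_per_s


# Hypothetical rate constants chosen only to illustrate the units.
ka = 1.0e5   # M^-1 s^-1
kd = 1.0e-3  # s^-1
KD = equilibrium_dissociation_constant(ka, kd)

# KD is about 1e-8 M, i.e. roughly 10 nM.
print(f"KD = {KD:.2e} M")
print(math.isclose(KD, 1.0e-8))
```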
SPRD web portal
We designed and implemented a publicly accessible, searchable web portal for querying the collected and curated data repository by the experimental factors and data elements captured in the database. The portal’s capabilities include:
- Interactive dashboards
- Search based on the recorded features (ligand, analyte, chip used, immobilization method used or reference)
- Access publications in PubMed – search based on regular expression, matching characters or PubMed’s Identifier (PMID)
- Submit and report errors (forms)
- Submit new entries – for the team to assess and validate before making them public.
Figure 1 illustrates the workflow and different components developed and implemented for the project. The portal can be accessed via http://www.sprdatabase.info.
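The search behaviour listed above (matching by regular expression, by characters, or by PMID) can be sketched in a few lines. This is not the portal’s implementation: the function, the dictionary keys, and the toy records with placeholder PMIDs are all assumptions made for illustration.

```python
import re


def search_entries(entries, pattern, fields=("ligand", "analyte", "chip", "method")):
    """Return entries whose PMID equals `pattern` (if numeric) or whose
    searchable fields match `pattern` as a case-insensitive regular expression."""
    if pattern.isdigit():
        return [e for e in entries if e.get("pmid") == pattern]
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        e for e in entries
        if any(regex.search(str(e.get(f, ""))) for f in fields)
    ]


# Toy records: PMIDs are placeholders and the second record's
# ligand/analyte roles are assumed for the example.
entries = [
    {"pmid": "00000001", "ligand": "PD-1", "analyte": "PD-L1",
     "chip": "CM5", "method": "amine coupling"},
    {"pmid": "00000002", "ligand": "C3b", "analyte": "Factor H",
     "chip": "C1", "method": "amine coupling"},
]

print([e["pmid"] for e in search_entries(entries, "^pd-")])
print([e["pmid"] for e in search_entries(entries, "00000002")])
print(len(search_entries(entries, "amine")))
```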
Data collection
To collect and manage SPRD data entries, we designed and implemented data collection instruments hosted and managed in the Georgetown REDCap-based system, a secure, web-based application designed exclusively for building and managing online surveys and for study data management and monitoring in research studies. REDCap was developed by Vanderbilt University and the REDCap Consortium, a collection of 4683 institutions from 139 countries currently utilizing the software and contributing to its continuing enhancement and maintenance [35]. REDCap offers an easy-to-use, secure, and flexible yet robust method of data collection. We manually entered and curated data from 5140 manuscripts found in PubMed and implemented a capability for the public to submit new entries, which will be validated by the team before publication in the portal.
We developed Extract, Transform, and Load (ETL) procedures to copy data from REDCap entries and PubMed sources into a cloud-based destination database, which represents the data in a structured, searchable format hosted in the Google Cloud Platform (GCP). We leveraged PyMed [36], a Python library that provides access to PubMed through the PubMed API, to retrieve data and links from PubMed.
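The general shape of such an ETL step is sketched below. It is a minimal illustration under stated assumptions, not the SPRD pipeline: an in-memory SQLite database stands in for the cloud-hosted destination, the input dictionaries stand in for exported REDCap records, and all field names, table names, and PMIDs are hypothetical.

```python
import sqlite3


def transform(record):
    """Normalize one extracted record into a row for the destination table."""
    pmid = str(record["pmid"]).strip()
    return (
        pmid,
        record.get("ligand_name") or "Undefined",    # empty names become "Undefined"
        record.get("analyte_name") or "Undefined",
        f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",  # hyperlink built from the PMID
    )


def load(rows, conn):
    """Create the destination table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(pmid TEXT, ligand TEXT, analyte TEXT, pubmed_url TEXT)"
    )
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# "Extract" is simulated with hard-coded records; PMIDs are placeholders.
extracted = [
    {"pmid": " 00000001 ", "ligand_name": "PD-1", "analyte_name": "PD-L1"},
    {"pmid": "00000002", "ligand_name": "", "analyte_name": "ubiquitin"},
]

conn = sqlite3.connect(":memory:")  # stand-in for the cloud database
load([transform(r) for r in extracted], conn)
rows = conn.execute(
    "SELECT ligand, pubmed_url FROM entries ORDER BY pmid"
).fetchall()
print(rows)
```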
All SPRD data are hosted and managed in the Georgetown University cloud-based Virtual Research Environment (VRE), which leverages the Google Cloud Platform (GCP) to provision computing resources and to store and share data securely. The VRE was designed and developed to overcome barriers met by the research community while complying with institutional policies and current state and federal policies and regulations. It is a multi-mission platform that can facilitate the advancement of science, education, and services, and it will enable the SPRD and investigators to share data, information and knowledge with the community and research networks. Users can visit www.sprdatabase.info to access the database for the information discussed above. The webpage, in particular the tabs within the blue stripe on its left side, guides users to the information included in the database. Clicking “more here” in the search results obtained through the “Search the database” tab provides access to additional details about a particular entry. At the time of this manuscript’s submission, we had recorded information from 5500 REDCap database entries. We will update the database by adding new publications in the future, and we encourage users to submit their data as well.
The 5500 entries we have collected from 5140 publications so far cover a wide range of experimental conditions, which were most likely optimized for the specific analyte-ligand pair. CM5 was the most commonly used chip (~ 60%) compared with other types of sensorchips (Fig. 2a), and amine coupling chemistry was the most preferred method (~ 48%) of ligand immobilization (Fig. 2b). We also observed that “Biacore instruments” were the most frequently used SPR instruments (~ 88%) and that the “Biacore 3000” was the most commonly used individual instrument (~ 40%). Moreover, “protein” was the leading class of both ligand (~ 58%) and analyte (~ 56%) compared with other ligand and analyte classes.
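Usage shares of this kind are simple frequency counts over one recorded field. The sketch below shows the calculation on a toy list of chip types; the data are invented for illustration and are not the SPRD counts, and the function name is ours.

```python
from collections import Counter


def percentage_breakdown(values):
    """Return each distinct value's share of the total, in percent,
    ordered from most to least frequent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.most_common()}


# Toy data only: ten made-up entries, not the real SPRD distribution.
chips = ["CM5"] * 6 + ["SA"] * 2 + ["NTA", "C1"]
print(percentage_breakdown(chips))
# {'CM5': 60.0, 'SA': 20.0, 'NTA': 10.0, 'C1': 10.0}
```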
Future work
As we launch the SPRD with an initial repository of 5500 complete and curated data entries, we anticipate that it will grow into a much larger, freely available and trusted knowledge base with up-to-date information. The envisioned system will require the development of a framework that enables custom design and can scale to accommodate additional workflows, new potential data sources and collaboration options. A custom-developed web application coupled with a database system can enable new advanced features, including custom-developed forms, visualization and reporting capabilities.
The roadmap for future development will consider automating data ingestion from different sources and data export to the SPRD data repository. We can leverage the available REDCap application programming interface (API), a RESTful web service for storing and retrieving data to and from REDCap. We will survey users, data custodians and the community for feedback and desired new features.
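As a rough indication of what automated export through the REDCap API could look like, the sketch below builds a record-export request using only the Python standard library. It is an assumption-laden illustration rather than a planned implementation: the URL and token are placeholders, the function names are ours, and the request parameters should be checked against the API documentation of the REDCap instance in use.

```python
import json
import urllib.parse
import urllib.request


def build_export_request(api_url: str, token: str) -> urllib.request.Request:
    """Build a POST request asking REDCap for all records as flat JSON."""
    payload = {
        "token": token,        # project-specific API token (placeholder here)
        "content": "record",   # export records
        "format": "json",
        "type": "flat",        # one row per record
    }
    data = urllib.parse.urlencode(payload).encode("utf-8")
    return urllib.request.Request(api_url, data=data, method="POST")


def export_records(api_url: str, token: str):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_export_request(api_url, token)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not contact any server.
request = build_export_request("https://redcap.example.org/api/", "PLACEHOLDER_TOKEN")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from the network call makes the payload easy to inspect and test before any credentials are used.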