IDPsBind: a repository of binding sites for intrinsically disordered proteins complexes with known 3D structures

Abstract

Background

Intrinsically disordered proteins (IDPs) lack a stable three-dimensional structure under physiological conditions but play crucial roles in many biological processes. They perform various biological functions by interacting with ligands.

Results

Here, we present IDPsBind, a database that reports the interaction sites between IDPs and their ligands, identified with a distance threshold method in IDP complexes of known 3D structure from the PDB database. IDPsBind contains 9626 IDP complexes and 880 experimentally verified intrinsically disordered proteins. The current release of the IDPsBind database is version 1.0. IDPsBind is freely accessible at http://www.s-bioinformatics.cn/idpsbind/home/.

Conclusions

IDPsBind provides more comprehensive interaction sites for IDP complexes of known 3D structure. It can not only support subsequent studies of the interaction mechanisms of intrinsically disordered proteins but also provide a suitable basis for developing algorithms that predict their interaction sites.

Background

Intrinsically disordered proteins (IDPs) lack a stable secondary or tertiary structure under physiological conditions. Still, they participate in many important biological processes such as cell signal transduction, DNA metabolism, mRNA alternative splicing, and protein–protein interaction [1,2,3]. Recent studies have shown that IDPs are associated with some diseases when their modification, translation, or expression is abnormal [4,5,6]. Because of their importance in organisms, IDPs have become an active focus of current research on protein function. Over the past 20 years, many IDPs have been validated experimentally or computationally. DisProt is the first curated database containing a collection of experimentally validated IDPs and intrinsically disordered regions [7]. Release 2020_12 of DisProt includes a total of 1590 IDP sequences, excluding ambiguous and obsolete regions. These IDPs are from 10 different species. The D2P2 database consists of computationally predicted IDPs from distinct proteomes [8], in which annotations for IDPs are derived from MobiDB. MobiDB 3.0 [9] provides information about intrinsically disordered regions (IDRs), related features from various sources, and prediction tools. Different levels of reliability and different features are reported as separate, independent annotations. IDEAL [10] is a database that combines functional annotations with structural and disorder annotations for IDPs by manually integrating data from the Protein Data Bank (PDB) [11].

These IDPs perform critical biological functions by interacting with other proteins or ligands. Several databases of binding sites already exist [12,13,14,15], such as DisBind, which contains 226 IDPs with functional site annotations and binding ligands, including proteins, RNA, DNA, metal ions and others. However, studying IDP-ligand interactions is still challenging due to the flexible binding affinity. Because information on how IDPs interact with other molecules is lacking, the biological functions of most IDPs remain unknown. Although existing databases contain some helpful information, the number of IDPs and IDP complexes they cover is too small to support further study of IDP-ligand interactions. With the progress of structural biology, the number of protein structure files in the PDB is growing rapidly, and the number of structure files for IDPs is increasing with it. This motivated us to develop a comprehensive IDP-ligand interaction database that provides more interaction sites. In this paper, we introduce the IDPsBind database, which contains 9626 IDP complexes. IDPsBind reports binding sites between IDPs and their ligands, identified with a distance threshold method in IDP complexes of known 3D structure from the PDB database. The IDPs are selected from the DisProt database (release 2020_12). Each entry in IDPsBind contains a comprehensive list of annotations: basic information on the IDP, sequence information, and binding site information mapped to the PDB sequences.

IDPsBind database construction

IDPsBind was constructed with the following procedure. (a) The intrinsically disordered proteins (IDPs) are taken from DisProt (http://www.Disprot.org/, release: 2020_12) [7]; (b) IDPs with ambiguous or obsolete regions are eliminated; (c) IDP-ligand complex structures in the PDB database determined by X-ray crystallography at a resolution better than 3.5 angstroms are selected for study; (d) IDPs with mutant residues in the IDP complexes are eliminated; (e) the binding ligands of the IDPs are selected from the HETATM records in the PDB file. After these steps, IDPsBind contains 880 IDPs and 9626 IDP complexes (from the PDB).
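Steps (c) and (e) of this procedure can be illustrated with a minimal Python sketch. It assumes legacy PDB-format text files, reads the experimental method from the EXPDTA record and the resolution from the REMARK 2 record, and checks for at least one HETATM record. The function names and the choice to ignore water (HOH) are illustrative assumptions and not part of the published pipeline; steps (b) and (d) are not covered.

```python
from typing import Optional, Sequence


def parse_resolution(pdb_text: str) -> Optional[float]:
    """Return the resolution in angstroms from the REMARK 2 record, if any."""
    for line in pdb_text.splitlines():
        if line.startswith("REMARK   2 RESOLUTION."):
            fields = line.split()
            try:
                # e.g. ['REMARK', '2', 'RESOLUTION.', '2.10', 'ANGSTROMS.']
                return float(fields[3])
            except (IndexError, ValueError):
                # e.g. "RESOLUTION. NOT APPLICABLE." for NMR entries
                return None
    return None


def is_xray(pdb_text: str) -> bool:
    """True if the first EXPDTA record reports X-ray diffraction."""
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            return "X-RAY DIFFRACTION" in line
    return False


def has_hetatm_ligand(pdb_text: str, ignore: Sequence[str] = ("HOH",)) -> bool:
    """True if any HETATM record names a group outside `ignore`.

    Ignoring water is an assumption made for this sketch.
    """
    for line in pdb_text.splitlines():
        # Columns 18-20 (0-based slice 17:20) hold the residue/ligand name.
        if line.startswith("HETATM") and line[17:20].strip() not in ignore:
            return True
    return False


def passes_filters(pdb_text: str, max_resolution: float = 3.5) -> bool:
    """Apply the X-ray, resolution (strictly better than 3.5) and ligand checks."""
    resolution = parse_resolution(pdb_text)
    return (
        is_xray(pdb_text)
        and resolution is not None
        and resolution < max_resolution
        and has_hetatm_ligand(pdb_text)
    )
```

Calling `passes_filters(open(path).read())` on each candidate structure file would keep only those entries that satisfy all three checks; `path` here stands for any local PDB-format file.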

As in previous studies, an amino acid residue within a protein sequence is designated as a binding site if it contains at least one atom that falls within a cutoff distance of any atom of the ligand molecule in the complex [16,17,18,19]. Binding residues in IDPsBind are determined with a distance cutoff of 3.5 angstroms between any atom of a protein residue and any atom of the ligand. All corresponding PDB chains (resolution better than 3.5 angstroms) for an IDP are used for analysis. All binding ligand and binding site information is derived from the ATOM and HETATM records in the PDB file. The construction process of the IDPsBind database and the distributions of binding ligands and binding sites in IDPsBind are shown in Figs. 1, 2, 3 and 4.
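The distance rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes legacy PDB-format text with fixed-column ATOM and HETATM records, treats every HETATM group except water as a ligand, and counts a distance equal to the cutoff as a contact. The names `Atom`, `parse_atoms` and `binding_residues` are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Atom(NamedTuple):
    record: str    # "ATOM" or "HETATM"
    res_name: str  # residue or ligand name, e.g. "GLY" or "ZN"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    xyz: Tuple[float, float, float]


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append(
            Atom(record, line[17:20].strip(), line[21], int(line[22:26]), xyz)
        )
    return atoms


def binding_residues(
    atoms: Sequence[Atom],
    chain: str,
    cutoff: float = 3.5,
    ignore: Sequence[str] = ("HOH",),
) -> List[Tuple[int, str]]:
    """Return (residue number, residue name) pairs of `chain` that have at
    least one atom within `cutoff` angstroms of any ligand atom."""
    protein = [a for a in atoms if a.record == "ATOM" and a.chain == chain]
    # Assumption for this sketch: all non-water HETATM groups are ligands.
    ligand = [
        a for a in atoms if a.record == "HETATM" and a.res_name not in ignore
    ]
    sites = set()
    for p in protein:
        if any(math.dist(p.xyz, l.xyz) <= cutoff for l in ligand):
            sites.add((p.res_seq, p.res_name))
    return sorted(sites)
```

The all-pairs loop is quadratic in the number of atoms, which is usually acceptable for a single chain; a spatial index would be a natural replacement for large-scale processing. Because `math.dist` is used, the sketch needs Python 3.8 or later.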

Fig. 1 Distribution of 880 IDPs in the organism

Fig. 2 Workflow of the construction of IDPsBind

Fig. 3 Distribution of the top 20 binding ligands

Fig. 4 Distribution of the top 20 most numerous binding sites

Web interface

IDPsBind provides six basic interfaces: Home, Browse, Download, Search, Statistics, and Help. The ‘Home’ page introduces the IDPsBind database and provides links to the three primary associated databases. The ‘Browse’ page displays a summary of all entries in the IDPsBind database, listing six fields for each entry: IDPsBind ID, Disprot ID, UniProt ID, Protein name, Source, and Disordered content. All entries collected in IDPsBind, numbered from IDP0001 to IDP0880, can be retrieved by clicking the ‘Browse’ option.

Clicking any IDPsBind ID displays detailed information on the target chain. The data stored for each ID has three parts. The first part shows basic information about the protein. The second part provides the sequence of the IDP from the DisProt database, with the disordered regions color-coded. The third part shows the labeled binding sites of the PDB chains (resolution better than 3.5 angstroms) when the protein interacts with ligands. The interface also displays the abbreviation of each ligand; clicking the ‘?’ label shows the full name of the ligand. Users can check specific ligand information by clicking the abbreviated ligand link, which opens a new interface. Clicking the ‘load’ option visualizes the corresponding structure of the complex on the right side of the page.

The IDPsBind database is freely available for download, and users can download the complete data set on the ‘Download’ page. On the ‘Search’ interface, users enter any keyword or an IDPsBind, DisProt, or UniProt ID and then click ‘search’; the page then shows results in a layout similar to the “Browse the entry” page. The ‘Statistics’ interface shows some basic data and information about the IDPsBind database. The ‘Help’ interface answers some questions about the IDPsBind database. The current release of IDPsBind is 1.0; it will be updated in the future following PDB release updates.

Conclusions

We have developed a comprehensive IDP-ligand interaction database, IDPsBind, in which the IDPs are taken from the DisProt database (release 2020_12) and the corresponding IDP complexes are taken from the PDB database. Although a handful of ligand-binding databases already exist in the literature, IDPsBind is distinguished from them in the following aspects. (a) IDPsBind contains many IDP interactions, covering 3203 binding ligands including proteins, DNA, RNA, and others. (b) The interactions in IDPsBind include not only those between disordered regions and ligands but also those involving ordered regions. (c) The IDP-ligand binding information is based on the PDB file, and all PDB chains (resolution better than 3.5 angstroms) for each IDP were analyzed, so ligand-binding sites of the target chain are less likely to be missed in IDPsBind. (d) All data in the IDPsBind database are freely available for download. We hope that IDPsBind can provide helpful information for IDP-related studies.

Availability of data and materials

The authors can provide a compiled executable file for the data in this article. Please email the author (yefeng@imau.edu.cn) to request the relevant data. All data can also be downloaded freely from IDPsBind.

References

  1. Csizmok V, Follis AV, Kriwacki RW, Forman-Kay JD. Dynamic protein interaction networks and new structural paradigms in signaling. Chem Rev. 2016;116(11):6424–62.

  2. Wright PE, Dyson HJ. Intrinsically disordered proteins in cellular signalling and regulation. Nat Rev Mol Cell Biol. 2015;16(1):18–29.

  3. Binolfi A, Limatola A, Verzini S, Kosten J, Theillet F-X, Rose HM, Bekei B, Stuiver M, Van Rossum M, Selenko P. Intracellular repair of oxidation-damaged α-synuclein fails to target C-terminal modification sites. Nat Commun. 2016;7(1):1–10.

  4. Fung HYJ, Birol M, Rhoades E. IDPs in macromolecular complexes: the roles of multivalent interactions in diverse assemblies. Curr Opin Struct Biol. 2018;49:36–43.

  5. Babu MM. The contribution of intrinsically disordered regions to protein function, cellular complexity, and human disease. Biochem Soc Trans. 2016;44(5):1185–200.

  6. Babu MM, van der Lee R, de Groot NS, Gsponer J. Intrinsically disordered proteins: regulation and disease. Curr Opin Struct Biol. 2011;21(3):432–40.

  7. Hatos A, Hajdu-Soltész B, Monzon AM, Palopoli N, Álvarez L, Aykac-Fas B, Bassot C, Benítez GI, Bevilacqua M, Chasapi A. DisProt: intrinsic protein disorder annotation in 2020. Nucleic Acids Res. 2020;48(D1):D269–76.

  8. Oates ME, Romero P, Ishida T, Ghalwash M, Mizianty MJ, Xue B, Dosztanyi Z, Uversky VN, Obradovic Z, Kurgan L. D2P2: database of disordered protein predictions. Nucleic Acids Res. 2012;41(D1):D508–16.

  9. Piovesan D, Tabaro F, Paladin L, Necci M, Mičetić I, Camilloni C, Davey N, Dosztányi Z, Mészáros B, Monzon AM. MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins. Nucleic Acids Res. 2018;46(D1):D471–6.

  10. Fukuchi S, Sakamoto S, Nobe Y, Murakami SD, Amemiya T, Hosoda K, Koike R, Hiroaki H, Ota M. IDEAL: intrinsically disordered proteins with extensive annotations and literature. Nucleic Acids Res. 2012;40(D1):D507–11.

  11. Burley SK, Berman HM, Kleywegt GJ, Markley JL, Nakamura H, Velankar S. Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive. Methods Mol Biol. 2017;1607:627–41.

  12. Yu J-F, Dou X-H, Sha Y-J, Wang C-L, Wang H-B, Chen Y-T, Zhang F, Zhou Y, Wang J-H. DisBind: A database of classified functional binding sites in disordered and structured regions of intrinsically disordered proteins. BMC Bioinformatics. 2017;18(1):1–5.

  13. Fichó E, Reményi I, Simon I, Mészáros B. MFIB: a repository of protein complexes with mutual folding induced by binding. Bioinformatics. 2017;33(22):3682–4.

  14. Yang J, Roy A, Zhang Y. BioLiP: a semi-manually curated database for biologically relevant ligand–protein interactions. Nucleic Acids Res. 2012;41(D1):D1096–103.

  15. Schad E, Fichó E, Pancsa R, Simon I, Dosztányi Z, Mészáros B. DIBS: a repository of disordered binding sites mediating interactions with ordered proteins. Bioinformatics. 2018;34(3):535–7.

  16. Gao M, Skolnick J. A threading-based method for the prediction of DNA-binding proteins with application to the human genome. PLoS Comput Biol. 2009;5(11):e1000567.

  17. Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol. 2009;5(12):e1000585.

  18. Zhao H, Yang Y, Zhou Y. Structure-based prediction of RNA-binding domains and RNA-binding sites and application to structural genomics targets. Nucleic Acids Res. 2011;39(8):3017–25.

  19. Kumar M, Gromiha MM, Raghava GPS. Prediction of RNA binding sites in a protein using SVM and PSSM profile. Proteins. 2008;71(1):189–94.

Acknowledgements

The authors are grateful to the anonymous reviewers for their valuable suggestions and comments, which have led to the improvement of this paper. The authors also thank all the members who contributed to this work.

Funding

Funding for the work was provided by the Special Project of the National Natural Science Foundation of China (62141204) and the National Natural Science Foundation of China (62063024, 61461038), the Scientific Research Program at Universities of Inner Mongolia Autonomous Region of China (NJZY20005).

Author information

Contributions

F.YE. designed the project, performed the analysis, and drafted the manuscript. S.CZ. collected the data and carried out the computation of binding sites. S.CZ. and F.GL. set up the IDPsBind web server. F.YE. and F.GL. are the corresponding authors; correspondence to yefeng@imau.edu.cn. All authors have read and approved the final manuscript.

Authors’ information

Canzhuang Sun is a Master's student at the College of Science, Inner Mongolia Agriculture University. His research interests lie in the field of the interaction between intrinsically disordered proteins and binding ligands.

Yonge Feng is a Full Professor at the College of Science, Inner Mongolia Agriculture University. Her research focuses on machine learning, protein structure and function, and bioinformatics.

Guoliang Fan is a Full Professor in the Department of Physics, School of Physical Science and Technology, Inner Mongolia University. His research focuses on machine learning and epigenetics.

Corresponding authors

Correspondence to YongE Feng or GuoLiang Fan.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors confirm that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sun, C., Feng, Y. & Fan, G. IDPsBind: a repository of binding sites for intrinsically disordered proteins complexes with known 3D structures. BMC Mol and Cell Biol 23, 33 (2022). https://doi.org/10.1186/s12860-022-00434-5

  • DOI: https://doi.org/10.1186/s12860-022-00434-5