Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data

Gana, Rajaram; Vasudevan, Sona

doi:10.1186/s12860-019-0200-9

Table 8 Description of the collected data^a


Dataset name	Data description	Identified in glycos_public.xlsx by	Sample size	Source
dbogap-str	O-GlcNAc glycosylated sequences with sequence and structural data	Oglycos_status = yes	1,105, of which 998 are human with unique PDB IDs; of these 998, only 16 are inferred from known O-GlcNAcylated orthologs (the others are experimentally validated). The remaining 107 sequences are non-human. The 998 human sequences are the in-sample data for proteins with sequence and structural information. The structural information on the 998 sequences was collected	dbOGAP
dbogap	Human O-GlcNAc glycosylated sequences	Ogly_only_seq = yes	376. These are unique UniProt Accession No. and position pairs.	dbOGAP
dbogap-unique-seq-with-str	Extract of human sequences from dbogap-str with unique UniProt Accession No. and position pairs (i.e., structure is ignored)	Not identified; derivable with software such as SAS or R	39, of which 28 are experimentally validated and the remaining 11 are inferred. These 39 sequences expand to the 998 entries in dbogap-str because each is associated with multiple structural conformations (distinct PDB IDs). The 39 sequences come from 25 unique proteins (UniProt Accession Nos.)	N/A
dbogap-seq	Sequences in dbogap that are not in dbogap-unique-seq-with-str: merge the two datasets by UniProt Accession No. and position, and retain those in dbogap but not in dbogap-unique-seq-with-str	Not identified; derivable	340	N/A
Oglc-PS+	Additional extract of O-GlcNAc glycosylated human sequences	Ogly_21 = yes	411. Of these, 59.12% are glycosylated at S and the others at T. Of the 25 unique human proteins in dbogap-unique-seq-with-str, 18 are in Oglc-PS+	PhosphoSitePlus
Oglc-non-dbogap	Merge Oglc-PS+ with dbogap-seq and dbogap-unique-seq-with-str by UniProt Accession No. and position, and retain those in Oglc-PS+ but in neither dbogap-seq nor dbogap-unique-seq-with-str	GLCNAC_s1 = yes	259. This is used as out-of-sample data. Note: 152 of the 340 sequences in dbogap-seq are in Oglc-PS+. The total number of unique sequences (UniProt Accession No. and position pairs) across dbogap and Oglc-PS+ is 638.	N/A
Ogal	O-GalNAc glycosylated human sequences	GALNAC_s1 = yes	2,079. This is used as out-of-sample data. Of these, 60.27% are glycosylated at T and the others at S	PhosphoSitePlus
Ngly	N-glycosylated sequences	glyco_status = yes	6,328, of which 2,422 are “Homo sapiens (Human)”. Of the 2,422, 1,083 have more than one sugar bound; these 1,083 sequences are the in-sample data for proteins with sequence and structural information. If structure is ignored, there are 361 unique sequences (i.e., unique UniProt Accession No. and position pairs), which are the in-sample data for proteins with sequence data only	Gana et al. [34]
Phosy	Phosphorylated sequences [35]	Not identified; archived in a separate file, Phosy.csv	363,256, of which 227,810 are human sequences with the amino acids at ±7 positions around the S/T/Y site; 58.95%, 24.51%, and 16.54% are phosphorylated at S, T, and Y, respectively	PhosphoSitePlus
WSTW-Uniprot	Human sequences with the W–S/T–W sequon	wstw = yes	236 unique UniProt Accession No. and position pairs	UniProt

^a The columns give the dataset name, a description of the data, the column in glycos_public.xlsx that identifies it, the number of sequences collected, and the source. For example, 1,105 O-GlcNAc glycosylated sequences with sequence and structural data are collected and stored as dataset dbogap-str. These are identified in glycos_public.xlsx by “yes” in column Oglycos_status. Of these, 998 are human sequences with unique PDB IDs. The last column cites the source of the collected data, dbOGAP.
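
The derived datasets dbogap-seq and Oglc-non-dbogap are anti-joins keyed on (UniProt Accession No., position) pairs: they keep the rows of one dataset whose key appears in none of the others. The table notes this is derivable with SAS or R; what follows is only a minimal Python sketch of that step, assuming each dataset has already been reduced to records with hypothetical fields `accession` and `position`. The accessions `TOY1` to `TOY3` are placeholders, not entries from glycos_public.xlsx.

```python
from typing import Iterable, List, NamedTuple


class Site(NamedTuple):
    """A glycosite key: UniProt accession and residue position (hypothetical layout)."""
    accession: str
    position: int


def anti_join(left: Iterable[Site], *others: Iterable[Site]) -> List[Site]:
    """Keep rows of `left` whose (accession, position) key is absent from all of `others`."""
    excluded = {(s.accession, s.position) for other in others for s in other}
    return [s for s in left if (s.accession, s.position) not in excluded]


# Toy data standing in for the real extracts (placeholders only).
dbogap = [Site("TOY1", 10), Site("TOY1", 25), Site("TOY2", 5)]
dbogap_unique_seq_with_str = [Site("TOY1", 25)]
oglc_ps_plus = [Site("TOY2", 5), Site("TOY3", 7), Site("TOY1", 25)]

# dbogap-seq: in dbogap but not in dbogap-unique-seq-with-str
dbogap_seq = anti_join(dbogap, dbogap_unique_seq_with_str)
# Oglc-non-dbogap: in Oglc-PS+ but in neither of the other two
oglc_non_dbogap = anti_join(oglc_ps_plus, dbogap_seq, dbogap_unique_seq_with_str)
print(dbogap_seq)       # [Site(accession='TOY1', position=10), Site(accession='TOY2', position=5)]
print(oglc_non_dbogap)  # [Site(accession='TOY3', position=7)]
```

In pandas, the same anti-join is commonly written as a left merge on `["accession", "position"]` with `indicator=True`, keeping rows whose `_merge` value is `"left_only"`; the set-based version avoids that dependency.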

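The WSTW-Uniprot extract consists of sites in a W–S/T–W sequon, i.e. a serine or threonine with a tryptophan on each side. As a minimal sketch, assuming a plain one-letter amino-acid string and 1-based positions as in UniProt, such sites can be located with a regular expression; the function name and toy sequence are hypothetical, and this does not reproduce how the extract was actually obtained from UniProt.

```python
import re
from typing import List, Tuple

# Lookahead so overlapping sequons (e.g. "WSWTW") are all reported.
SEQUON = re.compile(r"(?=W([ST])W)")


def wstw_sites(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, residue) for each S/T that sits in a W-S/T-W sequon."""
    sites = []
    for m in SEQUON.finditer(sequence.upper()):
        # m.start(1) is the 0-based index of the captured S/T; add 1 for UniProt-style numbering
        sites.append((m.start(1) + 1, m.group(1)))
    return sites


print(wstw_sites("MKWSWTWAAWTA"))  # [(4, 'S'), (6, 'T')]
```

The zero-width lookahead matters because adjacent sequons share a tryptophan: in `WSWTW` a consuming pattern would report only the first site, whereas the lookahead reports both.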

ISSN: 2661-8850


BMC Molecular and Cell Biology
