Table 10 Summary of the modeling/validation strategy

From: Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data

| Row | Data used | What the data is used for | Outcome |
|---|---|---|---|
| 1 | The 998 sequences in dbogap-str and the 1,083 sequences in glycos | Predicting the likelihood of O-GlcNAc glycosylation with sequence and structural data | Table 11 |
| 2 | The 340 sequences in dbogap-seq and the 361 sequences in Ngly | Predicting the likelihood of O-GlcNAc glycosylation with only sequence data. A sequence is considered to be mispredicted if its predicted probability of O-glycosylation is less than 50% and it is O-glycosylated | Table 14. About 11% of the sequences in dbogap-seq are mispredicted as not being O-GlcNAc glycosylated, and 9% of the sequences in Ngly are mispredicted as being O-GlcNAc glycosylated |
| 3 | The 259 sequences in Oglc-non-dbogap | Calculating the out-of-sample misprediction rate with the LPM estimated for the exercise outlined in Row 2 of this table | 54 of the 259 sequences (≈ 21%) are mispredicted as not being O-GlcNAc glycosylated |
| 4 | The 2,079 sequences in Ogal | Calculating the out-of-sample misprediction rate with the LPM estimated for the exercise outlined in Row 2 of this table | 656 of the 2,079 sequences (≈ 31.6%) are mispredicted as not being O-GalNAc glycosylated |
| 5 | The 236 sequences in WSTW-Uniprot | Checking whether any of these are O-glycosylated | None are O-glycosylated. This again indicates that ~(W – S/T – W) is likely necessary for O-glycosylation |
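
The misprediction rule used in Rows 2 to 4 can be illustrated with a minimal sketch. It is not the authors' code: the function name `misprediction_rate`, the example probabilities, and the labels are hypothetical, and the sketch assumes that a predicted probability of exactly 50% counts as a prediction of O-glycosylation. The last lines only re-derive the percentages reported in Rows 3 and 4 from the counts given there.

```python
def misprediction_rate(predicted_probs, is_glycosylated, threshold=0.5):
    """Share of sequences whose thresholded prediction disagrees with the label.

    Assumption: a probability equal to the threshold is treated as a
    prediction of O-glycosylation; only values below it count as "not".
    """
    if len(predicted_probs) != len(is_glycosylated):
        raise ValueError("probabilities and labels must have the same length")
    if not predicted_probs:
        raise ValueError("at least one sequence is required")
    wrong = sum(
        (p >= threshold) != bool(y)
        for p, y in zip(predicted_probs, is_glycosylated)
    )
    return wrong / len(predicted_probs)


# Hypothetical LPM outputs for five sequences that are all O-glycosylated.
probs = [0.91, 0.42, 0.77, 0.55, 0.18]
labels = [1, 1, 1, 1, 1]
rate = misprediction_rate(probs, labels)  # 2 of 5 fall below 0.5 -> 0.4

# Arithmetic check of the out-of-sample rates reported in Rows 3 and 4.
row3_pct = round(54 / 259 * 100, 1)    # 20.8, reported as about 21%
row4_pct = round(656 / 2079 * 100, 1)  # 31.6
```

Because every sequence in Oglc-non-dbogap and Ogal is glycosylated, the rate for those sets reduces to the share of sequences with a predicted probability below 50%.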