
Table 10 Summary of the modeling/validation strategy

From: Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data

| Row | Data used | What the data is used for | Outcome |
| --- | --- | --- | --- |
| 1 | The 998 sequences in dbogap-str and the 1,083 sequences in glycos | Predicting the likelihood of O-GlcNAc glycosylation with sequence and structural data | Table 11 |
| 2 | The 340 sequences in dbogap-seq and the 361 sequences in Ngly | Predicting the likelihood of O-GlcNAc glycosylation with only sequence data. A sequence is considered mispredicted if its predicted probability of O-glycosylation is less than 50% although it is O-glycosylated or, conversely, above 50% although it is not | Table 14. About 11% of the sequences in dbogap-seq are mispredicted as not being O-GlcNAc glycosylated, and 9% of the sequences in Ngly are mispredicted as being O-GlcNAc glycosylated |
| 3 | The 259 sequences in Oglc-non-dbogap | Calculating the out-of-sample misprediction rate with the LPM estimated for the exercise outlined in Row 2 of this table | 54 of the 259 sequences (≈ 21%) are mispredicted as not being O-GlcNAc glycosylated |
| 4 | The 2,079 sequences in Ogal | Calculating the out-of-sample misprediction rate with the LPM estimated for the exercise outlined in Row 2 of this table | 656 of the 2,079 sequences (≈ 31.6%) are mispredicted as not being O-GalNAc glycosylated |
| 5 | The 236 sequences in WSTW-Uniprot | Checking whether any of these sequences are O-glycosylated | None are O-glycosylated. This again indicates that ~(W – S/T – W), i.e., the absence of the W – S/T – W motif, is likely necessary for O-glycosylation |
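The misprediction rule in Row 2 and the out-of-sample rates in Rows 3 and 4 can be illustrated with a minimal sketch. The function names `misprediction_rate` and `percent`, the toy probabilities and labels, and the treatment of a predicted probability of exactly 50% as a positive call are assumptions made for illustration; only the counts 54 of 259 and 656 of 2,079 come from the table.

```python
from typing import Sequence


def misprediction_rate(
    probs: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,
) -> float:
    """Share of sequences whose thresholded prediction disagrees with the label.

    probs  : predicted probabilities of O-glycosylation (e.g. from an LPM)
    labels : 1 if the sequence is O-glycosylated, 0 otherwise
    Assumption: a probability equal to the threshold counts as a positive call.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one sequence is required")
    wrong = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(probs)


def percent(count: int, total: int) -> float:
    """Express count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Hypothetical toy data: the second sequence is O-glycosylated but scores
# below 50%, so it is the only misprediction among the four.
toy_probs = [0.91, 0.42, 0.77, 0.18]
toy_labels = [1, 1, 1, 0]
print(misprediction_rate(toy_probs, toy_labels))  # 0.25

# Counts reported in Rows 3 and 4 of the table.
print(percent(54, 259))    # 20.8, reported as ≈ 21%
print(percent(656, 2079))  # 31.6
```

Because every sequence in Oglc-non-dbogap and Ogal is treated as glycosylated, the rate in those rows reduces to the share of sequences with a predicted probability below the 50% cutoff.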