Supplementary Materials

This page provides the supplementary materials for the research article listed below. They are freely downloadable by researchers interested in this study.

Research article published in Bioinformatics 2007, 23(23):3147-3154.

Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure

Jiangning Song1, Zheng Yuan2, Hao Tan3, Thomas Huber4 and Kevin Burrage1*

1Advanced Computational Modelling Centre, The University of Queensland, Brisbane, QLD 4072, Australia
2ARC Centre in Bioinformatics and Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
3Caulfield School of Information Technology, Faculty of Information Technology, Monash University, Caulfield, East VIC 3145, Australia
4School of Molecular and Microbial Sciences and Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD 4072, Australia

Contact: sjn@maths.uq.edu.au or kb@maths.uq.edu.au

Supplementary Materials for downloading:

1. 4-fold cross-validation SP39 ID list used in this study.

2. SP39 dataset with detailed information about protein ID, disulfide connectivity annotation and amino acid sequence. This dataset was originally prepared by Fariselli and Casadio in 2001. It was extracted from Swiss-Prot release 39 and contains only experimentally verified intrachain disulfide bond annotations. Acknowledgements for this dataset should be addressed to Prof. Fariselli and Prof. Casadio.

3. SP39_template dataset, used as the training dataset to build the SVR models for the independent holdout test. This strategy was originally adopted by Chen et al. (2006) to further validate their two-level SVM predictors. This dataset was prepared by Zhao et al. (2005).

4. SP43_dataset, used as the testing dataset for the independent holdout test. This dataset was also prepared by Zhao et al. (2005).

5. Disulfide connectivity patterns, listing all possible disulfide connectivity patterns grouped by the number of disulfide bridges.

6. SVR input format file, provided as an example of the input data format used to build the SVR models, for the readers' reference. Each cysteine-cysteine residue pair is encoded as a 623-dimensional vector. As described in the Methodology, the first 520 dimensions denote the cysteine-cysteine coupling pair, the subsequent 78 dimensions denote the secondary structure matrices predicted by PSIPRED, and the following 20 dimensions represent the amino acid content. The remaining dimensions (619th to 623rd) represent the normalized DOC value, cysteine ordering, the normalized protein molecular weight and the normalized sequence length. This SVR input format file was generated using a protein with 2 disulfide bonds as an example.

7. Prediction accuracy on the SP39 dataset using 4-fold cross-validation, and prediction accuracy obtained by training the SVR models on the SP39_template dataset and applying them to the independent SP43_dataset testing set, which provides a more stringent and independent evaluation of our method.

8. Supplementary Material for the paper published in Bioinformatics. This file contains Supplementary Tables 1, 2 and 3 and Supplementary Figure 1, as well as the detailed methodology and explanations of the terms used in our paper.
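
As an illustration of item 5, the possible connectivity patterns for a protein with B disulfide bridges can be enumerated by pairing up its 2B bonded cysteines in every possible way, which gives (2B - 1)!! patterns (3 for B = 2, 15 for B = 3, and so on). The following Python sketch is not taken from the downloadable file; the function names and the numbering of cysteines from 1 in sequence order are assumptions made for this example.

```python
from typing import Iterator, List, Tuple

# A pattern is a tuple of (cysteine_i, cysteine_j) bridges, with i < j.
Pattern = Tuple[Tuple[int, int], ...]


def connectivity_patterns(n_bridges: int) -> Iterator[Pattern]:
    """Yield every way to pair 2 * n_bridges cysteines into bridges.

    Cysteines are numbered 1..2B in sequence order (an assumption made
    for this illustration).
    """

    def pair_up(remaining: List[int]) -> Iterator[Pattern]:
        if not remaining:
            yield ()
            return
        first, rest = remaining[0], remaining[1:]
        # The lowest-numbered free cysteine bonds with each other free one in turn.
        for i, partner in enumerate(rest):
            others = rest[:i] + rest[i + 1:]
            for tail in pair_up(others):
                yield ((first, partner),) + tail

    return pair_up(list(range(1, 2 * n_bridges + 1)))


def pattern_count(n_bridges: int) -> int:
    """Closed form (2B - 1)!! = 1 * 3 * 5 * ... * (2B - 1)."""
    count = 1
    for k in range(1, 2 * n_bridges, 2):
        count *= k
    return count


if __name__ == "__main__":
    for pattern in connectivity_patterns(2):
        print(pattern)
```

For a protein with 2 disulfide bonds, this lists the three patterns 1-2/3-4, 1-3/2-4 and 1-4/2-3.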
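
The layout of the 623-dimensional vector described in item 6 can be written down as index ranges. The sketch below only restates the dimension boundaries given above; the segment names, the `split_features` helper and the assumption that the vector is already available as a dense Python sequence are hypothetical and are not part of the downloadable input file.

```python
from typing import Dict, List, Sequence, Tuple

# (segment name, first dimension, last dimension), 1-based and inclusive,
# following the description in item 6. Segment names are hypothetical.
SEGMENTS: List[Tuple[str, int, int]] = [
    ("coupling_pair", 1, 520),          # cysteine-cysteine coupling pair
    ("secondary_structure", 521, 598),  # PSIPRED-predicted secondary structure
    ("amino_acid_content", 599, 618),   # 20 amino acid content values
    ("global_descriptors", 619, 623),   # DOC, cysteine ordering, weight, length
]
VECTOR_LENGTH = 623


def split_features(vector: Sequence[float]) -> Dict[str, List[float]]:
    """Split one dense 623-dimensional vector into its named segments."""
    if len(vector) != VECTOR_LENGTH:
        raise ValueError(
            f"expected {VECTOR_LENGTH} dimensions, got {len(vector)}"
        )
    # Convert the 1-based inclusive ranges to Python's 0-based slices.
    return {name: list(vector[start - 1:end]) for name, start, end in SEGMENTS}


if __name__ == "__main__":
    example = [0.0] * VECTOR_LENGTH
    for name, values in split_features(example).items():
        print(name, len(values))
```

Checking that the segment lengths (520, 78, 20 and 5) add up to 623 is a quick sanity test when parsing such vectors.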