Prediction of Candidate Primary Immunodeficiency Disease Genes using a Support Vector Machine Learning Approach

Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for clinicians and immunologists, as the available high-throughput methods are very expensive and time-consuming. Many resources have catalogued molecular alterations along with the clinical and immunological phenotypes of PID genes. However, none of these resources assists in identifying candidate PID genes. We have recently developed a platform known as Resource of Asian Primary Immunodeficiency Diseases (RAPID), which hosts information pertaining to molecular alterations, protein-protein interaction networks, mouse studies and microarray gene expression profiling of all available PID genes. Using this resource as a discovery tool, we have developed an algorithm for the prediction of candidate PID genes using a machine learning approach.
Using a support vector machine learning approach, we have predicted 1,442 candidate PID genes using known PID and non-PID genes as positive and negative training data sets, respectively. Initially, this analysis was carried out with 148 PID genes as the positive training data set and 3,162 genes as the negative training data set, where each gene had 69 binary features associated with it. As the PID data set is small, we calculated the leave-one-out (LOO) error to assess how well the algorithm generalizes. The sensitivity and specificity were 0.8536 and 0.9786, respectively. All 69 features for the predicted candidate PID gene list are available for download.
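The leave-one-out procedure and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the classifier here is a Hamming-distance nearest-neighbour stand-in (not an SVM) chosen so the example runs with the standard library only, and the six 4-feature toy genes, the function names and the labels are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[int]  # binary feature vector (the study uses 69 features per gene)


def hamming(a: Vector, b: Vector) -> int:
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_predict(train_X: List[Vector], train_y: List[int], x: Vector) -> int:
    """Stand-in classifier (NOT an SVM): return the label of the closest training vector."""
    best = min(range(len(train_X)), key=lambda i: hamming(train_X[i], x))
    return train_y[best]


def leave_one_out(
    X: List[Vector],
    y: List[int],
    predict: Callable[[List[Vector], List[int], Vector], int] = nearest_neighbour_predict,
) -> Tuple[float, float]:
    """Hold out each gene in turn, fit on the rest, and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = predict(train_X, train_y, X[i])
        if y[i] == 1:          # known PID gene (positive class)
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:                  # non-PID gene (negative class)
            if pred == 0:
                tn += 1
            else:
                fp += 1
    sensitivity = tp / (tp + fn)   # fraction of positives recovered
    specificity = tn / (tn + fp)   # fraction of negatives correctly rejected
    return sensitivity, specificity


# Hypothetical toy data: 3 positive and 3 negative genes with 4 binary features each.
X = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0],
     [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 1, 0, 0, 0]

sensitivity, specificity = leave_one_out(X, y)
print(f"sensitivity={sensitivity:.4f} specificity={specificity:.4f}")
```

Any classifier with the same `predict(train_X, train_y, x)` shape, including a wrapper around an SVM library, could be passed in place of the stand-in.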
Based on this result, we conclude that the false alarm rate of the trained classifier (one minus the specificity) is 2.14%. All candidate PID genes are ranked by confidence score in descending order; a higher score indicates that a particular gene is more likely to be a PID candidate. We have also summarized reports of genome-wide association studies and other related studies for the newly identified candidate PID genes and their associated diseases. This would allow prioritization of genes for confirmation in patients with PID in whom the exact gene is not yet identified. This type of approach should further aid PID physicians and researchers not only in gaining insights into the pathophysiology of these diseases but also in better understanding the functioning of the immune system, for improved diagnosis and treatment of patients.
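As a small worked example, the 2.14% false alarm rate follows directly from the reported specificity, and the ranking step is a descending sort on confidence score. The gene identifiers and scores below are hypothetical placeholders, not values from the study, and the function names are illustrative.

```python
from typing import Dict, List, Tuple


def false_alarm_rate(specificity: float) -> float:
    """False positive rate is the complement of specificity."""
    return 1.0 - specificity


def rank_candidates(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order genes by confidence score, highest (most likely PID candidate) first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Specificity as reported in the abstract.
print(f"{false_alarm_rate(0.9786):.2%}")  # prints 2.14%

# Hypothetical confidence scores for three placeholder genes.
toy_scores = {"GENE_A": 0.42, "GENE_B": 1.87, "GENE_C": 0.95}
for gene, score in rank_candidates(toy_scores):
    print(gene, score)
```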
Keerthikumar, S., Bhadra, S., Kandasamy, K., Raju, R., Ramachandra, Y. L., Bhattacharyya, C., Imai, K., Ohara, O., Mohan, S. and Pandey, A. 2009. Prediction of candidate primary immunodeficiency disease genes using a support vector machine learning approach. DNA Research. (PMID: 19801557)