Chronic lymphocytic leukemia (CLL) is one of the most common types of leukemia, and early diagnosis is closely tied to proper treatment and survival. Late diagnosis or inadequate treatment may lead to harmful outcomes. Several methods can be used to diagnose leukemia, including complete blood count (CBC), immunophenotyping, lymph node biopsy, chest X-ray, computerized tomography (CT) scan, and ultrasound. Most of these methods are time-consuming, and more than one of them usually has to be applied to reach the intended result. This underlines the need for a rapid and accurate diagnosis of leukemia based on clinical and medical findings. We therefore applied an artificial neural network (ANN) to identify a molecular biomarker for rapid leukemia diagnosis from blood samples and to evaluate its potential for the detection of cancer.
Materials & methods
The independent-samples t-test was applied with the Statistical Package for the Social Sciences (SPSS; IBM Corp, Armonk, NY, US) software to the microarray gene expression data of the Gene Expression Omnibus (GEO) dataset GSE22529; the 12 genes that showed the largest differences (among those with a p-value of less than 0.01) were selected for further ANN analysis. The expression values of the selected genes in 53 samples were used to train the network, with a learning rate of 0.1.
The output of the trained network agreed closely with the test data. The area under the receiver operating characteristic (ROC) curve was 0.991, which indicates a high discriminative performance, and the analysis identified Gelsolin as a potential biomarker.
With these results, it was concluded that the training process of the ANN could be applied to rapid CLL diagnosis and to finding a potential biomarker. It is also suggested that this method could be applied to the diagnosis of other forms of cancer in order to obtain a rapid and reliable outcome.
Chronic lymphocytic leukemia (CLL) is a type of blood and bone marrow cancer in which uncontrolled and abnormal growth of lymphocytic cells occurs. CLL is also characterized by clonal proliferation and progressive accumulation of B-cell lymphocytes that typically express cluster of differentiation 19+ (CD19+), CD5+, and CD23+. CLL progresses slowly, and new cases are diagnosed every year. CLL occurs mostly in adults, in contrast to acute leukemia, which is more frequent in children [1-3]. The survival rate in CLL is higher than in many other types of cancer: around 83% of patients with CLL are alive at least five years after the diagnosis is made. The etiology of CLL is still elusive; however, it is most likely that genetic and environmental factors have an important effect on its occurrence [4-5]. In karyotype studies, del13q14, trisomy 12, del11q22-q23, and del17p13 were associated with CLL. Several other molecular pathologies have been found in CLL, such as the overexpression of the unmutated immunoglobulin heavy chain variable region (IGHV) genes, zeta-chain-associated protein kinase 70 (ZAP-70), and CD38 proteins, and mutations in the NOTCH1, splicing factor 3b subunit 1 (SF3B1), and baculoviral IAP repeat-containing 3 (BIRC3) genes. In addition, mutations in tumor suppressor genes, including tumor protein P53 (TP53) and ataxia-telangiectasia mutated (ATM), have been associated with degrees of resistance to common chemotherapeutic agents. Micro-ribonucleic acid (microRNA) expression alterations and aberrant methylation patterns in genes that are specifically deregulated in CLL, including the B-cell lymphoma 2 (BCL-2), T-cell leukemia/lymphoma 1 (TCL1), and ZAP-70 genes, have also been encountered and linked to distinct clinical parameters. The clinical manifestations of the disease are extremely diverse [7-10].
Approximately 60% of the patients are asymptomatic, and the disease may first be suspected after a routine blood test. When symptomatic, patients show nonspecific symptoms such as fatigue or weakness. CLL is usually diagnosed with blood tests because the cancerous cells are found in the blood. A bone marrow biopsy is usually not needed to diagnose CLL, but it may be done before the beginning of treatment. Recently, molecular and cellular markers have helped with the prediction and diagnosis of CLL in patients. Therefore, the identification of key molecules in CLL could be vital for a more effective diagnosis of CLL [2,11-12].
Gene-expression profiling using cDNA microarrays makes it possible to analyze multiple markers simultaneously, which helps to categorize cancers into subgroups. Although many statistical techniques exist for analyzing gene-expression data, none of them has been rigorously tested for its ability to accurately distinguish different types of cancer and to further categorize them into diagnostic categories [13-14].
An artificial neural network (ANN) is a computer-based algorithm that is modeled on the structure and behavior of neurons in the human brain and can be trained to recognize and categorize complex patterns. Pattern recognition is achieved by adjusting the parameters of the ANN through a process of error minimization, that is, by learning from experience. The parameters can be calibrated using any type of input data, such as gene-expression levels generated by cDNA microarrays, and the output can be grouped into any given number of categories [15-16]. ANNs have recently been applied to clinical problems, such as the diagnosis of myocardial infarction and arrhythmias from electrocardiograms and the interpretation of radiographs and magnetic resonance imaging (MRI). In such applications, ANNs have been reported to classify the samples correctly and to identify the genes that are most related to the classification [17-18].
In summary, the purpose of this study was to develop a method for classifying cancers to specific diagnostic categories based on their gene expression signatures using ANNs [19-20]. This study demonstrated the potential applications of these methods for CLL diagnosis and the identification of candidate targets (or genes) for diagnosis and therapy.
Materials & Methods
Microarray gene expression profile
The gene expression profile of CLL patients was extracted from the Gene Expression Omnibus (GEO) database under the accession number GEO series 22529 (GSE22529), which was based on GEO platform 96 (GPL96; [HG-U133A] Affymetrix Human Genome U133A array) and deposited by Jelinek D, Kay N. These data were submitted on June 23, 2010, and updated on August 10, 2018. The gene expression profile was generated from 104 peripheral blood samples, comprising 81 patients and 23 healthy controls. In this study, the gene expression of 53 samples, including 11 healthy controls and 42 patients with CLL, was selected for further analysis and validation of the ANN model.
Selection of genes by highest score
Initially, the expression levels of 22285 genes were ranked. Then, after a primary statistical analysis, the two-tailed Student's t-test was used to determine the statistical significance of the difference between the two groups, CLL patients and healthy individuals. The genes were selected based on the significance of their differences and were used as the inputs for the ANN analysis. Statistical analysis was carried out with the Statistical Package for the Social Sciences (SPSS) software, version 18 (IBM Corp, Armonk, NY, US).
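The gene-ranking step described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: it assumes the pooled-variance form of the independent-samples t-test and ranks genes by the absolute t statistic, which gives the same ordering as ranking by two-tailed p-value when every gene has the same group sizes. The gene names, the expression values, and the function names `pooled_t_statistic` and `rank_genes` are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Student's independent-samples t statistic with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_genes(expression, top_k):
    """Return the top_k genes by |t|.

    expression maps a gene name to (patient_values, control_values).
    """
    scores = {
        gene: abs(pooled_t_statistic(patients, controls))
        for gene, (patients, controls) in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: three genes, four patients, three controls.
toy_expression = {
    "GENE_A": ([8.1, 7.9, 8.3, 8.0], [5.0, 5.2, 4.9]),
    "GENE_B": ([6.0, 6.4, 5.8, 6.1], [6.1, 5.9, 6.2]),
    "GENE_C": ([3.0, 3.5, 2.8, 3.2], [4.4, 4.6, 4.1]),
}
top_genes = rank_genes(toy_expression, top_k=2)
print(top_genes)
```

In the study itself, the same idea would be applied to all probes of the array with `top_k=12`; a p-value cutoff would additionally require the t distribution, which is omitted here.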
Artificial neural network (ANN)
A multilayer perceptron ANN model with three layers was built with RapidMiner software (RapidMiner, Inc., MA, US). The three layers consisted of an input layer, a hidden layer, and an output layer. The input layer contained 12 neurons corresponding to the 12 input features; the hidden layer applied the learning algorithm to the input features. Finally, the output layer had only one neuron, representing the two possible diagnostic states, cancerous or noncancerous. The values of the output layer were 0 and 1, which corresponded to cancerous and healthy control, respectively. Initially, the ANN was trained using the 12 genes with the highest scores as the input. To evaluate the ANN model, the area under the receiver operating characteristic curve (AUROC), the classification accuracy, and a reliability index were calculated. The ROC curve is the plot of sensitivity (true positive rate) against 1-specificity (false positive rate); it was created with the GenEx version 6 software (MultiD Analyses, Göteborg, Sweden) for each of the 12 inputs separately, in order to compare them and to select the best gene as a diagnostic biomarker. In this study, the Decision Tree and the Support Vector Machine (SVM) algorithms were also used as comparisons for the ANN algorithm.
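To make the architecture concrete, the following is a minimal sketch of a three-layer perceptron with 12 inputs, one hidden layer, and a single output neuron, trained by stochastic gradient descent with the learning rate of 0.1 reported in this article. It is not the RapidMiner model: the hidden-layer size of 4, the sigmoid activations, the cross-entropy loss, the number of epochs, the class name `TinyMLP`, and the toy data are all assumptions made for illustration. The label convention follows the text (0 = cancerous, 1 = healthy control).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """12-input, one-hidden-layer, one-output perceptron (illustrative only)."""

    def __init__(self, n_inputs=12, n_hidden=4, learning_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.lr = learning_rate
        self.w_hidden = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_hidden)
        ]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        hidden = [
            sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w_hidden, self.b_hidden)
        ]
        output = sigmoid(
            sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out
        )
        return hidden, output

    def predict_proba(self, x):
        """Probability that the sample is a healthy control (label 1)."""
        return self.forward(x)[1]

    def train_step(self, x, y):
        hidden, output = self.forward(x)
        # Cross-entropy gradient with respect to the output pre-activation.
        delta_out = output - y
        for j, h in enumerate(hidden):
            # Backpropagate through the old output weight before updating it.
            delta_h = delta_out * self.w_out[j] * h * (1.0 - h)
            self.w_out[j] -= self.lr * delta_out * h
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= self.lr * delta_h * xi
            self.b_hidden[j] -= self.lr * delta_h
        self.b_out -= self.lr * delta_out

    def fit(self, samples, labels, epochs=500):
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                self.train_step(x, y)


# Hypothetical, already normalized expression values for 12 genes.
cancerous = [[0.9 - 0.01 * k] * 12 for k in range(4)]  # label 0
healthy = [[0.1 + 0.01 * k] * 12 for k in range(4)]    # label 1
samples = cancerous + healthy
labels = [0] * 4 + [1] * 4

model = TinyMLP()
model.fit(samples, labels)
print(model.predict_proba([0.85] * 12), model.predict_proba([0.15] * 12))
```

The toy data are deliberately well separated so that the sketch converges quickly; real expression data would need normalization and a held-out test set before any accuracy figure is meaningful.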
The first step of the process was selecting the significant genes. According to the t-test, the top 12 genes were identified as significant (p-value<0.001); they showed the largest differences between the two groups, healthy individuals and patients. These genes were selected from our dataset for further analyses (Table 1).
Then, to compare the cancerous patients with the healthy group, 12 features or genes were considered. Three algorithms were initially used to compare these two groups: the ANN, the Support Vector Machine, and the Decision Tree. The algorithm with the best outcome was then selected for the purpose of diagnosis. The results of this test are presented in Table 2. According to this analysis, the ANN distinguished the two groups best, with an area under the curve (AUC) of 0.991, a classification accuracy (CA) of 0.969, and an F1 score of 0.969; therefore, this algorithm was selected for the main analysis.
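For readers unfamiliar with the CA and F1 measures used to compare the algorithms, the sketch below shows how they can be computed from predicted and true labels. It assumes the binary F1 score for one chosen positive class; the article does not state which averaging was used, so this is only one possible reading. The labels and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions for eight samples.
y_true = ["CLL", "CLL", "CLL", "control", "control", "control", "control", "control"]
y_pred = ["CLL", "CLL", "control", "control", "control", "control", "control", "CLL"]

ca = classification_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred, positive="CLL")
print(ca, f1)
```

Accuracy and AUC measure different things: accuracy depends on one decision threshold, whereas the AUC summarizes the ranking of samples over all thresholds, so the two values need not coincide.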
In the next step, the created neural network was trained for the purpose of diagnosis. According to Table 3, the neural network was able to distinguish patients from healthy individuals with 98% accuracy based on the expression of those 12 genes. The area under the ROC curve was measured to estimate the diagnostic performance of the ANN. Table 3 shows the results of cross-validation using the ANN algorithm.
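The cross-validation mentioned above can be sketched as follows. The article does not report the number of folds or whether the split was stratified, so the value `k=5`, the plain (unstratified) shuffling, and the function name `k_fold_indices` are assumptions; only the sample count of 53 comes from the text.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        ]
        yield train_idx, test_idx


# 53 samples as in this study; the number of folds is an assumption.
splits = list(k_fold_indices(n_samples=53, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each sample is used for testing exactly once, and the reported metric is usually the average over the folds. With only 11 healthy controls, a stratified split that keeps the class ratio in every fold would be a reasonable refinement.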
Then, in order to identify the best diagnostic biomarker among the 12 selected genes, the ROC curves of all 12 genes were calculated and plotted (Table 4, Figure 1). According to Table 4, Gelsolin (GSN), with AUC = 0.971, specificity = 0.902, and sensitivity = 1, was the best candidate biomarker for the diagnosis of CLL (Figure 2).
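The per-gene ROC analysis can be illustrated with a short sketch. It is not the GenEx computation: it assumes the rank-based (Mann-Whitney) formulation of the AUC, in which the AUC equals the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counted as one half. The expression values, the threshold, the direction of the difference (higher in patients), and the function names are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def sensitivity_specificity(positive_scores, negative_scores, threshold):
    """Call a sample positive when its score is at or above the threshold."""
    true_positives = sum(1 for s in positive_scores if s >= threshold)
    true_negatives = sum(1 for s in negative_scores if s < threshold)
    sensitivity = true_positives / len(positive_scores)
    specificity = true_negatives / len(negative_scores)
    return sensitivity, specificity


# Hypothetical expression values of one gene.
patient_values = [7.2, 6.8, 7.5, 6.1]
control_values = [5.0, 6.3, 5.5]

auc = roc_auc(patient_values, control_values)
sens, spec = sensitivity_specificity(patient_values, control_values, threshold=6.0)
print(auc, sens, spec)
```

If a gene is expressed at a lower level in patients than in controls, the scores can be negated before the call so that the AUC is still read on the 0.5 to 1 scale.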
Diagnosing CLL in its early stages may increase the survival rate and lead to better therapeutic results. While bone marrow biopsy remains the gold standard for screening CLL, the approach has several shortcomings, including its invasiveness and the patient’s discomfort. In contrast, non-invasive tests, such as the blood test, have low sensitivity and specificity [22-23]. Previous studies have indicated that certain genes are potential candidates for the diagnosis of cancer in its early stages. We used the ANN analysis to detect cancer more accurately in its early stages and to further evaluate the potential of this computational technique for cancer detection, since the expression levels of a panel of genes could be used to classify individuals as cancerous or healthy.
Recently, the ANN has found its way into medical fields. Although ANN training algorithms vary, they share one basic function: all networks accept a set of inputs and, based on their hidden layer algorithm, generate corresponding outputs. The ANN is particularly practical for medical problems without a linear solution. In the present study, an ANN model provided good predictive accuracy for classification. It was hypothesized that this technique could not only increase the accuracy of CLL diagnostic tests but could also be applied to the diagnosis of several other types of cancer. In this paper, we obtained sample data from the GEO database and classified the samples as cancerous or healthy. Twelve genes, as listed in Table 1, were shown to be appropriate for the accurate diagnosis of CLL using the ANN algorithm. The accuracy of cancer detection by the assembled ANN was assessed by ROC analysis. The outputs of the trained ANN for the testing data were used to plot the ROC curve, and the area under the ROC curve was 0.991. These results demonstrated the high performance of the trained ANN in the diagnosis of CLL from gene expression and the good learning process of the ANN. Therefore, a panel of genes could be used with the ANN algorithm to detect CLL.
Based on the ANN results and the ROC curve, it could be stated that GSN has potential value as a diagnostic biomarker. GSN plays many roles in various types of cancer. Gelsolin is a ubiquitous actin filament-severing protein, one of the most important members of the actin-severing superfamily, and plays a crucial role in the regulation of actin filament assembly and disassembly. Additionally, it plays an important role in many other cellular processes, such as carcinogenesis phenotypes, epithelial-mesenchymal transition (EMT), motility, apoptosis, proliferation, and differentiation [24-26]. GSN overexpression has been seen in many cancers, including breast cancer, oral carcinoma cells, colorectal cancer, ovarian cancer, and leukemia [27-28].
In conclusion, by combining the t-test and the ANN, it is possible to identify a minimal and optimal number of gene biomarkers for the classification of healthy individuals and individuals with CLL. Based on the gene expression values, a trained ANN model accurately classified the sample data into the cancerous and non-cancerous categories. As a result, it was shown that the learning technique of the ANN could accurately differentiate cancerous samples from non-cancerous ones. It could also identify a potential biomarker gene in a more time-efficient manner, which could result in better diagnosis and better treatment.
- Rozman C, Montserrat E: Chronic lymphocytic leukemia. N Engl J Med. 1995, 333:1052-1057. 10.1056/NEJM199510193331606
- Chiorazzi N, Rai KR, Ferrarini M: Chronic lymphocytic leukemia. N Engl J Med. 2005, 352:804-815. 10.1056/NEJMra041720
- Keating MJ: Chronic lymphocytic leukemia. Semin Oncol. 1999, 26:107-114.
- Moghadam MH, Movafagh A, Omrani M, et al.: Identification of homogeneously staining regions in leukemia patients. J Res Med Sci. 2013, 18:363.
- Nigam R, Jain R, Malik R, Banseria N, Ahirwar R: An association of hepatitis virus infection with rare hemopoietic malignancies. J Evol Med Dent Sci. 2013, 2:8360-8365.
- Rassy EE, Chebly A, Korban R, et al.: Untreated chronic lymphocytic leukemia in Lebanese patients: an observational study using standard karyotyping and FISH. Int J Hematol Oncol. 2017, 6:105-111. 10.2217/ijh-2017-0019
- Hallek M, Cheson BD, Catovsky D, et al.: Guidelines for the diagnosis and treatment of chronic lymphocytic leukemia: a report from the International Workshop on Chronic Lymphocytic Leukemia updating the National Cancer Institute-Working Group 1996 guidelines. Blood. 2008, 111:5446-5456. 10.1182/blood-2007-06-093906
- Calin GA, Liu C-G, Sevignani C, et al.: MicroRNA profiling reveals distinct signatures in B cell chronic lymphocytic leukemias. Proc Natl Acad Sci USA. 2004, 101:11755-11760. 10.1073/pnas.0404432101
- Calin GA, Dumitru CD, Shimizu M, et al.: Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci USA. 2002, 99:15524-15529. 10.1073/pnas.242606799
- Hamblin TJ, Davis Z, Gardiner A, Oscier DG, Stevenson FK: Unmutated Ig VH genes are associated with a more aggressive form of chronic lymphocytic leukemia. Blood. 1999, 94:1848-1854.
- Staudt LM: Molecular diagnosis of the hematologic cancers. N Engl J Med. 2003, 348:1777-1785. 10.1056/NEJMra020067
- Binet J, Auquier A, Dighiero G, et al.: A new prognostic classification of chronic lymphocytic leukemia derived from a multivariate survival analysis. Cancer. 1981, 48:198-206. 10.1002/1097-0142(19810701)48:1<198::AID-CNCR2820480131>3.0.CO;2-V
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270:467-470. 10.1126/science.270.5235.467
- Wang Y, Tetko IV, Hall MA, Frank E, Faciusa A, Mayera KFX, Mewesac HW: Gene selection from microarray data for cancer classification—a machine learning approach. Comput Biol Chem. 2005, 29:37-46. 10.1016/j.compbiolchem.2004.11.001
- Khan J, Wei JS, Ringner M, et al.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med. 2001, 7:673-679. 10.1038/89044
- Cho S-B, Won H-H: Machine learning in DNA microarray analysis for cancer classification. Proceedings of the First Asia-Pacific Bioinformatics Conference on Bioinformatics 2003. Australian Computer Society, Inc, Sydney; 2003. 19:189-198.
- Djavan B, Remzi M, Zlotta A, Seitz C, Snow P, Marberger M: Novel artificial neural network for early detection of prostate cancer. J Clin Oncol. 2002, 20:921-929. 10.1200/JCO.2002.20.4.921
- Burke HB, Goodman PH, Rosen DB, et al.: Artificial neural networks improve the accuracy of cancer survival prediction. Cancer. 1997, 79:857-862. 10.1002/(SICI)1097-0142(19970215)79:4<857::AID-CNCR24>3.0.CO;2-Y
- Statnikov A, Wang L, Aliferis CF: A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics. 2008, 9:319. 10.1186/1471-2105-9-319
- Afshar S, Afshar S, Warden E, Manochehri H, Saidijam M: Application of artificial neural network in miRNA biomarker selection and precise diagnosis of colorectal cancer [Epub]. Iran Biomed J. 2018,
- Afshar S, Abdolrahmani F, Vakili Tanha F, Zohdi Seif M, Taheri K: Recognition and prediction of leukemia with artificial neural network (ANN). Med J Islam Repub Iran. 2011, 25:35-39.
- Afshar S, Abdolrahmani F, Tanha FV, Seaf MZ, Taheri K: Quick and reliable diagnosis of stomach cancer by artificial neural network. WSEAS. 2009, 30-35.
- Wu Y, Giger ML, Doi K, Vyborny CJ, Schmidt RA, Metz CE: Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer. Radiology. 1993, 187:81-87. 10.1148/radiology.187.1.8451441
- Movafagh A, Hajifathali A, Zamani M: Secondary chromosomal abnormalities of de novo acute myeloid leukemia: a first report from the Middle East. Asian Pac J Cancer Prev. 2011, 12:2991-2994.
- Ghaedi H, Bastami M, Zare-Abdollahi D, et al.: Bioinformatics prioritization of SNPs perturbing microRNA regulation of hematological malignancy-implicated genes. Genomics. 2015, 106:360-366. 10.1016/j.ygeno.2015.10.004
- McLaughlin P, Gooch J, Mannherz H-G, Weeds A: Structure of gelsolin segment 1-actin complex and the mechanism of filament severing. Nature. 1993, 364:685-692. 10.1038/364685a0
- Janmey PA, Stossel TP: Modulation of gelsolin function by phosphatidylinositol 4, 5-bisphosphate. Nature. 1987, 325:362-364. 10.1038/325362a0
- Sun HQ, Yamamoto M, Mejillano M, Yin HL: Gelsolin, a multifunctional actin regulatory protein. J Biol Chem. 1999, 274:33179-33182. 10.1074/jbc.274.47.33179
Application of an Artificial Neural Network in the Diagnosis of Chronic Lymphocytic Leukemia
Ethics Statement and Conflict of Interest Disclosures
Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.
We received great support and help from Dr. Saeid Afshar from the Hamadan University of Medical Sciences, who helped with the preparation of this article and the necessary analyses. We are most grateful for his kind efforts and cooperation. We would also like to express our gratitude to Mr. Sadegh Khorrami from the University of Isfahan, whose help made the preparation of this article possible.
Cite this article as:
Shaabanpour Aghamaleki F, Mollashahi B, Nosrati M, et al. (February 04, 2019) Application of an Artificial Neural Network in the Diagnosis of Chronic Lymphocytic Leukemia. Cureus 11(2): e4004. doi:10.7759/cureus.4004
Received by Cureus: December 27, 2018
Peer review began: January 13, 2019
Peer review concluded: January 30, 2019
Published: February 04, 2019
© Copyright 2019
Shaabanpour Aghamaleki et al. This is an open access article distributed under the terms of the Creative Commons Attribution License CC-BY 3.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.