Original article

Application of an Artificial Neural Network in the Diagnosis of Chronic Lymphocytic Leukemia

Abstract

Introduction

Chronic lymphocytic leukemia (CLL) is one of the most common types of leukemia, and early diagnosis is closely linked to appropriate treatment and patient survival. Late diagnosis or inappropriate treatment may lead to harmful outcomes. Several methods can be used to diagnose leukemia, including complete blood count (CBC), immunophenotyping, lymph node biopsy, chest X-ray, computerized tomography (CT) scan, and ultrasound. Most of these methods are time-consuming, and more than one is usually needed to reach a diagnosis. This underlines the need for rapid and accurate diagnosis of leukemia based on clinical and medical findings. We therefore applied an artificial neural network (ANN) to identify a molecular biomarker for rapid leukemia diagnosis from blood samples and to evaluate its potential for cancer detection.

Materials & methods

An independent-samples t-test was applied, using the Statistical Package for the Social Sciences (SPSS; IBM Corp, Armonk, NY, US), to microarray gene expression data from the Gene Expression Omnibus (GEO) dataset GSE22529. The 12 genes with the largest differences (among those with a p-value below 0.01) were selected for ANN analysis. The expression values of these genes in 53 samples were used to train the network, with a learning rate of 0.1.

Results

The output of the trained network agreed closely with the test data. The area under the receiver operating characteristic (ROC) curve was 0.991, indicating high classification precision, and ROC analysis of the individual genes identified gelsolin (GSN) as a potential biomarker.

Conclusions

These results indicate that a trained ANN can be applied to rapid CLL diagnosis and to the identification of potential biomarkers. We also suggest that this method could be used to diagnose other forms of cancer rapidly and reliably.

Introduction

Chronic lymphocytic leukemia (CLL) is a cancer of the blood and bone marrow in which uncontrolled, abnormal growth of lymphocytes occurs. CLL is characterized by clonal proliferation and progressive accumulation of B lymphocytes that typically express cluster of differentiation 19 (CD19), CD5, and CD23. CLL progresses slowly, and the number of cases increases each year. CLL occurs mainly in adults, in contrast to acute leukemia, which is more frequent in children [1-3]. Survival in CLL is relatively high compared with many other cancers: around 83% of patients are alive at least five years after diagnosis. The etiology of CLL is still elusive; however, genetic and environmental factors most likely play an important role in its occurrence [4-5]. In karyotype studies, del13q14, trisomy 12, del11q22-q23, and del17p13 have been associated with CLL [6]. Several other molecular abnormalities have been found in CLL, such as unmutated immunoglobulin heavy chain variable region (IGHV) genes, overexpression of zeta-chain-associated protein kinase 70 (ZAP-70) and CD38, and mutations in the NOTCH1, splicing factor 3b subunit 1 (SF3B1), and baculoviral IAP repeat-containing 3 (BIRC3) genes. In addition, mutations in tumor suppressor genes, including tumor protein P53 (TP53) and ataxia-telangiectasia mutated (ATM), have been associated with varying degrees of resistance to common chemotherapeutic agents. Altered microRNA (miRNA) expression and aberrant methylation of genes that are specifically deregulated in CLL, including B-cell lymphoma 2 (BCL-2), T-cell leukemia/lymphoma 1 (TCL1), and ZAP-70, have also been reported and linked to distinct clinical parameters. The clinical manifestations of CLL at diagnosis are highly diverse [7-10].

Approximately 60% of patients are asymptomatic, and the disease is often first suspected after a routine blood test. Symptomatic patients present with nonspecific complaints such as fatigue or weakness. CLL is usually diagnosed with blood tests because the cancerous cells circulate in the blood. A bone marrow biopsy is usually not needed to diagnose CLL, but it may be performed before treatment begins. Recently, molecular and cellular markers have aided the prediction and diagnosis of CLL. Therefore, identifying key molecules in CLL could be vital for more effective diagnosis [2,11-12].

Gene-expression profiling using cDNA microarrays allows many markers to be analyzed simultaneously, which helps categorize cancers into subgroups. Although many statistical techniques for analyzing gene-expression data exist, none has been rigorously tested for its ability to accurately distinguish different types of cancer and to further assign them to diagnostic categories [13-14].

Artificial neural networks (ANNs) are computer-based algorithms modeled on the structure and behavior of neurons in the human brain, and they can be trained to recognize and categorize complex patterns. Pattern recognition is achieved by adjusting the parameters of the ANN to minimize its error as it learns from training examples. The parameters can be calibrated using any type of input data, such as gene-expression levels generated by cDNA microarrays, and the output can be grouped into any given number of categories [15-16]. ANNs have recently been applied in clinical settings, for example to diagnose myocardial infarction and arrhythmias from electrocardiograms and to interpret radiographs and magnetic resonance imaging (MRI). In previous studies, ANNs correctly classified all samples and identified the genes most related to the classifications [17-18].
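As a minimal illustration of learning by error minimization (not the network used in this study), the Python sketch below trains a single sigmoid neuron by gradient descent on squared error. The function names, toy data, and epoch count are hypothetical; the learning rate of 0.1 simply mirrors the value reported in the abstract.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, learning_rate=0.1, epochs=500, seed=0):
    """Fit one sigmoid neuron by per-sample gradient descent on 0.5 * (out - y)^2."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(samples[0]))]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient of the squared error with respect to the pre-activation
            delta = (out - y) * out * (1.0 - out)
            # Step every parameter against its gradient
            w = [wi - learning_rate * delta * xi for wi, xi in zip(w, x)]
            b -= learning_rate * delta
    return w, b


def mean_squared_error(samples, labels, w, b):
    errors = [(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y) ** 2
              for x, y in zip(samples, labels)]
    return sum(errors) / len(errors)


# Hypothetical toy data: two features, label 1 when both values are high
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
Y = [0, 0, 1, 1]
w0, b0 = train_neuron(X, Y, epochs=0)    # untrained starting point
w, b = train_neuron(X, Y, epochs=2000)   # same starting point, after training
print(mean_squared_error(X, Y, w0, b0), mean_squared_error(X, Y, w, b))
```

Comparing the error at `epochs=0` with the error after training shows the parameters moving downhill on the error surface; a multilayer network applies the same update layer by layer through backpropagation.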

In summary, the purpose of this study was to develop a method for classifying cancers into specific diagnostic categories based on their gene-expression signatures using ANNs [19-20]. This study demonstrates the potential of these methods for CLL diagnosis and for identifying candidate genes for diagnosis and therapy.

Materials & Methods

Microarray gene expression profile

The gene expression profile of CLL patients was obtained from the Gene Expression Omnibus (GEO) database under accession number GSE22529 (GEO series 22529). The dataset was generated on GEO platform GPL96 (Affymetrix Human Genome U133A Array, HG-U133A) and deposited by Jelinek D and Kay N. It was submitted on June 23, 2010, and updated on August 10, 2018. The profiles were generated from 104 peripheral blood samples, comprising 81 patients and 23 healthy controls. In this study, 53 samples, including 11 healthy controls and 42 patients with CLL, were selected for analysis and validation of the ANN model.

Selection of genes by highest score

First, the expression levels of 22,285 genes were ranked. Then, after a preliminary statistical analysis, a two-tailed Student's t-test was used to determine the statistical significance of the difference between CLL patients and healthy individuals. Genes were selected according to the significance of their differences and used as inputs for the ANN analysis. Statistical analysis was performed with the Statistical Package for the Social Sciences (SPSS; IBM Corp, Armonk, NY, US) version 18.
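The gene-ranking step can be sketched as follows: compute a pooled-variance (Student's) two-sample t statistic for each gene and keep the k genes with the largest absolute t. The gene names, sample IDs, and expression values below are hypothetical, and p-values are not computed here; in Python, `scipy.stats.ttest_ind(a, b)`, which assumes equal variances by default, returns both the statistic and the two-tailed p-value needed for a significance threshold.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))


def top_k_genes(expression, patient_ids, control_ids, k):
    """Rank genes by |t| between patients and controls; return the k largest."""
    scores = {}
    for gene, values in expression.items():
        patients = [values[s] for s in patient_ids]
        controls = [values[s] for s in control_ids]
        scores[gene] = student_t(patients, controls)
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return [(g, scores[g]) for g in ranked[:k]]


# Hypothetical expression matrix: gene -> {sample_id: expression level}
expression = {
    "GENE_A": {"p1": 9.1, "p2": 9.4, "p3": 8.9, "c1": 6.0, "c2": 6.3, "c3": 5.8},
    "GENE_B": {"p1": 7.0, "p2": 7.2, "p3": 6.9, "c1": 7.1, "c2": 6.8, "c3": 7.0},
    "GENE_C": {"p1": 4.0, "p2": 4.5, "p3": 4.2, "c1": 5.5, "c2": 5.9, "c3": 5.6},
}
print(top_k_genes(expression, ["p1", "p2", "p3"], ["c1", "c2", "c3"], k=2))
```

Ranking by absolute t keeps both up- and down-regulated genes; here GENE_A (higher in patients) and GENE_C (lower in patients) are kept, while GENE_B, which barely differs, is dropped.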

Artificial neural network (ANN)

A three-layer multilayer perceptron ANN model was built with RapidMiner software (RapidMiner, Inc., MA, US), consisting of an input layer, a hidden layer, and an output layer. The input layer contained 12 neurons, one for each input feature, and the hidden layer transformed these features through weights adjusted by the learning algorithm. The output layer had a single neuron representing the two possible diagnostic states, cancerous or noncancerous: output values of 0 and 1 were categorized as cancerous and healthy control, respectively. The ANN was trained using the 12 highest-scoring genes as input. To evaluate the model, the area under the receiver operating characteristic curve (AUROC), the classification accuracy, and a reliability index were calculated. The ROC curve plots sensitivity (true positive rate) against 1 - specificity (false positive rate). ROC curves were created with GenEx version 6 software (MultiD Analyses, Göteborg, Sweden) for each of the 12 inputs separately, to compare them and select the best gene as a diagnostic biomarker. Decision tree and support vector machine (SVM) algorithms were also applied for comparison with the ANN.
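A minimal sketch of a 12-input, one-hidden-layer, single-output perceptron of the kind described above, trained by backpropagation, is shown below. The hidden-layer size (7), sigmoid activations, squared-error loss, epoch count, and the synthetic data are assumptions for illustration only; the paper does not report them, and this is not RapidMiner's implementation. Following the text, an output of 0 is read as cancerous and 1 as healthy control.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output neuron."""

    def __init__(self, n_inputs=12, n_hidden=7, seed=0):  # n_hidden=7 is an assumption
        rng = random.Random(seed)
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_o = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_h, self.b_h)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_o, hidden)) + self.b_o)
        return hidden, out

    def train(self, samples, labels, learning_rate=0.1, epochs=300):
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                hidden, out = self.forward(x)
                # Output error term for the squared loss 0.5 * (out - y)^2
                d_out = (out - y) * out * (1.0 - out)
                # Back-propagate to hidden units using the not-yet-updated output weights
                d_hidden = [d_out * w * h * (1.0 - h) for w, h in zip(self.w_o, hidden)]
                self.w_o = [w - learning_rate * d_out * h for w, h in zip(self.w_o, hidden)]
                self.b_o -= learning_rate * d_out
                for j, d in enumerate(d_hidden):
                    self.w_h[j] = [w - learning_rate * d * xi for w, xi in zip(self.w_h[j], x)]
                    self.b_h[j] -= learning_rate * d

    def predict(self, x):
        """0 = cancerous, 1 = healthy control (the coding used in the text)."""
        return 1 if self.forward(x)[1] >= 0.5 else 0


# Synthetic, hypothetical data: 12 standardized "gene" features per sample
rng = random.Random(1)


def make_sample(label):
    shift = 1.0 if label == 1 else -1.0
    return [shift + rng.gauss(0.0, 0.5) for _ in range(12)]


labels = [0] * 20 + [1] * 20
samples = [make_sample(y) for y in labels]
net = MLP()
net.train(samples, labels)
accuracy = sum(net.predict(x) == y for x, y in zip(samples, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Training accuracy on synthetic data only shows that the update rule works; diagnostic performance has to be judged on held-out samples, which is what the cross-validation and ROC analyses address.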

Results

The first step was selecting the significant genes. The t-test identified the top 12 genes as significant (p-value < 0.001); these showed the largest differences between healthy individuals and patients and were selected from our dataset for further analyses (Table 1).

Table 1: Top 12 genes selected by the t-test

| Probe ID | Gene symbol | Species | Gene name |
|---|---|---|---|
| 200666_s_at | DNAJB1 | Homo sapiens | DnaJ heat shock protein family (Hsp40) member B1 |
| 200627_at | PTGES3 | Homo sapiens | Prostaglandin E Synthase 3 |
| 200664_s_at | DNAJB1 | Homo sapiens | DnaJ heat shock protein family (Hsp40) member B1 |
| 200701_at | NPC2 | Homo sapiens | NPC intracellular cholesterol transporter 2 |
| 200675_at | CD81 | Homo sapiens | CD81 molecule |
| 200028_s_at | STARD7 | Homo sapiens | StAR related lipid transfer domain 7 |
| 200634_at | PFN1 | Homo sapiens | Profilin 1 |
| 200709_at | FKBP1A | Homo sapiens | FK506 binding protein 1A |
| 200022_at | RPL18 | Homo sapiens | Ribosomal Protein L18 |
| 200696_s_at | GSN | Homo sapiens | Gelsolin |
| 200657_at | SLC25A5 | Homo sapiens | Solute Carrier Family 25 Member 5 |
| 200650_s_at | LDHA | Homo sapiens | Lactate Dehydrogenase A |

The 12 genes were then used as features to compare cancerous patients with healthy controls. Three algorithms were initially applied to this comparison: the ANN, the support vector machine (SVM), and the decision tree, and the best-performing algorithm was selected for diagnosis. The results are presented in Table 2. The ANN distinguished the two groups best, with an AUC of 0.991, a classification accuracy (CA) of 0.969, and an F1 score of 0.969; it was therefore selected for the main analysis [21].

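Table 2: Comparison of classification algorithms (AUC: area under the ROC curve; CA: classification accuracy)

| Method | AUC | CA | F1 | Precision | Recall |
|---|---|---|---|---|---|
| SVM | 0.985 | 0.952 | 0.953 | 0.955 | 0.952 |
| Random Forest | 0.969 | 0.936 | 0.936 | 0.936 | 0.936 |
| Neural network | 0.991 | 0.969 | 0.969 | 0.970 | 0.969 |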
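The CA, F1, precision, and recall columns of Table 2 all derive from a confusion matrix. The sketch below computes classification accuracy plus per-class precision, recall, and F1, averaged with weights proportional to class size. Whether the software behind Table 2 used this weighting is an assumption, although the CA and Recall columns coinciding in every row is consistent with it, since support-weighted recall always equals accuracy. The labels and predictions in the example are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return CA plus support-weighted precision, recall and F1 over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    ca = sum(t == p for t, p in zip(y_true, y_pred)) / n
    weighted = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        support = tp + fn  # number of samples truly in class c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support / n
        weighted["precision"] += weight * precision
        weighted["recall"] += weight * recall
        weighted["f1"] += weight * f1
    return {"CA": ca, **weighted}


# Hypothetical labels: 0 = cancerous, 1 = healthy control
y_true = [0] * 8 + [1] * 4
y_pred = [0] * 7 + [1] + [1] * 3 + [0]
print(classification_metrics(y_true, y_pred))
```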

Next, the neural network was trained for diagnosis. As shown in Table 3, the neural network correctly distinguished CLL patients from healthy controls with 98% accuracy based on the expression of the 12 genes. The area under the ROC curve was measured to estimate the diagnostic performance of the ANN. Table 3 shows the cross-validation results of the ANN algorithm.

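Table 3: Cross-validation performance of the ANN

| Method | AUC | CA | F1 | Precision | Recall |
|---|---|---|---|---|---|
| Neural network | 0.991 | 0.981 | 0.980 | 0.981 | 0.981 |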
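Cross-validation splits the samples into folds, trains on all but one fold, and scores the held-out fold. The number of folds behind Table 3 is not reported; the sketch below assumes stratified k-fold splitting, so that each fold keeps roughly the 42:11 patient-to-control ratio of the selected samples, with k = 5 and the function name `stratified_k_fold` as hypothetical choices.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is dealt round-robin across k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # randomize which samples of this class share a fold
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# 42 CLL samples (0) and 11 healthy controls (1), matching the counts in the text
labels = [0] * 42 + [1] * 11
for train, test in stratified_k_fold(labels, k=5):
    n_controls = sum(labels[i] for i in test)
    print(len(train), len(test), n_controls)
```

Stratification matters here because controls are the minority class: an unstratified split could leave a test fold with no healthy controls, making its specificity undefined.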

To identify the best of the 12 selected genes as a diagnostic biomarker, ROC curves were plotted and analyzed for each gene (Table 4, Figure 1). According to Table 4, gelsolin (GSN), with AUC = 0.971, specificity = 0.902, and sensitivity = 1, could be the best diagnostic biomarker for CLL (Figure 2).

Table 4: ROC analysis of the 12 selected genes

| Probe ID | Gene symbol | Specificity | Sensitivity | AUC |
|---|---|---|---|---|
| 200022_at | RPL18 | 0.756097561 | 1 | 0.911308204 |
| 200028_s_at | STARD7 | 0.818181818 | 0.926829268 | 0.88691796 |
| 200627_at | PTGES3 | 0.727272727 | 0.951219512 | 0.840354767 |
| 200650_s_at | LDHA | 0.731707317 | 1 | 0.922394678 |
| 200657_at | SLC25A5 | 0.804878049 | 1 | 0.953436807 |
| 200664_s_at | DNAJB1 | 0.902439024 | 0.909090909 | 0.931263858 |
| 200666_s_at | DNAJB1 | 0.853658537 | 1 | 0.922394678 |
| 200675_at | CD81 | 0.951219512 | 0.909090909 | 0.911308204 |
| 200701_at | NPC2 | 0.902439024 | 0.909090909 | 0.940133038 |
| 200709_at | FKBP1A | 0.829268293 | 1 | 0.968957871 |
| 200696_s_at | GSN | 0.902439024 | 1 | 0.971175166 |
| 200634_at | PFN1 | 0.926829268 | 0.818181818 | 0.89578714 |
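The per-gene ROC analysis can be sketched without any plotting library. For a single gene, each candidate cut-off on its expression level gives one (1 - specificity, sensitivity) point, and the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. Here the reported sensitivity/specificity pair is taken at the cut-off maximizing Youden's J = sensitivity + specificity - 1; that criterion, the choice of positive class, and the gene names and values are assumptions, since the paper does not state how GenEx picked its operating point.

```python
def auc(pos_scores, neg_scores):
    """AUC as the Mann-Whitney probability P(pos > neg), counting ties as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def best_cutoff(pos_scores, neg_scores):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J; positive if score >= cutoff."""
    best = None
    for cut in sorted(set(pos_scores) | set(neg_scores)):
        sens = sum(s >= cut for s in pos_scores) / len(pos_scores)
        spec = sum(s < cut for s in neg_scores) / len(neg_scores)
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (cut, sens, spec)
    return best


# Hypothetical expression levels; "positives" is the class expected to score higher
positives = [8.2, 7.9, 8.5, 7.4, 8.8]
negatives = [6.1, 7.5, 6.4, 5.9, 6.8, 7.0]
print(auc(positives, negatives), best_cutoff(positives, negatives))

# Selecting the best gene: the one whose expression alone gives the highest AUC
genes = {"GENE_X": (positives, negatives), "GENE_Y": ([6.0, 6.2, 7.1], [6.1, 6.5, 5.8])}
best_gene = max(genes, key=lambda g: auc(*genes[g]))
print(best_gene)
```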

Discussion

Diagnosing CLL in its early stages may increase survival and improve therapeutic results. Although bone marrow biopsy remains the gold standard for screening CLL, it has several shortcomings, including invasiveness and patient discomfort. By contrast, non-invasive tests, such as blood tests, have low sensitivity and specificity [22-23]. Previous studies have indicated that certain genes are potential candidates for diagnosing cancer in its early stages. Because the expression levels of a panel of genes can be used to classify individuals as cancerous or healthy, we used ANN analysis to detect cancer more accurately at an early stage and to further evaluate the potential of this computational technique for cancer detection.

Recently, ANNs have found their way into medical fields. Although ANN training algorithms vary, they share one basic function: every network accepts a set of inputs and, through its hidden layer, generates corresponding outputs. ANNs are particularly useful for medical problems without a linear solution. In the present study, the ANN model achieved high predictive accuracy in classifying samples on the basis of candidate biomarker genes. We hypothesize that this technique could not only increase the accuracy of CLL diagnostic tests but also be applied to the diagnosis of several other types of cancer. In this paper, we obtained sample data from the GEO database and classified the samples as cancerous or healthy. The twelve genes listed in Table 1 were shown to be suitable for the accurate diagnosis of CLL using the ANN algorithm. The accuracy of cancer detection by the assembled ANN was assessed by ROC analysis: the outputs of the trained ANN for the testing data were used to plot the ROC curve, and the area under the curve was 0.991. These results demonstrate the high performance and effective learning of the ANN in diagnosing CLL from gene expression; therefore, a panel of genes could be used with the ANN algorithm to detect CLL.

Based on the ANN results and the ROC curve, GSN may have value as a diagnostic biomarker. GSN plays many roles in various types of cancer. Gelsolin is a ubiquitous actin filament-severing protein, a prominent member of the gelsolin superfamily of actin-severing proteins, and plays a crucial role in regulating actin filament assembly and disassembly. It is also involved in many other cellular processes, such as carcinogenic phenotypes, epithelial-mesenchymal transition (EMT), motility, apoptosis, proliferation, and differentiation [24-26]. GSN overexpression has been observed in many cancers, including breast cancer, oral carcinoma, colorectal cancer, ovarian cancer, and leukemia [27-28].

Conclusions

In conclusion, combining the t-test with an ANN makes it possible to identify a minimal and optimal set of gene biomarkers for classifying healthy and CLL individuals. Based on gene expression values, the trained ANN model accurately classified the samples into cancerous and non-cancerous categories, showing that ANN learning can accurately differentiate cancerous from non-cancerous samples. The approach can also select potential biomarker genes more efficiently, which could lead to better diagnosis and treatment.

References

  1. Rozman C, Montserrat E: Chronic lymphocytic leukemia. N Engl J Med. 1995, 333:1052-1057. 10.1056/NEJM199510193331606
  2. Chiorazzi N, Rai KR, Ferrarini M: Chronic lymphocytic leukemia. N Engl J Med. 2005, 352:804-815. 10.1056/NEJMra041720
  3. Keating MJ: Chronic lymphocytic leukemia. Semin Oncol. 1999, 26:107-114.
  4. Moghadam MH, Movafagh A, Omrani M, et al.: Identification of homogeneously staining regions in leukemia patients. J Res Med Sci. 2013, 18:363.
  5. Nigam R, Jain R, Malik R, Banseria N, Ahirwar R: An association of hepatitis virus infection with rare hemopoietic malignancies. J Evol Med Dent Sci. 2013, 2:8360-8365.
  6. Rassy EE, Chebly A, Korban R, et al.: Untreated chronic lymphocytic leukemia in Lebanese patients: an observational study using standard karyotyping and FISH. Int J Hematol Oncol. 2017, 6:105-111. 10.2217/ijh-2017-0019
  7. Hallek M, Cheson BD, Catovsky D, et al.: Guidelines for the diagnosis and treatment of chronic lymphocytic leukemia: a report from the International Workshop on Chronic Lymphocytic Leukemia updating the National Cancer Institute-Working Group 1996 guidelines. Blood. 2008, 111:5446-5456. 10.1182/blood-2007-06-093906
  8. Calin GA, Liu C-G, Sevignani C, et al.: MicroRNA profiling reveals distinct signatures in B cell chronic lymphocytic leukemias. Proc Natl Acad Sci USA. 2004, 101:11755-11760. 10.1073/pnas.0404432101
  9. Calin GA, Dumitru CD, Shimizu M, et al.: Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci USA. 2002, 99:15524-15529. 10.1073/pnas.242606799
  10. Hamblin TJ, Davis Z, Gardiner A, Oscier DG, Stevenson FK: Unmutated Ig VH genes are associated with a more aggressive form of chronic lymphocytic leukemia. Blood. 1999, 94:1848-1854.
  11. Staudt LM: Molecular diagnosis of the hematologic cancers. N Engl J Med. 2003, 348:1777-1785. 10.1056/NEJMra020067
  12. Binet J, Auquier A, Dighiero G, et al.: A new prognostic classification of chronic lymphocytic leukemia derived from a multivariate survival analysis. Cancer. 1981, 48:198-206. 10.1002/1097-0142(19810701)48:1<198::AID-CNCR2820480131>3.0.CO;2-V
  13. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270:467-470. 10.1126/science.270.5235.467
  14. Wang Y, Tetko IV, Hall MA, Frank E, Facius A, Mayer KFX, Mewes HW: Gene selection from microarray data for cancer classification—a machine learning approach. Comput Biol Chem. 2005, 29:37-46. 10.1016/j.compbiolchem.2004.11.001
  15. Khan J, Wei JS, Ringner M, et al.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med. 2001, 7:673-679. 10.1038/89044
  16. Cho S-B, Won H-H: Machine learning in DNA microarray analysis for cancer classification. Proceedings of the First Asia-Pacific Bioinformatics Conference on Bioinformatics 2003. Australian Computer Society, Inc, Sydney; 2003. 19:189-198.
  17. Djavan B, Remzi M, Zlotta A, Seitz C, Snow P, Marberger M: Novel artificial neural network for early detection of prostate cancer. J Clin Oncol. 2002, 20:921-929. 10.1200/JCO.2002.20.4.921
  18. Burke HB, Goodman PH, Rosen DB, et al.: Artificial neural networks improve the accuracy of cancer survival prediction. Cancer. 1997, 79:857-862. 10.1002/(SICI)1097-0142(19970215)79:4<857::AID-CNCR24>3.0.CO;2-Y
  19. Statnikov A, Wang L, Aliferis CF: A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics. 2008, 9:319. 10.1186/1471-2105-9-319
  20. Afshar S, Afshar S, Warden E, Manochehri H, Saidijam M: Application of artificial neural network in miRNA biomarker selection and precise diagnosis of colorectal cancer [Epub]. Iran Biomed J. 2018,
  21. Afshar S, Abdolrahmani F, Vakili Tanha F, Zohdi Seif M, Taheri K: Recognition and prediction of leukemia with artificial neural network (ANN). Med J Islam Repub Iran. 2011, 25:35-39.
  22. Afshar S, Abdolrahmani F, Tanha FV, Seaf MZ, Taheri K: Quick and reliable diagnosis of stomach cancer by artificial neural network. WSEAS. 2009, 30-35.
  23. Wu Y, Giger ML, Doi K, Vyborny CJ, Schmidt RA, Metz CE: Artificial neural networks in mammography: application to decision making in the diagnosis of breast cancer. Radiology. 1993, 187:81-87. 10.1148/radiology.187.1.8451441
  24. Movafagh A, Hajifathali A, Zamani M: Secondary chromosomal abnormalities of de novo acute myeloid leukemia: a first report from the Middle East. Asian Pac J Cancer Prev. 2011, 12:2991-2994.
  25. Ghaedi H, Bastami M, Zare-Abdollahi D, et al.: Bioinformatics prioritization of SNPs perturbing microRNA regulation of hematological malignancy-implicated genes. Genomics. 2015, 106:360-366. 10.1016/j.ygeno.2015.10.004
  26. McLaughlin P, Gooch J, Mannherz H-G, Weeds A: Structure of gelsolin segment 1-actin complex and the mechanism of filament severing. Nature. 1993, 364:685-692. 10.1038/364685a0
  27. Janmey PA, Stossel TP: Modulation of gelsolin function by phosphatidylinositol 4, 5-bisphosphate. Nature. 1987, 325:362-364. 10.1038/325362a0
  28. Sun HQ, Yamamoto M, Mejillano M, Yin HL: Gelsolin, a multifunctional actin regulatory protein. J Biol Chem. 1999, 274:33179-33182. 10.1074/jbc.274.47.33179

Author Information

Fateme Shaabanpour Aghamaleki

Genetics, Shahid Beheshti University of Medical Sciences, Tehran, IRN

Behrouz Mollashahi

Genetics, Shahid Beheshti University of Medical Sciences, Tehran, IRN

Mokhtar Nosrati

Genetics, University of Isfahan, Isfahan, IRN

Afshin Moradi

Pathology, Shahid Beheshti University of Medical Sciences, Tehran, IRN

Mojgan Sheikhpour

Genetics, Pasteur Institute of Iran, Tehran, IRN

Abolfazl Movafagh Corresponding Author

Genetics, Shahid Beheshti University of Medical Sciences, Tehran, IRN

Ethics Statement and Conflict of Interest Disclosures

Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.

Acknowledgements

We received great support and help from Dr. Saeid Afshar of Hamadan University of Medical Sciences, who helped with the preparation of this article and the necessary analyses. We are most grateful for his kind efforts and cooperation. We would also like to express our gratitude to Mr. Sadegh Khorrami of the University of Isfahan, whose help made the preparation of this article possible.
