Diagnostic Pitfalls of Digital Microscopy Versus Light Microscopy in Gastrointestinal Pathology: A Systematic Review

Digital microscopy (DM) is one of the cutting-edge advances in pathology, offering improved efficiency, diagnostic advantages, and potential application in virtual diagnosis, particularly in the current era of the coronavirus disease 2019 (COVID-19) pandemic. However, diagnostic challenges remain a concern for its wider adoption by pathologists, and these concerns should be addressed within each specific subspecialty. We aim to identify the common diagnostic pitfalls of whole slide imaging (WSI), one modality of DM, in gastrointestinal (GI) pathology. From validation studies of primary diagnostic performance, we included 16 records that involved GI cases, used a wash-out period of at least two weeks, and had a study design of more than 60 cases. A tailored quality appraisal tool was used to evaluate the risk of bias in these diagnostic accuracy studies. Furthermore, because the studies were highly heterogeneous and the definition of discordance was not standardized, we extracted the discordant cases in GI pathology and calculated the discrepancy rate, which ranged from 0.5% to 64.28%. Focusing on the discrepant cases between digital microscopy and light microscopy, we identified five main diagnostic pitfalls of WSI: additional time needed to review slides, difficulty in identifying dysplastic nuclei, missed organisms such as Helicobacter pylori (H. pylori), difficulty in recognizing specific cells, and technical issues. After detailed review and analysis, we offer two essential suggestions for signing out GI cases by DM. One is to use standardized 20x scans for routine diagnostic workups and to request 40x or even 60x scans for challenging cases; the other is that high-volume slide training should be completed before WSI is applied clinically for primary diagnosis, particularly in GI pathology.


Introduction And Background
Since the outbreak of coronavirus disease 2019 (COVID-19) across the world, the manner in which doctors practice medicine has changed significantly, especially with the growing need for telemedicine through digital devices [1]. It has been observed that during the COVID-19 pandemic, about two-thirds of in-person doctor visits were replaced by telehealth in the USA [2]. Additionally, other digital health technologies, such as wireless medical devices and software as a medical device, are promoted and regulated by the Food and Drug Administration (FDA) for clinical applications [3]. The whole idea of digitalization is to promote efficiency, which attracts pioneers in the realm of pathology. Digital pathology (DP) is a newly developed technology that involves scanning traditional slides to create digital images; whole slide imaging (WSI) is its most widely adopted modality for diagnosis, education, and research by pathologists [4]. The major advantages of digital microscopy (DM) are well established: a reduced risk of damage to patients' slides, a flexible sign-out mode, and better collaboration among pathologists on challenging cases [5].
As for primary diagnosis, the WSI system needs to be meticulously verified through validation studies conducted at each institution. To date, only one type of digital pathology device has been approved by the FDA for primary diagnostic purposes [6]. A consensus guideline was developed by the College of American Pathologists Pathology and Laboratory Quality Center (CAP-PLQC) in 2013 to guide laboratories through this validation process; it highlights the following study methodology: inclusion of at least 60 cases, a wash-out period of at least two weeks, and a real clinical application setting. Also, the intraobserver concordance rate, which is the rate of agreement between the digital and glass-slide diagnoses, should be established by the same previously trained pathologist [7]. A recently updated guideline, published in 2020, differs from the 2013 guideline in terms of collaborating organizations and revision processes. Most importantly, although the recommendation is not evidence-based, the new guideline suggests that pathologists should read slides in random order during the entire validation process [8].
Regardless of the guideline, several validation studies have been conducted over the past 10 years, reporting favorable diagnostic concordance rates of 87% to 98.3% [9]. However, the satisfactory agreement between digital microscopy and glass microscopy does not eliminate all concerns. In the latest systematic review and meta-analysis by Azam et al., a total of 546 major disagreement cases were identified from 10,410 pathology samples across 25 validation/comparative studies [10]. About half of these discordances were related to evaluating dysplasia, nuclear atypia, or malignancy grading. The next most common reasons for disagreement were challenging diagnostic cases and the difficulty of locating small objects [10]. Besides the discordant cases, an inherent factor that hampers the diagnostic ability of WSI is the inability to evaluate structures that require polarization (e.g., amyloid and monosodium urate crystals) [11].
Despite the previous work, no review has addressed the diagnostic pitfalls in a specific subspecialty, and some of the review studies were not conducted based on the CAP-PLQC guidelines. The present review aims to assess validation studies of WSI for primary gastrointestinal (GI) pathology diagnosis in order to identify the challenges and pitfalls in cases that are discordant between digital and glass slides.

Review Protocol and Question Identification
The present review was designed and conducted following the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA) guidelines [12]. The review question identified is "what are the commonly encountered diagnostic difficulties in gastrointestinal pathology that pathologists are most concerned about when utilizing the WSI mode of digital slides compared with glass slides?" We believe the best way to answer this question is to study the discordant cases in high-quality validation studies.

Literature Review
To avoid duplicating existing work, the lead researcher performed a comprehensive literature review to determine whether any ongoing, registered, or completed study on the same topic was available. To date, a systematic review called "The diagnostic concordance of whole slide imaging and light microscopy: a systematic review" has been registered and published under the protocol CRD42015017859 [13]. It was designed to evaluate an overall concordance rate in digital pathology. Another systematic review was published with the title "The performance of digital microscopy for primary diagnosis in human pathology: a systematic review" under the protocol CRD42018085593, which primarily presents and analyzes the disagreements between digital slides and glass slides [9]. This high-quality review was published to determine the overall diagnostic concordant and discordant cases in digital pathology [9]. Also, Williams et al. carried out comprehensive work on identifying the causes of discrepancy in digital microscopy [14]. In cytopathology, Girolami et al. carried out a study on diagnostic performance and limitations, which identifies scanning time and technical challenges as the two main obstacles in WSI [15]. Although all of these previously published studies were conducted according to the CAP-PLQC guideline, none of them addresses the reasons for discordance when applying digital microscopy in a specific subspecialty. Given the well-known variations in diagnostic protocol among pathology subspecialties, our review is the first study to examine the diagnostic challenges and diagnostic discordances of digital microscopy in GI pathology. Besides, we only included reliable studies based on the CAP-PLQC guideline and its updated version in 2021.

Search Strategy
A search of the literature was conducted across the following databases: PubMed (National Center for Biotechnology Information, US National Library of Medicine, Maryland, USA), Scopus (Elsevier, Amsterdam, The Netherlands), and Embase (Elsevier, Amsterdam, The Netherlands). The search strategy used by the primary researcher was as follows: digital pathology OR whole slide imag* OR virtual microscopy OR digital microscopy OR digital slides OR virtual slides OR telepathology OR telemicroscopy OR digital imag* AND light microscop* OR conventional microscop* OR traditional microscop* OR glass slides OR optical microscop* AND (validation OR validate stud*). Preliminary filters (full text, validation study, last 10 years, humans, and English) were also applied to narrow down the study pool. Because of the strict design requirements for a validation study in digital pathology, we deemed it unnecessary to search the grey literature manually to reduce bias.
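The Boolean structure of the search strategy above can be sketched in a few lines of Python. This is only an illustration: the term lists are copied from the strategy as written, but the explicit parentheses around each OR group are an assumption about the intended grouping, the helper names `or_group` and `build_query` are hypothetical, and database-specific syntax (field tags, phrase quoting) is deliberately left out.

```python
# Term groups copied from the search strategy in the text.
DIGITAL_TERMS = [
    "digital pathology", "whole slide imag*", "virtual microscopy",
    "digital microscopy", "digital slides", "virtual slides",
    "telepathology", "telemicroscopy", "digital imag*",
]
LIGHT_TERMS = [
    "light microscop*", "conventional microscop*", "traditional microscop*",
    "glass slides", "optical microscop*",
]
VALIDATION_TERMS = ["validation", "validate stud*"]


def or_group(terms):
    """Join terms with OR and wrap them in parentheses.

    The parentheses are an assumption: the published strategy only
    brackets the last group explicitly.
    """
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the OR groups with AND, one group per concept."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(DIGITAL_TERMS, LIGHT_TERMS, VALIDATION_TERMS)
print(query)
```

Making the grouping explicit avoids relying on each database's default operator precedence, which may differ between platforms.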

Eligibility Criteria
The CAP-PLQC created a highly reliable consensus guideline in 2013, and an updated version became available in 2021 [7,8]. Its Grade A recommendations form the backbone of our study inclusion and exclusion criteria (Table 1). First, all validation studies should involve trained pathologists, and a complete WSI system is required for primary diagnosis. Second, each study sample set should contain at least 60 cases, and a wash-out period of more than two weeks is highly recommended [7,8]. Finally, intraobserver concordance should be established, meaning that pathologists should compare their own readings of the same pathology specimen in digital and glass microscopy rather than taking a consensus diagnosis as the standard.

TABLE 1: Inclusion and exclusion criteria
These criteria were based on the 2013 guideline posted by the College of American Pathologists Pathology and Laboratory Quality Center and its updated version [7,8].
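As a minimal sketch, the criteria described above can be expressed as a screening check. The `Study` fields, the function name `exclusion_reasons`, and the reason labels are hypothetical and introduced only for illustration; the wash-out threshold is taken here as at least 14 days, which is one reading of the two-week requirement.

```python
from dataclasses import dataclass

MIN_CASES = 60          # minimum sample size from the 2013 guideline
MIN_WASHOUT_DAYS = 14   # assumption: "two weeks" read as >= 14 days


@dataclass
class Study:
    """Hypothetical record of the items needed for eligibility screening."""
    trained_pathologists: bool
    primary_diagnosis: bool
    includes_gi_cases: bool
    n_cases: int
    washout_days: int
    intraobserver_concordance: bool


def exclusion_reasons(study: Study) -> list:
    """Return every criterion the study fails; an empty list means eligible."""
    reasons = []
    if not study.trained_pathologists:
        reasons.append("no trained pathologists")
    if not study.primary_diagnosis:
        reasons.append("not for primary clinical diagnosis")
    if not study.includes_gi_cases:
        reasons.append("no gastrointestinal cases included")
    if study.n_cases < MIN_CASES:
        reasons.append("insufficient sample cases")
    if study.washout_days < MIN_WASHOUT_DAYS:
        reasons.append("insufficient wash-out period")
    if not study.intraobserver_concordance:
        reasons.append("no intraobserver concordance established")
    return reasons


# Hypothetical example: a study that fails only the sample-size criterion.
example = Study(True, True, True, n_cases=40, washout_days=21,
                intraobserver_concordance=True)
print(exclusion_reasons(example))
```

Returning all failed criteria, rather than stopping at the first, mirrors how exclusion reasons are tallied in a PRISMA flow diagram.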

Article Screening and Eligibility Assessment
Two reviewers independently evaluated the records, initially by title or abstract, to exclude clearly unqualified articles. After this screening, we further assessed the eligibility of the full-text articles by applying the algorithm in Figure 1. The inclusion and exclusion criteria mentioned above were applied in parallel with the algorithm to evaluate the eligibility of the screened records.

Study design: single or multiple centers; onsite or remote sign-out; retrospective, prospective, or cross-sectional design; number of pathologists; training received before the study; and number of included samples
Technical setting: scanner used, staining involved, and scanning magnification
Validation methodology: blinding process, potential bias, wash-out period, reading with or without a known diagnosis, and availability of clinical information
Case distribution: esophagus, stomach, small intestine, liver, bile duct, pancreas, and large intestine
Discordant result: discordance rate, reasons for discordance, minor or major discordance, and harmful clinical outcomes

Quality Assessment With Tailored QUADAS-2
The Quality Assessment for Diagnostic Accuracy Studies 2 (QUADAS-2) tool was utilized to assess the quality of each included study [17]. Our selected studies used human GI pathology specimens as samples to compare digital microscopy (index test) with traditional microscopy (reference standard test) in terms of primary clinical diagnosis. Therefore, we tailored the original QUADAS-2 tool by adding signaling questions and excluding those that do not apply to the present review. Patient selection, index test, reference standard, and flow and timing were the four leading domains, each containing several signaling questions divided into risk of bias and applicability. We constructed clear instructions for each signaling question and, in each domain, classified the evaluation as unclear risk, low risk, or high risk. Although this departs from the updated version of the CAP-PLQC guidelines, we considered pathologists who reviewed the index test and the reference standard test in nonrandom order to still have a low risk of bias in domain two. Otherwise, a domain with more than two negative answers to its signaling questions was tagged as high risk of bias. Studies in which essential details were not mentioned or were omitted were classified as unclear risk of bias.
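The classification rule described above can be sketched as a small function. This is an illustrative reading rather than the exact procedure used: the encoding of answers (`True` for a favorable answer, `False` for a negative answer, `None` for information not reported) and the precedence of "high" over "unclear" are assumptions, and the name `domain_risk` is hypothetical.

```python
from typing import Optional, Sequence


def domain_risk(answers: Sequence[Optional[bool]]) -> str:
    """Classify one QUADAS-2 domain from its signaling-question answers.

    True  = favorable answer
    False = negative answer
    None  = essential detail not reported
    """
    negatives = sum(1 for answer in answers if answer is False)
    if negatives > 2:
        # More than two negative answers: high risk of bias.
        return "high"
    if any(answer is None for answer in answers):
        # Missing essential details: unclear risk (assumed to rank below high).
        return "unclear"
    return "low"


print(domain_risk([True, True, False]))          # two or fewer negatives
print(domain_risk([False, False, False, True]))  # more than two negatives
print(domain_risk([True, None, True]))           # detail not reported
```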

PRISMA Flowchart
A total of 1,245 articles were identified by the search strategy we designed. After removal of duplicates, we screened the titles or abstracts, which yielded only 79 studies. Specifically, articles were excluded for any of the following reasons: no full-text article (n = 10), irrelevant study (n = 962), artificial intelligence (n = 7), and not an original validation or comparative study (n = 75). Subsequently, the 79 full-text articles were retrieved for further screening, and those fulfilling all the requirements were included. The remaining articles were excluded for the following reasons: not for primary clinical diagnosis (n = 19), no gastrointestinal cases included (n = 37), insufficient sample cases (n = 1), veterinary study (n = 1), no intraobserver concordance established (n = 3), and insufficient wash-out period (n = 1). In the end, we included 16 articles, all of which are validation studies evaluating the diagnostic performance of the WSI system. The article selection flow diagram based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) is shown in Figure 2.

Quality Appraisal of Studies
In summary, due to lack of information, three (18%) studies could not be assessed for the risk of bias in patient selection, and one study, by Borowsky et al., allowed pathologists to defer cases freely if they believed that, in a real clinical setting, further consultation and additional information would be needed [18]. For the risk of bias in the index test domain, three (18%) studies involved pathology residents or pathologists who were not adequately trained in slide sign-out; on top of that, the majority of the studies (65%) did not review digital slides and glass slides in random order. One study did not assess all the selected pathology specimens and was deemed high risk in flow and timing [19]. For the purpose of our review, we considered the applicability concerns to be satisfactory, except for studies that focused only on pancreatic, liver, and large intestine pathology specimens [20-22]. Overall, the results are shown in Table 3; two reviewers conducted the quality appraisal process.

Characteristics of Digital Microscopy Validation Studies of GI Pathology
Because the WSI system is classified under regulation as a medical device of the highest risk category (class III), we deem it necessary to conduct this review for a better clinical application of digital microscopy, especially in the subspecialty of GI pathology [35]. As shown in Tables 4 and 5, validation or comparative studies of digital microscopy against glass microscopy have been conducted from 2012 to the present. Owing to the meticulous methodological criteria for primary diagnosis recommended by the College of American Pathologists [7], most of the studies were designed in the USA or Europe. At least two experienced pathologists, and an average of 14, took part in those studies, and the number of included GI samples also varied. The majority of the investigators used a 20x scanning magnification for conventional study slides, with 40x scans added case by case.

Discrepant Rate and Case Distribution of Digital Microscopy
The overall concordance rate across all specialty cases was satisfactory, ranging from 79.0% to 99.4%; however, the definition of discordant cases was highly heterogeneous across institutions, with discordance ranging from a low of 0.5% to a high of 64.28%. Therefore, we present those data as the GI sample discrepancy rate, which reflects the GI samples for which different readings were observed between WSI and glass slides. In addition, 15 out of 523 and 10 out of 523 discrepancies were reported to entail significant clinical consequences by Borowsky et al. and Mukhopadhyay et al., respectively [18,28]. The distribution of discrepant cases from the perspective of DM is shown in Figure 3. The stomach and large intestine are the two main sites where disagreements occur, and, in parallel with this, minor differences in nuclear grading are also well reported across those studies as a leading reason.
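For clarity, the discrepancy rate used here can be written as a simple calculation. The sketch below assumes the rate is the number of GI samples read differently on WSI and glass slides divided by the total number of GI samples in that study, expressed as a percentage; the function name and the example counts are hypothetical and do not come from any included study.

```python
def discrepancy_rate(discordant: int, total: int) -> float:
    """Percentage of GI samples with different WSI and glass-slide readings."""
    if total <= 0:
        raise ValueError("total must be a positive number of samples")
    if not 0 <= discordant <= total:
        raise ValueError("discordant must lie between 0 and total")
    return 100.0 * discordant / total


# Hypothetical study: 3 discordant readings among 60 GI samples.
print(f"{discrepancy_rate(3, 60):.2f}%")
```

Because each institution defined discordance differently, rates computed this way are comparable within a study but only loosely comparable between studies.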

Diagnostic Pitfalls: Difficulties in Diagnostic Efficiency
In examining those discordances, we found that in seven (46.7%) studies, pathologists reported that WSI consumed more time than light microscopy (LM). Larghi et al. reported an average lag of 24 seconds between the two slide-viewing methods [20]. Moreover, Thrall et al. showed that digital pathology took a mean of 54 seconds longer [33]. A time lag of around one minute could be a significant setback for finishing daily sign-out cases in a large hospital. Although not formally recorded, larger resection specimens potentially take more time to review on digital slides [27]. For institutions with a large number of pathology samples, this finding could decrease efficiency and lead to further drawbacks, such as physician burnout or financial shortfalls, which may ultimately affect diagnostic accuracy. Therefore, the additional time required may hinder the further implementation of WSI in the daily GI pathology sign-out process.
However, we should not ignore another well-reported phenomenon called the "learning effect," which means that the low efficiency of WSI can be overcome as pathologists learn and gain experience [27]. Mills et al. found that the difference in reading time between digital slides and glass slides became negligible after pathologists had gained the experience of reading 500 cases [27]. Additionally, lower scanner magnification decreases the slide reading time, but some minor features or bacteria can then be overlooked [31,36]. van der Post et al. point out that image reviewer and laboratory infrastructure software also play critical roles in the diagnostic efficiency of digital pathology [22]. On the other hand, when cell numbers must be evaluated quantitatively because of their clinical relevance, the WSI system outperforms traditional glass slides in diagnostic efficiency by saving the time needed for manual counting [27].
Even though several studies show that the additional time for slide reading is a significant challenge, this issue can be addressed by making case-based training mandatory for pathologists, tailoring the scanner magnification to each case, and standardizing the laboratory infrastructure and protocols for the WSI system. Combined with the hidden gain in quantitative evaluation, diagnostic efficiency should not be a major concern when adopting digital pathology.
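To give a sense of scale for the time lags reported in this section, the arithmetic can be sketched as follows. The lags of 24 and 54 seconds are those cited above; the assumption that they apply per case, the daily caseload of 100 cases, and the function name are hypothetical and serve only as an illustration.

```python
def extra_minutes_per_day(lag_seconds: float, cases_per_day: int) -> float:
    """Additional reading time per day if every case incurs the same lag."""
    return lag_seconds * cases_per_day / 60.0


HYPOTHETICAL_CASES_PER_DAY = 100  # assumed caseload, not taken from any study

for lag in (24, 54):  # seconds, as reported by Larghi et al. and Thrall et al.
    minutes = extra_minutes_per_day(lag, HYPOTHETICAL_CASES_PER_DAY)
    print(f"{lag} s lag -> {minutes:.0f} extra minutes per day")
```

Under these assumptions the added time falls between roughly 40 and 90 minutes per day, before any learning effect is taken into account.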

Diagnostic Pitfalls: Difficulties in Evaluation of Nuclear Features
Samuelson et al. suggest that pitfalls may occur when grading dysplasia, which relies on evaluating chromatin details [30]. Specifically, in their study, relative hyperchromasia was not adequately displayed in the digital slides of two discrepant GI cases, which led to a failure to diagnose true tubular adenoma [30]. Interestingly, Snead et al. showed that two dysplasia cases reported in digital microscopy were missed in light microscopy, which may be associated with the darker nuclei seen in DP, as also observed in Barrett's esophagus by Dr. D Treanor (personal communication) [31]. Detailed analysis of the discordant GI cases in the study by Tabata et al. shows that the majority of disagreements involved higher grading of adenoma, intraepithelial neoplasia, and a diagnosis of carcinoma rather than adenoma [32]. In fact, owing to their greater mitotic activity, the nuclei of malignant or high-grade tumors are often darker than those of low-grade or benign tumors. Although no direct evidence exists, this over-grading pitfall is potentially correlated with the tendency toward darker nuclei in the WSI system mentioned by Snead et al. [31]. It is further supported by a study by Villa et al., in which all three pathologists believed they had upgraded dysplastic lesions in DP [34]. One possible explanation is that more weight was given to changes in cellular architecture in the final diagnosis because of the analytical methodology of reading digital slides compared with LM [27,37]. A solitary case in which poorly differentiated adenocarcinoma was misdiagnosed as inflammation was also reported in a second diagnostic setting [22]. Among other pathology subspecialties, this challenge of evaluating nuclear features accounted for 57% of the reasons for discordance with the WSI system in a comprehensive systematic review [38].
Besides, a routine surgical pathology WSI study suggests that higher scanner magnification may be a significant factor in helping to visualize detailed nuclear features, and further studies are needed to identify the correlation between discordant GI tumor grading and low magnification in digital microscopy [37]. Therefore, when signing out GI cases, pathology departments and laboratories should be well aware of this tendency toward darker nuclei and should request higher scanner magnification before utilizing WSI for primary diagnosis.

Diagnostic Pitfalls: Difficulties in Identification of Microorganisms
To date, the most widely adopted scanner magnification across those validation studies is 20x, which requires less scanning time and storage [29]. Nevertheless, according to a study by Al-Janabi et al., Helicobacter pylori, Candida albicans, and Giardia duodenalis are three microorganisms that are frequently challenging to diagnose at this magnification, and an additional 40x scan gives pathologists more confidence in identifying them [23]. Furthermore, another study by the same group, focused on pediatric pathology, showed that one case of candidiasis in a small intestine sample was missed at 20x magnification [24]. Low magnification means that fewer fine details are presented to readers of traditional glass slides, but in most cases the WSI system can generate a high-quality image at 20x magnification for diagnostic tasks [33]. There are two potential drawbacks of scanning at 40x: one is the longer scanning time and larger storage, as discussed above, and the other, considered an inherent issue, is that image clarity is worse at scanning magnifications above 20x [29,33]. Thus, in the study design of Borowsky et al., 40x magnification was requested for only three slides, to identify H. pylori [18]. Another study also indicates that the identification of H. pylori remains challenging at 20x magnification, even when the tissue was stained by immunohistochemistry (IHC) [37]. More radically, Snead et al. reported that H. pylori could only be identified at a display magnification of 60x, which suggests setting this magnification as the default for evaluating H. pylori gastritis [31]. Based on the findings reviewed, we believe the best way to tackle this challenge efficiently is to keep GI pathologists informed and educated: when using WSI to read slides in search of microorganisms, a magnification higher than 20x is necessary, but in other cases this upgrade is not advisable.
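The suggestion above can be summarized as a simple decision rule. The sketch below is an illustrative reading of the reviewed findings, not a validated protocol: the function name and its parameters are hypothetical, and the escalation from 40x to 60x when organisms remain unresolved is an assumption drawn from the report by Snead et al.

```python
DEFAULT_SCAN = 20    # routine scanning magnification in most studies
HIGHER_SCAN = 40     # requested when microorganisms are sought
MAXIMUM_SCAN = 60    # reported as necessary for H. pylori in one study


def scan_magnification(looking_for_microorganisms: bool,
                       unresolved_at_40x: bool = False) -> int:
    """Suggest a scanning magnification for a GI slide."""
    if not looking_for_microorganisms:
        # Routine cases: a higher magnification is not advisable.
        return DEFAULT_SCAN
    if unresolved_at_40x:
        # Assumed escalation step for organisms still not identifiable.
        return MAXIMUM_SCAN
    return HIGHER_SCAN


print(scan_magnification(False))
print(scan_magnification(True))
print(scan_magnification(True, unresolved_at_40x=True))
```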

Diagnostic Pitfalls: Difficulties in Specific Cell Identification
Specific cells carry diagnostic weight in GI pathology, such as neutrophils in the gastric mucosa, which indicate acute gastritis, and eosinophils in the context of eosinophilic esophagitis. Identifying those indicative cells is therefore crucial to an accurate diagnosis. The study by Arnold et al. shows that the refractile nature of eosinophilic granules in the cytoplasm makes them difficult to recognize in digital slides [25]. It is suggested that the color change of scanned slides in WSI compared with glass slides plays a critical role, which could be addressed in future studies by implementing color calibration tools on computer monitors [25]. Additionally, Thrall et al. reported that small inflammatory cells such as neutrophils cannot be identified well at 20x magnification, a finding further supported by Bauer and Slaw, who showed better recognition of neutrophils in inflammatory lesions on 40x scans [33,36].

Diagnostic Pitfalls: Difficulties in Technical Settings
Some pathologists reported that unfamiliarity with using a computer mouse to navigate slides, an entirely different experience from using a traditional microscope, was a problem in the diagnostic setting [23]. It takes time for pathologists to learn the WSI system and become comfortable with it. Moreover, when identifying important chromatic patterns in some instances, basic WSI systems are unable to capture multiple focal planes to evaluate the entire thickness of a sample, and a Z-stacking feature is often required [18]. Although not related to the purpose of the present review, a similar need for Z-stacking is also prominent in thick cytologic smears, where selective focusing is needed to read the sample [38]. Another critical difficulty, encountered by Loughrey et al., is underexposure of the image, which makes it impossible to count intraepithelial lymphocytes in the colonic mucosa and hard to differentiate dysplasia [26]. These technical pitfalls can be prevented and managed case by case through a comprehensive quality appraisal of the whole digital diagnostic setting in advance.

Limitation
One limitation of our study is the lack of information in some studies, which hampers the ability to identify discrepant cases and to further analyze the reasons for diagnostic challenges. Another point worth mentioning is that, partially due to unclear laboratory settings, we did not assess the limitations of the WSI system scanners across studies, although they are reported to be a major factor in diagnostic efficiency and accuracy, especially in studies before 2014 [39].

Conclusions
Digital pathology is a frontier innovation in the discipline of pathology, and its benefits and limitations have been well studied in a holistic manner. However, the challenges and potential diagnostic pitfalls that pathologists and researchers seek to understand should be examined for each particular organ system (e.g., GI pathology).
Our study focused exclusively on identifying and analyzing the common difficulties in cases that are discordant between WSI and light microscopy in GI pathology. When it comes to validating WSI for primary diagnosis in practice, the significant pitfalls are additional reading time, a tendency to over-grade atypical nuclei, missed diagnoses of microorganisms such as H. pylori, under-identification of granulocytes, and minor technical limitations.
Therefore, for GI pathology samples, we recommend that pathologists use the standardized 20x scan for routine diagnostic workups and request 40x or even 60x scanning for evaluating microorganisms, granulocytes, and challenging nuclear dysplasia. In addition, we recommend implementing high-volume digital slide sign-out training for pathologists as a general requirement before the clinical application of digital pathology, and turning to the Z-stacking feature of scanners for better-focused readings of thick specimens.

Conflicts of interest:
In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.