Accuracy of Ultrasonography vs. Elastography in Patients With Non-alcoholic Fatty Liver Disease: A Systematic Review

Ultrasonography and elastography are the most widely used imaging modalities for diagnosing non-alcoholic fatty liver disease. This study aimed to assess and compare the diagnostic accuracy of these two modalities in patients with non-alcoholic fatty liver disease/non-alcoholic steatohepatitis. This systematic review was based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A systematic search covering the past seven years was done on June 29, 2022, using the PubMed, PubMed Central, Cochrane, and Google Scholar databases. Studies were included based on the following predefined criteria: observational studies, randomized controlled trials (RCTs), and comparative studies; use of liver biopsy or MRI proton density fat fraction (MRI PDFF) as the reference standard; evaluation of ultrasonography or elastography with measures of diagnostic accuracy such as sensitivity (SN), specificity (SP), and area under the receiver operating characteristic (AUROC) curve; and publication in the English language. The data were extracted on a predefined template. The final twelve eligible studies were assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS 2) tool. Most studies focused on elastography techniques, and the remaining focused on quantitative ultrasonography methods like the controlled attenuation parameter (CAP) and attenuation coefficient (AC). Only one study was available for the evaluation of qualitative ultrasonography. MRI was generally found superior to other diagnostic tests for determining liver stiffness through magnetic resonance elastography (MRE) and steatosis through MRI PDFF. Data assessing the comparative diagnostic accuracy of the two tests were inconclusive.


Introduction And Background
The prevalence of non-communicable diseases (NCDs) has sharply risen in recent decades, and metabolic syndrome has primarily been at the forefront. The high-calorie, low-fiber food consumption, sedentary lifestyle, and increasing use of automated machines in our day-to-day lives have significantly contributed to it [1]. The complications of NCD are widely varied, and one which mainly targets the liver is non-alcoholic fatty liver disease (NAFLD).
NAFLD comprises a spectrum of liver diseases specifically seen in non-alcoholic patients, characterized by histological changes ranging from simple hepatic steatosis (fatty liver) to more progressive steatosis, ballooning, and inflammation (non-alcoholic steatohepatitis (NASH)) [2]. The global prevalence of NAFLD is 25% [3]. Only 20-30% of cases with NAFLD progress to NASH [4]. NASH/NAFLD is associated with an increased risk of developing cirrhosis, cardiovascular diseases, and cancer [3,5]. Most patients with NAFLD remain asymptomatic until irreversible damage has already occurred in the liver. Hence, diagnosing the disease early is paramount in delaying progression and preventing complications. The definitive diagnostic method for NASH remains liver biopsy, but its limitations include invasiveness, sampling error, and complications [6]. Research focused on non-invasive diagnostic modalities has therefore led to the emergence of newer imaging techniques, including elastography and the controlled attenuation parameter, as well as serum biomarkers like cytokeratin and aminotransferases, and scoring systems [7]. However, ultrasonography remains the most widely used diagnostic modality owing to its low cost and availability worldwide, particularly in developing nations [8]. Poor inter-observer agreement and a highly subjective nature are among the limitations of conventional ultrasound [9]. The emergence of newer techniques like elastography for measuring liver stiffness has provided a much-needed alternative non-invasive method for assessing liver fibrosis [10]. None of the studies in the current literature compare the diagnostic accuracy of these imaging techniques for a complex disease like NAFLD/NASH. This systematic review aims to address this gap and assess the accuracy of ultrasonography and elastography in diagnosing patients with NAFLD/NASH.

Review Methods
This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [11].

Eligibility criteria
Studies that fulfilled the following criteria were included: 1) randomized controlled trials, clinical trials, observational studies, meta-analyses, traditional reviews, systematic reviews, comparative studies, and technical reports published between 2013 and 2022; 2) articles in the English language; 3) free full-text articles; 4) studies that included adults (age > 18 years); 5) patients diagnosed with NASH or NAFLD either histologically or clinically; 6) ultrasonography or elastography used for the diagnosis of NASH; 7) data regarding sensitivity, specificity, or area under the receiver operating characteristic (AUROC) curve were available; and 8) the reference standard test was either liver biopsy or magnetic resonance proton density fat fraction (MRI PDFF). Studies that fulfilled the following criteria were excluded: 1) editorials, veterinary observational studies, and retracted publications; 2) articles published before 2013 and those available only as abstracts; 3) articles in a language other than English; 4) patients < 18 years old; 5) patients diagnosed with cirrhosis due to causes other than NAFLD; and 6) studies using any reference standard other than those mentioned above.

Search strategy
A systematic search was conducted in the following databases: PubMed, Google Scholar, Cochrane, and PubMed Central. The last search date for all databases was June 29, 2022. The keywords and heading terms used were based on the previous literature and on Medical Subject Headings (MeSH), depending on the database used, as seen in Table 1. All the references collected from the search strategy were arranged alphabetically using Microsoft Excel 2019. The duplicates were first removed, and the remaining articles were further reviewed through titles and abstracts to exclude the irrelevant ones. This was followed by screening full-text articles to narrow down the included studies according to the eligibility criteria.

Risk of bias
The final articles which remained after the screening process were assessed for the risk of bias using a quality assessment tool: Quality Assessment of Diagnostic Accuracy Studies (QUADAS 2) [12]. The signalling questions and the risk of bias in each domain were assessed and the responses were marked as yes, no, or unclear. The details of the QUADAS 2 tool are given in Table 2.

Domain 1 - Patient selection
A: Risk of bias
1) Was a consecutive or random sample of patients enrolled?
2) Was a case-control design avoided?
3) Did the study avoid inappropriate exclusions?
Could the selection of patients have introduced bias?
B: Concerns regarding applicability
1) Is there concern that the included patients do not match the review question?

Domain 2 - Index test
A: Risk of bias
1) Were the index test results interpreted without knowledge of the results of the reference standard?
2) If a threshold was used, was it pre-specified?
Could the conduct or interpretation of the index test have introduced bias?
B: Concerns regarding applicability
1) Is there concern that the index test, its conduct, or interpretation differ from the review question?

Domain 3 - Reference standard
A: Risk of bias
1) Is the reference standard likely to correctly classify the target condition?
2) Were the reference standard results interpreted without knowledge of the results of the index test?
Could the reference standard, its conduct, or its interpretation have introduced bias?
B: Concerns regarding applicability
1) Is there concern that the target condition as defined by the reference standard does not match the review question?

Domain 4 - Flow and timing
1) Was there an appropriate interval between index test(s) and reference standard?
2) Did all patients receive a reference standard?
3) Did patients receive the same reference standard?
4) Were all patients included in the analysis?
Could the patient flow have introduced bias?
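
As an illustration of how yes/no/unclear responses to the signalling questions can be summarized per domain, the following is a minimal Python sketch. The function name `domain_risk` and the mapping of any "no" answer to a "high" rating are simplifying assumptions made for this example; the QUADAS 2 guidance leaves the final judgment to the reviewers, and this is not presented as the procedure used in this review.

```python
def domain_risk(answers):
    """Summarize signalling-question answers for one QUADAS 2 domain.

    answers: list of strings, each 'yes', 'no', or 'unclear'.
    Returns 'low', 'high', or 'unclear'.
    Assumption for illustration: any 'no' is treated as high risk,
    whereas in practice reviewers weigh each answer themselves.
    """
    normalized = [a.strip().lower() for a in answers]
    if all(a == "yes" for a in normalized):
        return "low"      # every signalling question satisfied
    if any(a == "no" for a in normalized):
        return "high"     # at least one question flags potential bias
    return "unclear"      # only 'yes' and 'unclear' answers remain


# Hypothetical answers for the three patient-selection questions
patient_selection = domain_risk(["yes", "yes", "unclear"])
```

A rule like this only flags domains for discussion; disagreements between reviewers would still need to be resolved by consensus.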

Data extraction and assessment
The duplicates were first removed from the studies collected. The studies were then screened by title and abstract by two reviewers independently. The same reviewers also did the quality assessment of the studies, and in cases of discrepancies, a third reviewer helped to reach a consensus. Information regarding the author, study design, population characteristics, and index and reference tests was extracted from the studies and compiled in a table. The parameters of diagnostic accuracy of the included studies, including sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), and AUROC curve, were recorded and tabulated. Information regarding the cut-off points for the relevant tests and the sample population at various stages of steatosis and fibrosis was also included. Meta-analysis was not done due to the clinical heterogeneity in the included studies and the few studies identified for individual tests. Hence, this systematic review presents the outcomes, applications, and limitations of the included studies in the form of a narrative synthesis.
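
For readers less familiar with these measures, the following minimal Python sketch shows how SN, SP, PPV, and NPV follow from a 2x2 table of index test results against the reference standard. The function name `diagnostic_accuracy` and the counts are hypothetical and chosen only for illustration; they are not data from any included study.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute SN, SP, PPV, and NPV from 2x2 table counts.

    tp: index test positive, reference standard positive
    fp: index test positive, reference standard negative
    fn: index test negative, reference standard positive
    tn: index test negative, reference standard negative
    """
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a row or column is empty
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # diseased correctly detected
        "specificity": ratio(tn, tn + fp),  # non-diseased correctly excluded
        "ppv": ratio(tp, tp + fp),          # positives that are true positives
        "npv": ratio(tn, tn + fn),          # negatives that are true negatives
    }


# Hypothetical counts (illustration only)
result = diagnostic_accuracy(tp=45, fp=5, fn=5, tn=45)
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of disease in the study cohort, which is one reason values from cohorts with different fibrosis or steatosis distributions are difficult to compare directly.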

Study Selection and Quality Assessment
The search of the databases yielded a total of 1,579 potentially relevant articles. After the removal of inaccessible articles and duplicates, 1,386 articles remained. These articles were first screened by titles and abstracts to filter out the irrelevant ones by following the eligibility criteria, which led to the exclusion of 1,045 articles. The remaining articles were screened by full text to include only those that fully satisfied the inclusion criteria, leading to twelve studies. These were assessed for quality analysis. The study selection process and screening are given in the form of a flowchart in Figure 1.
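
The counts at each screening stage can be checked with simple arithmetic. The short Python sketch below uses only the four figures reported above; the variable names are illustrative, and the three derived values are implied by those figures rather than stated in the text.

```python
# Figures reported in the text
identified = 1579               # articles retrieved from all databases
after_removal = 1386            # after removing inaccessible articles and duplicates
excluded_title_abstract = 1045  # excluded at title/abstract screening
included = 12                   # studies in the final review

# Derived values (implied by the figures above, not stated in the text)
removed_before_screening = identified - after_removal            # 193
full_text_screened = after_removal - excluded_title_abstract     # 341
excluded_full_text = full_text_screened - included               # 329
```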

Abbreviation in Figure 1: PMC, PubMed Central.
The quality assessment of the final studies was done using the QUADAS 2 tool, which mainly evaluates the studies on four key domains: patient selection, index test, reference standard, and flow and timing. The risk of bias and applicability were assessed in these domains, and the results are presented in Table 3. One study used a bariatric population as the study cohort, and another involved participants from the geriatric age group as the study cohort. The study characteristics, including the population characteristics, index, and reference tests, are given in Table 4. The diagnostic accuracy parameters of the qualitative and quantitative ultrasonography tests evaluated in the studies, including sensitivity, specificity, and AUROC, are given in Table 5. The diagnostic accuracy parameters of the different elastography techniques are given in

Discussion
Due to the progressive nature of NAFLD, it is critically important to diagnose it early and prevent complications like cirrhosis, cancer, and cardiovascular disease. Of the different diagnostic modalities available, imaging techniques are widely used, particularly ultrasonography and, in recent times, elastography. This systematic review aimed to compare the accuracy of ultrasonography and elastography by assessing parameters of diagnostic test accuracy like sensitivity, specificity, AUROC, PPV, and NPV.
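
Because AUROC is the measure most often reported across the included studies, a brief sketch of what it represents may be useful. The Python example below estimates AUROC as the probability that a randomly chosen diseased patient has a higher test value than a randomly chosen non-diseased patient, with ties counted as half. The function name `auroc` and the stiffness values are hypothetical and are not taken from any included study.

```python
def auroc(scores_diseased, scores_healthy):
    """Estimate AUROC by comparing every diseased/non-diseased pair.

    Each pair where the diseased score is higher counts as 1,
    each tie counts as 0.5, and the total is divided by the
    number of pairs.
    """
    wins = 0.0
    for d in scores_diseased:
        for h in scores_healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))


# Hypothetical liver stiffness values in kPa (illustration only)
fibrosis = [9.1, 7.4, 6.8, 5.2]
no_fibrosis = [6.0, 4.9, 4.1, 3.7]
auc = auroc(fibrosis, no_fibrosis)  # 15 of 16 pairs correctly ordered: 0.9375
```

An AUROC of 0.5 corresponds to a test that discriminates no better than chance, and 1.0 to perfect separation of the two groups.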
Conventional ultrasound helps estimate hepatic steatosis by assessing certain ultrasonographic signs like the brightness of the liver and the visualization of intrahepatic vessels and the diaphragm [25]. A study conducted in a cohort of 72 patients of the geriatric age group found ultrasonography to have a sensitivity of 96%, specificity of 94%, and a positive predictive value of 98% for detecting hepatic steatosis compared to magnetic resonance spectroscopy (MRS) [14]. The combined score, in particular, showed higher accuracy when compared to the individual ultrasound criteria [14]. In a recent meta-analysis of twelve studies involving 2,921 participants, conventional ultrasonography's overall sensitivity and specificity for detecting ≥5% histologically defined hepatic steatosis were 82% and 80%, respectively [26]. Though the subjective assessment using ultrasonography has low reproducibility [27], it can still be used for detecting moderate to higher steatosis grades due to its noninvasiveness, lack of ionizing radiation, affordability, and widespread use.
The advent of quantitative ultrasonographic methods like CAP and AC has helped to assess hepatic steatosis and fibrosis by measuring the ultrasound attenuation rate [28]. Beyer et al. found CAP to have higher accuracy in detecting lower levels of fat (steatosis ≥ 1) compared to higher levels (steatosis ≥ 2 and ≥ 3) [17]. It was also found that CAP could be confounded by body size, particularly in obese people. The study was limited by the participants' different geographical locations and eligibility criteria, as the data were pooled from two independent studies [17]. A meta-analysis of nine studies, including 1,297 patients, showed that CAP had low accuracy for detecting severe grades of steatosis but performed better for grades S1 and S2 [29]. Though CAP has limitations, particularly in obese people and in higher grades of steatosis, and needs more standardized cut-offs [25], it could be improved upon and validated in future studies and represents a viable alternative non-invasive method for diagnosing hepatic steatosis.
The AC quantifies steatosis from the difference in how steatotic and normal liver tissue attenuate acoustic waves [28]. Ogino et al. found a positive correlation (r = 0.81, P < 0.01) between the AC values and liver fat content (LFC%), as well as good diagnostic accuracy scores for steatosis [18]. Nevertheless, AC values were found to decrease with the progression of fibrosis. The results from this study were limited by its small sample size [18].
A study by Qu et al. evaluated the diagnostic performance of the ultrasound attenuation parameter (UAP) and liver stiffness measurement (LSM) using FibroTouch (Kerry Medical Limited, Hong Kong, China) [19]. The diagnostic accuracy of FibroTouch in the study was found to be higher in quantifying liver fat, as the algorithm reduces the effect of subcutaneous fat on CAP computation. There was also a positive correlation between the LSM and the degree of fibrosis and between the UAP and steatosis [19]. For NAFLD, the area under the curve (AUC) values were found to be lower; hence, further validation studies may be needed [19].
Taibbi et al. showed that the diagnostic performance of shear wave elastography (SWE) 10, SWE 5, and SWE 3 did not differ significantly from that of transient elastography (TE) for either significant or advanced fibrosis, although the values were higher for SWE 5 and SWE 10 [15]. Sharpton et al. found the diagnostic accuracy for detecting fibrosis to be lower for two-dimensional (2D) SWE than for vibration-controlled transient elastography (VCTE), and this finding was particularly pronounced in those with higher BMI [20]. In a meta-analysis of nine studies with point shear wave elastography (pSWE) and 11 studies with TE, the diagnostic accuracy of both for detecting advanced fibrosis and cirrhosis showed AUC of 0.94 and 0.95 for ≥F3 and =F4 (pSWE), and 0.92 and 0.94 for ≥F3 and =F4 (TE) [30]. When comparing SWE to MRE for staging fibrosis in patients with NAFLD, Zhang et al. found MRE to be more accurate for earlier (≥ 1 and ≥ 2) fibrosis stages [22]. The AUC values of SWE were found to be lower for fibrosis staging here, and the study was limited by its small sample size and a cohort weighted toward milder NAFLD [22]. Ali et al. found TE to be better at detecting fibrosis stage ≥2 when using an LSM cut-off value of 12.8 kPa. They also found the diagnostic accuracy to increase when hemoglobin A1C (HbA1c) and alkaline phosphatase (ALP) were added to the LSM [24].
Park et al. showed MRE to have superior diagnostic accuracy compared to TE in diagnosing any stages of fibrosis except cirrhosis [13]. The negative predictive value was high for TE in diagnosing fibrosis (stages 2-4), (stages 3-4), and cirrhosis [13]. While the well-characterized prospective cohort served as its strength, this study was limited by its cross-sectional design and the median time interval of 107 days [13].
For detecting stage four fibrosis, Imajo et al. found MRE to be better than 2D SWE and VCTE, while the difference was smaller for stages ≥1, ≥2, and ≥3 [23]. The factors in the study that played a significant role in the discordance between 2D SWE, VCTE, and histopathology findings included skin capsule distance (SCD), sex, and the ratio of the interquartile range of liver stiffness to the median (IQR-median) [23]. No such factors affected MRE. The interobserver and intraobserver repeatability was found to be excellent for MRI compared to 2D SWE and VCTE [23]. Tang et al. found high reproducibility when comparing the MRE liver stiffness from two centers, with an intraclass correlation coefficient (ICC) of ≥ 0.941, pairwise biases of ≤ 0.11 kPa, and reproducibility coefficients (RDCs) of ≤ 22.8% [21]. The study's limitations are the uneven distribution of liver fibrosis among the patient cohort and the inclusion of only one experienced analyst in the two academic centers [21]. In a meta-analysis of twelve studies including 910 patients, MRE was found to have AUROC values of 0.89, 0.93, 0.93, and 0.95 for stages F ≥ 1, F ≥ 2, F ≥ 3, and F ≥ 4 [31]. MRE's high diagnostic accuracy and reproducibility in classifying liver fibrosis have provided a method almost comparable to liver biopsy. Nevertheless, its high cost and limited accessibility restrict its widespread application.

Limitations
This review was limited to English-language studies from mainly three databases from 2013-2021. Grey literature was not included. There was also heterogeneity in the included studies, which varied regarding the study population, index tests used, their cut-offs, and the reference standard. A pooled analysis of diagnostic accuracy could not be done because few studies were included for each index test. Only one study evaluating qualitative ultrasound was included in the review. The degree of fibrosis in the study cohorts was also variable, which can affect the applicability of the results. Hence, this review recommends prospective and cross-sectional studies with larger sample sizes that use the same reference standard and index tests in the same population.

Conclusions
In patients with NAFLD/NASH, MRI was found to be superior overall to other tests, with MRI PDFF for detecting steatosis and MRE for measuring liver stiffness. However, its widespread application is limited by high cost and limited accessibility. While quantitative ultrasonographic parameters have greatly improved the accuracy of detecting steatosis, and TE and pSWE are moderately effective in diagnosing fibrosis, there is insufficient data to arrive at a definite conclusion. Hence, this review recommends larger prospective or cross-sectional studies with the same reference standard and index test, along with standardized cut-offs, to improve the generalizability of the results.

Conflicts of interest:
In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.