Ultrasound Classification of Thyroid Nodules: A Systematic Review

Ultrasound (US)-based classification systems exist for the stratification of thyroid nodules based on the risk of malignancy. This systematic review aimed to assess the evidence for the performance of US-based thyroid nodule classification systems through correlation with fine needle aspiration biopsy (FNAB). PubMed and Scopus were searched using keywords that included ‘ultrasound classification’, ‘thyroid nodules’, ‘fine needle aspiration’, and ‘malignancy’. Inclusion criteria were as follows: studies/reviews reporting on US imaging for the classification of thyroid nodules. Exclusion criteria were as follows: no comparison between US imaging findings and histology reports based on FNAB, no full English text available/accessible. The database searches identified 66 publications. After evaluation, 12 studies met the inclusion criteria. Two US-based classification systems for thyroid nodules were assessed: the Thyroid Imaging Reporting and Data System (TIRADS) and the American Thyroid Association (ATA) guidelines. For TIRADS, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) ranged from 70.6% to 97.4%, 29.3% to 90.4%, 23.3% to 64.3%, and 87.1% to 99.0%, respectively. The median sensitivity, specificity, PPV, and NPV for TIRADS were 90.0%, 57.4%, 49.0%, and 91.0%, respectively. One study comparing TIRADS with the ATA guidelines demonstrated that TIRADS was superior in terms of sensitivity, whereas the ATA guidelines were superior in terms of specificity and PPV. The high sensitivity and NPV of the US-based TIRADS classification system indicate excellent utility for correctly identifying malignant nodules and for predicting the absence of malignant disease. The paucity of studies assessing the ATA guidelines highlights avenues for further research comparing TIRADS with other systems of thyroid nodule classification.


Introduction And Background
High-resolution ultrasound (US) is the gold standard test for the identification of thyroid nodules. Despite the high prevalence of thyroid nodules (up to 12% of adults), the incidence of thyroid cancer is relatively low (3.2 per 100,000) [1]. Although the majority of thyroid nodules are asymptomatic, it is recommended that all patients with palpable nodules undergo US imaging to determine whether the nodule requires a fine needle aspiration biopsy (FNAB), US follow-up, or no further evaluation. Because clinical symptoms correlate poorly with malignancy, the American Association of Clinical Endocrinologists recommends that all nodules larger than 10 mm, and any nodules that appear suspicious on US imaging, should be investigated further using FNAB. This recommendation is based on studies that established that prognosis is inversely related to nodule size [2,3].
Recently, guidelines have been developed in order to permit US imaging to be used for the identification and stratification of nodules based on the risk of malignancy [4][5][6][7][8][9][10]. These guidelines include the Thyroid Imaging Reporting and Data System (TIRADS), which was developed and validated based on existing multi-institutional data and expert opinion [11]. The risk stratification of thyroid nodules not only serves to identify patients that require FNAB but also reduces the unnecessary risk and cost associated with performing invasive procedures, such as FNAB, in patients with low-risk nodules that require either US follow-up or no further investigation. Therefore, the decision to perform FNAB should be based on the risk of malignancy rather than the size of the nodule per se.
Currently, multiple systems are used worldwide for the risk stratification of thyroid nodule features on US scanning. Many of these systems use complex algorithms based on several US imaging features, which may be difficult to apply depending on the experience of the individual performing the US scan. The aim of this study was to review the current evidence for US classification systems of thyroid nodules and their correlation with subsequent FNAB findings, with a view to suggesting avenues for future research.

Searches
Recommendations from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses were incorporated into this review [12]. Keywords included 'ultrasound classification', 'thyroid nodules', 'fine needle aspiration', and 'malignancy'. Electronic searches were performed on the PubMed and Scopus databases for English language studies published between September 2012 and September 2017. The search terms used in each database were as follows: PubMed (ultrasound classification AND thyroid nodules AND fine needle aspiration AND malignancy); Scopus (TITLE-ABS-KEY [ultrasound classification AND thyroid nodules AND fine needle aspiration AND malignancy]). No filters for journal, study design, or subject were applied to the search, although conference proceedings and abstracts were excluded.
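As a minimal sketch, the two search strings described above can be assembled from the four keywords as follows. The function names are hypothetical and chosen for illustration only. The Scopus string is written with parentheses, which is the form the Scopus field code TITLE-ABS-KEY takes in the search interface; the square brackets in the text above appear to be editorial nesting.

```python
# The four keywords reported in the search strategy.
KEYWORDS = [
    "ultrasound classification",
    "thyroid nodules",
    "fine needle aspiration",
    "malignancy",
]


def pubmed_query(keywords):
    """Join the keywords with AND, as reported for the PubMed search."""
    return " AND ".join(keywords)


def scopus_query(keywords):
    """Wrap the same AND-joined string in the TITLE-ABS-KEY field code."""
    return f"TITLE-ABS-KEY({' AND '.join(keywords)})"


print(pubmed_query(KEYWORDS))
print(scopus_query(KEYWORDS))
```

Building both strings from one keyword list keeps the two database searches identical in content, which is the property the search strategy relies on.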

Study Selection
Two authors (Rakesh Mistry and Thushanth Sooriyamoorthy) independently assessed all studies from both searches against the eligibility criteria. Studies were included if they reported the use of US imaging for the classification of thyroid nodules. Studies were excluded if no full English text was available/accessible, or if they used no US-based system of classification of thyroid nodules and/or made no attempt to correlate US findings with histology based on FNAB.

Data Extraction
Two authors (R.M. and T.S.) independently extracted data from the included studies into a self-designed template, referring to the Cochrane Handbook for Systematic Reviews of Interventions as a guide [13]. Study information and clinical characteristics were extracted from all studies (where reported), including the performance (sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) of US imaging for the classification of thyroid nodules. Multiple authors (Christopher Hillyar, Anjan Nibber, R.M., and T.S.) evaluated the extracted data for accuracy and agreement.
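For readers less familiar with the four performance parameters, the sketch below shows their standard definitions from a 2x2 table of US classification against the FNAB result. The class name, function name, and counts are hypothetical and serve only as illustration; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts of US classification against the FNAB reference result."""
    tp: int  # US suspicious, FNAB malignant
    fp: int  # US suspicious, FNAB benign
    fn: int  # US not suspicious, FNAB malignant
    tn: int  # US not suspicious, FNAB benign


def performance(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as percentages."""
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),  # malignant nodules flagged
        "specificity": 100 * c.tn / (c.tn + c.fp),  # benign nodules cleared
        "ppv": 100 * c.tp / (c.tp + c.fp),          # flagged nodules that are malignant
        "npv": 100 * c.tn / (c.tn + c.fn),          # cleared nodules that are benign
    }


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=90, fp=60, fn=10, tn=140)
metrics = performance(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

With these illustrative counts, sensitivity is 90.0%, specificity 70.0%, PPV 60.0%, and NPV about 93.3%.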

Data Analysis
Not all studies reported all variables. Items that were not reported or were unclear were not included in the analysis. Data were analyzed using Microsoft Excel. Two authors (C.H. and A.N.) independently conducted data analysis.
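The handling of unreported items can be sketched as follows. The analysis itself was carried out in Microsoft Excel, so this Python version is only an analogue of that approach; the function name and the per-study values are hypothetical placeholders, with None marking a parameter a study did not report.

```python
from statistics import median
from typing import Optional


def summarize(values: list) -> Optional[dict]:
    """Median and range over reported values; unreported (None) items are skipped."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None  # nothing to summarize for this parameter
    return {
        "n": len(reported),
        "median": median(reported),
        "min": min(reported),
        "max": max(reported),
    }


# Hypothetical per-study values (percentages), for illustration only.
extracted = {
    "sensitivity": [88.0, None, 95.0, 72.0],
    "specificity": [None, 45.0, 60.0, 80.0],
}

summary = {name: summarize(vals) for name, vals in extracted.items()}
for name, stats in summary.items():
    print(name, stats)
```

Dropping unreported items per parameter, rather than dropping whole studies, means each median may rest on a different subset of studies, which is worth bearing in mind when comparing parameters.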

Performance of Ultrasound Imaging for the Classification of Thyroid Nodules
Of the eight studies using TIRADS, six reported performance parameters for US-based classification of thyroid nodules (e.g. sensitivity, specificity, PPV, and NPV) [14,16,17,20,22,23].
Of the two studies reporting on the ATA guidelines, only one included US imaging performance parameters [20]. No publications reported US imaging performance parameters with the Bethesda system. Table 2 summarizes the performance of US imaging for the classification of thyroid nodules using the TIRADS system.  The sensitivity, specificity, PPV, and NPV ranged from 70.6% to 97.4%, 29.3% to 90.4%, 23.3% to 64.3%, and 87.1% to 99.0%, respectively. The median sensitivity reported for TIRADS was 90.0%, with three studies reporting a sensitivity for TIRADS of 90.0% or above [16,20,23].

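[Table 2: Performance of US imaging for the classification of thyroid nodules using TIRADS. Columns: Ultrasound performance parameter; Participants (nodules); Median (%); Range (%); References.]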
Although one study reported that the sensitivity of TIRADS was 70.6%, this retrospective study assessed a small cohort of only 100 participants [22]. In contrast, the larger studies reported sensitivities of 97.0%-97.4% when a total of 5,273 nodules (from 4,162 participants) were assessed [16,20]. Owing to the wide range of reported results, the median specificity (57.5%) for TIRADS was considerably poorer than the sensitivity. Only one large study assessing 3,980 nodules (from 2,921 participants) reported favorable results with a specificity of 90.0% [16], whereas two studies, including one assessing 1,293 nodules (from 1,241 participants), reported specificities of less than 50%, with the lowest specificity reported being 29.3% [14,20]. The median PPV (49.0%) for TIRADS was similarly poor, with two large studies including a total of 5,273 nodules (from 4,162 participants) reporting PPVs of 23.3%-40.0% [16,20]. Finally, the median NPV (91.0%) of TIRADS was excellent, and NPV was the most consistently reported performance parameter, with the narrowest range. Three of five studies reported NPVs of greater than 90.0% [16,20,22], with two of these being large studies including a total of 5,273 nodules (from 4,162 participants), which reported an NPV of 91.1%-98.1% [16,20].
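The pooled cohort size quoted for the two large studies can be checked by summing the individual cohort sizes given in the text. In the sketch below, attributing the 1,293-nodule study to reference [20] is an inference from the pooled totals rather than something the text states directly.

```python
# Cohort sizes as given in the text for the two large TIRADS studies.
# The reference for the second study is inferred from the pooled totals.
large_studies = [
    {"ref": 16, "nodules": 3980, "participants": 2921},
    {"ref": 20, "nodules": 1293, "participants": 1241},
]

total_nodules = sum(s["nodules"] for s in large_studies)
total_participants = sum(s["participants"] for s in large_studies)

print(f"{total_nodules:,} nodules from {total_participants:,} participants")
```

The sums come to 5,273 nodules from 4,162 participants, so nodules outnumber participants, as expected when some participants have more than one nodule.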
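In one study, a direct comparison was made between TIRADS and the ATA guidelines [20]. In that comparison, TIRADS was superior in terms of sensitivity, whereas the ATA guidelines were superior in terms of specificity and PPV. This study also reported that, unlike TIRADS, some nodules could not be classified using the ATA guidelines [20].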

Conclusions
This study assessed the literature on US-based thyroid nodule classification systems, which demonstrated that TIRADS has utility in classifying thyroid nodules. The variability of the specificity of TIRADS, which was borne out by large studies assessing thousands of thyroid nodules, suggests that the performance of US, especially in classifying nodules as negative for disease, is highly dependent on the skill of the operator. In clinical practice, the poor PPV of TIRADS may be associated with an excess number of FNABs of benign nodules, which represents a source of procedural risk, reduced cost-effectiveness, and unnecessary discomfort and concern for the patient. Although mild pain from FNAB can be controlled with paracetamol, future research should focus on quantifying the pain and stress encountered by patients undergoing FNAB for thyroid nodules. The favorable NPV of TIRADS may offset the impact of the PPV and help reduce the number of unnecessary FNABs of benign thyroid nodules. The paucity of studies assessing the ATA guidelines makes any comparison with TIRADS tentative at best and represents a significant opportunity for further research. Thus, adequately powered studies with large patient populations are required, both to improve the TIRADS system and to compare it with other classification systems (ATA guidelines/Bethesda) in order to demonstrate superiority. Such research may be used to inform and update the British Thyroid Association (BTA) guidelines, an area of particular importance in the UK.

Conflicts of interest:
In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.