How and Why Paediatric Weight Estimation Systems Fail - A Body Composition Study

Background Weight estimation during medical emergencies in children is essential, but fraught with errors if the wrong techniques are used, which may result in critical drug dosing errors. Individualised weight estimation is required to allow for accurate dosing in underweight and obese children in particular. This study was designed to evaluate the associations between weight estimations from different systems and body composition in order to establish how and why they may perform well or poorly.
Methods A convenience sample of 332 children aged from one month to 16 years had weight estimations using four age-based formulas, the Broselow™ Pediatric Emergency Tape (Armstrong Medical Industries, Inc., Lincolnshire, IL), the Mercy Method, and the Pediatric Advanced Weight Prediction in the Emergency Room, Extra-large/Extra-long (PAWPER XL) Tape. They also had an assessment of body composition using dual x-ray absorptiometry (DXA). The weight estimates were compared against total body weight (TBW), calculated ideal body weight (IBW), and DXA-measured fat-free mass (FFM). Analyses of associations between age, length, weight estimation outcomes, and body composition were performed.
Results Age-based formulas were very inaccurate because of the erratic relationship between age and body composition. The Broselow tape estimated IBW well in obese children because of the strong relationship between length and fat-free mass. It predicted TBW poorly in underweight and obese children, however, because of the poor relationship between length and fat mass. The Mercy Method’s performance was unrelated to body composition; it estimated TBW reasonably well but could not predict IBW or FFM. The PAWPER XL Tape’s performance was the most closely associated with body composition and, therefore, achieved an acceptable accuracy for estimations of TBW, IBW, and FFM.
Conclusions Of the systems evaluated, the PAWPER XL Tape has the best association with body composition and the most accurate estimations of TBW, IBW, and FFM.


Introduction
The ultimate purpose of the weight estimation systems used during the management of medical emergencies in children is to enable the administration of accurate doses of potentially life-saving drugs [1]. If the weight estimation is inaccurate, or the methodology is prone to error when used during emergency situations, then the child is at risk of medical error [2]. An understanding of how and why these systems can fail would provide insight into which methodologies should be preferred and how they should be used within the context of the overall medical management of the critically ill or injured child. It would also provide understanding into how they could be improved. Malfunction could be as a result of inaccuracy of the estimation technique (producing a high incidence of critical errors), a failure to use the technique correctly (e.g., the Broselow™ Pediatric Emergency Tape used the wrong way round or calculation errors for formulas), a failure to function well in specific subgroups (e.g., underweight and overweight children), or a child falling outside of the weight estimation system's parameters (e.g., children "too tall for the Broselow Pediatric Emergency Tape") [3][4].
Some of the most commonly used weight estimation methods include formulas based on age, the Broselow Pediatric Emergency Tape, and newer dual length- and habitus-based systems, such as the Mercy Method and the Pediatric Advanced Weight Prediction in the Emergency Room, Extra-large/Extra-long (PAWPER XL) Tape. Some of these methodologies, such as age-based formulas, have already been shown to be inaccurate in previous studies, but the underlying causes have not been well studied [5][6]. Other techniques, such as the PAWPER XL Tape, have been shown to be accurate in some studies [7][8][9] but could still potentially be improved, especially as other studies have shown inconsistent accuracy in obese populations [10][11], primarily as a result of inaccurate habitus assessment.
The degree to which weight estimation systems can be considered to be inadequate depends on the standards by which they are judged. With the increasing prevalence of obesity in children, in both high-income as well as low- and middle-income countries, and a high prevalence of underweight children in low- and middle-income countries, weight estimation systems should ideally be able to provide sufficient information to allow correct dosing in children with both normal and extremes of body composition [4]. Although there is still some controversy about the correct dose scalar to use in obese children, the broad consensus is that, while total body weight (TBW) is still used for many drugs, ideal body weight (IBW) is required for the safe dosing of others [12]. This is important because there is some evidence that drug dosing errors might contribute to poorer outcomes in obese children suffering from cardiac arrest [13]. Therefore, while drug dose determination during emergencies may be difficult, a high standard of accuracy is required to ensure patient safety. For underweight children, an accurate estimation of TBW is essential, as IBW may be significantly higher than TBW [4]. Therefore, the ideal weight estimation system needs to be able to predict both TBW and IBW accurately, and an individualised plan should be employed for weight estimation in each child, dependent on their body composition or habitus.
The primary objectives of this study were to evaluate the vulnerabilities of selected age-based formulas, the Broselow Pediatric Emergency Tape, the Mercy Method, and the PAWPER XL Tape with regards to the prediction of an appropriate weight descriptor for drug dose calculations and to identify how variations in body composition could influence the accuracy of weight estimation.
A convenience sample of 332 children from one month to 16 years of age who presented to the Emergency Department (but did not require emergency treatment) were enrolled between October and December 2015. Exclusion criteria included failure to obtain consent and the inability to obtain critical measurements.
After basic demographic data were obtained, each child was changed into a hospital gown for the subsequent measurements. Anthropometric measurements of length, mid-arm circumference (MAC) and humerus length were obtained with the child in a supine position (to simulate emergency treatment conditions). Measured weight was obtained using a Tanita SC-240 Body Composition Analyser, following which whole-body dual x-ray absorptiometry (DXA) measurements of body composition were acquired using a Hologic Discovery A Densitometer (software version 12.6). All data were collected by one of the researchers (MW or LG).
TBW was estimated with the Broselow tape [14] and PAWPER XL tape [8] as well as the Mercy Method [15], the Advanced Paediatric Life Support (APLS) formulas, Erker formulas [16], the European Paediatric Life Support (EPLS) formula, and the Best Guess formulas (Table 1). A visual gestalt assessment of habitus was used to classify children for the Erker formulas into "thin," "normal," and "thick" categories [16]. Body mass index (BMI), BMI-for-age Z-scores, and an estimate of IBW (using the BMI50 method) were calculated using the Centers for Disease Control and Prevention (CDC) growth charts [17]. Fat-free mass (FFM), fat-free mass index (FFMI), and fat mass index (FMI) were derived from the DXA data for each child, using proprietary paediatric formulas installed by the manufacturer.
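The derived quantities above reduce to simple arithmetic once the reference values are known. The following is a minimal sketch, not the study's software: it assumes that the BMI50 method means multiplying the median (50th percentile) BMI-for-age by height squared, and it uses the index definitions FFM/height² and FM/height². The function names and the example values (including the median BMI) are hypothetical and are not taken from the CDC charts or the study data.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def ibw_bmi50(median_bmi_for_age: float, height_m: float) -> float:
    """Ideal body weight, assuming IBW = BMI50 x height^2.

    median_bmi_for_age must be looked up from a growth reference
    for the child's age and sex; it is an input here, not computed.
    """
    return median_bmi_for_age * height_m ** 2


def mass_index(mass_kg: float, height_m: float) -> float:
    """Height-normalised index (FFMI if given FFM, FMI if given FM)."""
    return mass_kg / height_m ** 2


# Hypothetical child: all numbers below are illustrative only.
height_m = 1.20
tbw_kg = 26.0
ffm_kg = 19.0   # would come from DXA
fm_kg = 7.0     # would come from DXA

print("BMI :", round(bmi(tbw_kg, height_m), 2))
print("IBW :", round(ibw_bmi50(15.5, height_m), 2))
print("FFMI:", round(mass_index(ffm_kg, height_m), 2))
print("FMI :", round(mass_index(fm_kg, height_m), 2))
```

Normalising by height squared is what allows FFM and FM to be compared between children of different lengths.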

TABLE 1 (column headings only; rows not recovered): Name | Formula or method | Restrictions
The data were evaluated using a modified Bland-Altman analysis. Each weight-estimation system was compared with TBW, IBW, and DXA-measured FFM using a percentage error analysis. Mean percentage error (MPE), 95% limits of agreement of percentage error, and the percentage of estimations falling within 10% (PW10) and 20% (PW20) of the weight descriptor were the three primary outcome measures used.
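The outcome measures can be illustrated with a short sketch. It assumes that percentage error is calculated relative to the reference weight descriptor, that the 95% limits of agreement are MPE ± 1.96 standard deviations (the conventional Bland-Altman form), and that "within 10%" includes the boundary; these conventions and the function names are assumptions rather than details reported by the study.

```python
from statistics import mean, stdev


def percentage_errors(estimates, references):
    """Signed percentage error of each estimate against its reference."""
    return [100.0 * (e - r) / r for e, r in zip(estimates, references)]


def summarise_accuracy(estimates, references):
    """Return MPE, 95% limits of agreement, PW10 and PW20 (all in %)."""
    pe = percentage_errors(estimates, references)
    mpe = mean(pe)
    sd = stdev(pe)  # sample standard deviation of the percentage errors
    n = len(pe)
    return {
        "mpe": mpe,
        "loa": (mpe - 1.96 * sd, mpe + 1.96 * sd),
        "pw10": 100.0 * sum(abs(x) <= 10 for x in pe) / n,
        "pw20": 100.0 * sum(abs(x) <= 20 for x in pe) / n,
    }


# Illustrative weights in kg (not study data).
estimated = [10, 22, 30, 45]
measured = [10, 20, 40, 50]
print(summarise_accuracy(estimated, measured))
```

The same function can be applied with TBW, IBW, or FFM as the reference list, which mirrors how each system was compared against the three weight descriptors.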
Subgroup analyses were performed in three weight categories (< 10 kg, 10 to 25 kg, and > 25 kg) and three habitus categories based on BMI-for-age Z-scores ("thin" children Z-score ≤ -2.0, "fat" children Z-score ≥ 2.0, and normal-weight children in between). A subgroup analysis was also performed for those children falling outside the restrictions of the weight estimation systems. A PW10 of > 70% together with a PW20 of > 95% was considered to be an acceptable accuracy for a weight estimation method [2,18].
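The subgroup rules and the accuracy threshold can be written as simple predicates. This sketch takes the "thin" cut-off as a Z-score of -2.0 or below and treats exactly 10 kg and 25 kg as belonging to the middle weight band; the handling of boundary values and the function names are assumptions.

```python
def habitus_from_bmi_z(z_score: float) -> str:
    """Habitus category from the BMI-for-age Z-score."""
    if z_score <= -2.0:
        return "thin"
    if z_score >= 2.0:
        return "fat"
    return "normal"


def weight_band(weight_kg: float) -> str:
    """Weight category used for the subgroup analysis."""
    if weight_kg < 10:
        return "<10 kg"
    if weight_kg <= 25:
        return "10 to 25 kg"
    return ">25 kg"


def is_acceptable(pw10: float, pw20: float) -> bool:
    """Acceptable accuracy requires PW10 > 70% and PW20 > 95%."""
    return pw10 > 70.0 and pw20 > 95.0


print(habitus_from_bmi_z(-2.3), weight_band(8.5), is_acceptable(72.0, 96.5))
```

Both thresholds must be met: a system with a high PW10 but a long tail of large errors would still fail on PW20.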
The associations between age, length, and body composition were evaluated using a graphical representation and Pearson correlation analysis, with and without logarithmic transformation, as appropriate.
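A correlation analysis of this kind can be sketched in a few lines. The example below computes the Pearson coefficient directly from its definition and optionally log-transforms the dependent variable before squaring the result; the function names and the synthetic data are illustrative and do not reproduce the study's statistical software.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys, log_y=False):
    """r^2 between xs and ys, optionally after a natural-log transform of ys."""
    if log_y:
        ys = [math.log(y) for y in ys]
    return pearson_r(xs, ys) ** 2


# Synthetic example: a mass that grows exponentially with length (cm).
lengths = [50, 80, 110, 140, 170]
masses = [math.exp(0.02 * length) for length in lengths]
print("raw r^2:", r_squared(lengths, masses))
print("log r^2:", r_squared(lengths, masses, log_y=True))
```

When the underlying relationship is curved, the log-transformed r² is higher than the raw value, which is why both forms are worth reporting.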
Data from each of the weight estimation systems were plotted on a Hattori chart, according to the percentage error category, to represent the associations between body composition and accuracy of weight prediction [19]. A description of the Hattori chart analysis is shown in Figure 1. In order to evaluate the accuracy of the assignment of habitus score (HS) for the PAWPER XL, an "ideal" HS was determined for each child (the HS which would result in the best weight estimate at the child's length), which was compared to the original HS. For comparisons with IBW and FFM, the PAWPER XL HS3 and HS1 weights, respectively, were used.
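The determination of the "ideal" HS can be illustrated as a nearest-match search. The sketch assumes that, for a given length, the tape offers one predicted weight per habitus score; the weights in the example are hypothetical placeholders and are not values from the PAWPER XL tape.

```python
def ideal_habitus_score(weights_by_hs, measured_weight_kg):
    """Return the HS whose predicted weight is closest to the measured weight."""
    return min(
        weights_by_hs,
        key=lambda hs: abs(weights_by_hs[hs] - measured_weight_kg),
    )


# Hypothetical weights (kg) for each HS at one length segment.
weights_at_length = {1: 17.0, 2: 19.0, 3: 21.0, 4: 24.0, 5: 27.0, 6: 31.0, 7: 35.0}

measured_tbw = 25.0
assigned_hs = 3                      # the score given at the bedside
ideal_hs = ideal_habitus_score(weights_at_length, measured_tbw)

print("ideal HS:", ideal_hs)
print("difference from assigned HS:", ideal_hs - assigned_hs)
print("IBW estimate (HS3 weight):", weights_at_length[3])
print("FFM estimate (HS1 weight):", weights_at_length[1])
```

Tabulating the difference between ideal and assigned scores across all children gives the proportion of assignments that were unchanged or that differed by one or two points.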

Characteristics of study participants
A total of 332 children were included in the study. The basic demographic and body composition data are shown in Table 2. There were a substantial number of underweight children (15.3%) and overweight or obese children (22.3% and 10.2%, respectively). This allowed for the weight estimation systems to be tested over a spectrum of body habitus variations. Subgroup data for thin, normal, and fat children were based on categories derived using the BMI system described by Cole et al. [20]. Since body composition reference data have not been well established in children, and especially in younger children, most analyses in this study made use of pragmatic limits that might affect drug dosing decisions: children were considered to be significantly "fat" when their TBW was > 120% of IBW (which roughly corresponds to a Z-score of +2) as this would require the use of IBW as a drug dosing descriptor for some drugs. Likewise, children were considered significantly "thin" when their IBW was > 120% of TBW (which approximates a Z-score of -2) as the use of IBW would result in a critical overdosing error. There was a clinically important number of children whose TBW and IBW differed by more than 20%.
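The pragmatic limits described above translate directly into a two-sided ratio test. A minimal sketch, with a hypothetical function name and illustrative weights:

```python
def dosing_habitus(tbw_kg: float, ibw_kg: float) -> str:
    """Classify a child using the pragmatic 120% limits.

    "fat"  : TBW exceeds 120% of IBW (IBW may be needed for some drugs)
    "thin" : IBW exceeds 120% of TBW (dosing on IBW would overdose)
    """
    if tbw_kg > 1.2 * ibw_kg:
        return "fat"
    if ibw_kg > 1.2 * tbw_kg:
        return "thin"
    return "normal"


# Illustrative pairs of (TBW, IBW) in kg.
for tbw, ibw in [(30.0, 20.0), (16.0, 20.0), (21.0, 20.0)]:
    print(tbw, ibw, dosing_habitus(tbw, ibw))
```

Note that the two limits are not symmetrical in terms of TBW: IBW exceeding 120% of TBW is equivalent to TBW falling below about 83% of IBW.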

TABLE 2: Description of the Study Population: Demographic Information with Body Composition Data
The data are presented for the whole sample as well as for categories of habitus (body mass index category). BF: body fat; BMI: body mass index; FFMI: fat-free mass index; FMI: fat mass index; HS: habitus score; IBW: ideal body weight; IQR: interquartile range; TBW: total body weight

The performances of each of the weight-estimation systems against TBW, IBW, and DXA-measured FFM are shown in Table 3. With regards to estimating TBW, the overall accuracy of the regular age-based formulas was very poor (PW10 ranging from 29

TABLE 3: Accuracy Outcomes of the Weight Estimation Systems Evaluated in This Study
The data are presented for the whole sample as well as by subgroups of weight, habitus, and two special categories (length > 145 cm and HS > 5). For the purposes of this evaluation, children were defined as "thin" if TBW was less than 90% of IBW, "fat" if TBW was greater than 120% of IBW, and "normal" for the remainder. The accuracy of the weight estimation systems was specifically evaluated in children with length > 145 cm as those children comprise the subgroup of children too tall for the Broselow tape. The subgroup of children with HS > 5 was important because this represents severely obese children. The PAWPER XL tape has a defined mechanism for predicting IBW and FFM (or lean body weight (LBW)). The HS3 weight is used to predict IBW and the HS1 weight is used to predict FFM. Since these dosing scalars are only intended to be used in obese children, the data were only calculated for the obese children in the sample.

The PAWPER XL tape, the Mercy method, and the APLS formula were able to provide valid weight estimations for all children in the study sample, with the Broselow tape, the EPLS, Erker, and Luscombe formulas outside of their restrictions in 14.5%, 10.5%, 10.5%, and 22.9%, respectively. For the subgroup of children "too tall" for the Broselow tape (those with a length > 145 cm), only the Mercy method and the PAWPER XL tape maintained acceptable accuracy.
The relationships between age, length, and body composition components (including FFM and FM components) are shown in Figure 2 (Panels A-D). The relationship between age and weight (Panel A) was weaker with TBW than with IBW (r² = 0.74 and r² = 0.91 for TBW and IBW, respectively, p < 0.001). This difference was greatest in older children. Once the effect of height was removed from the relationship with age (Panel B) by expressing FFM and FM as indices (FFM/height² and FM/height²), the very poor association between age and FFMI and FMI was exposed (r² = 0.26 and r² = 0.03, respectively, p < 0.05 and p = 0.8, respectively). Length and weight were strongly correlated (Panel C), although the relationship between TBW and length was weaker than between IBW and length (r² = 0.77 and 0.93 for TBW and IBW, respectively, p < 0.001). This association was reflected in the association between length and FFM and FM (Panel D). Length was strongly correlated with FFM (r² = 0.86, p < 0.001) but far less well with FM (r² = 0.48, p < 0.001). This relationship held true when evaluating length against logarithmic transformations of FFM and FM (r² = 0.96 and r² = 0.66, respectively, p < 0.001). In summary, the relationship between age and TBW and IBW was much weaker than that between length and the weight descriptors. While length was strongly correlated with IBW and with FFM, its relationship with TBW and FM was significantly weaker.
Figure 3 (Panels A to D) illustrates the associations between body composition and the accuracy of the age-based weight estimation systems in the Hattori chart format. The EPLS formula (Panel A) was the most accurate in children with a low BMI, with no discernible difference in discriminating between FMI or FFMI. In contrast to this, the APLS formula (Panel B) and the Best Guess formula (Panel C) showed a substantially greater overestimation of weight but could not differentiate between children with low and high BMI.
The Erker method (Panel D) had a far greater accuracy at all values of BMI than the other formulas. It also, however, produced some large overestimations of weight, even in children of normal weight. In summary, the age-based formulas were all inaccurate but with different biases. The EPLS formula underestimated weight in higher-BMI children, while the APLS and Best Guess formulas overestimated the weight of low-BMI children. The Erker formula had less relationship with body composition but still failed to predict weight accurately.
While the PAWPER XL tape still showed some inaccuracies at the far extremes, especially of FMI, the Mercy method's performance was virtually independent of habitus, even at the extremes. In summary, these methods showed fewer critical errors in weight estimation than the age-based formulas. The Broselow tape showed errors in all children with higher or lower than average BMI, while the PAWPER XL tape showed the same errors only at the extremes of habitus. The performance of the Mercy method showed almost no relationship to body habitus or composition.

FIGURE 4: A Hattori chart of the study population showing outcomes of total body weight estimation by the length-based and the length- and habitus-based methods
Panel A: outcomes of total body weight estimation by the Broselow tape; Panel B: outcomes of total body weight estimation by the PAWPER XL tape; Panel C: outcomes of total body weight estimation by the Mercy method. Square markers represent an overestimation of weight and round markers represent an underestimation of weight. Markers with a green fill indicate a weight estimation accuracy of within 10% of actual weight; orange markers an accuracy of between 10% and 20% of actual weight; and red markers an error of greater than 20%. The medians for each error category are shown in black.

[...] assigned to the study population. The data showed a good correlation between FMI and HS (r² = 0.70) and a lesser correlation with FFMI (r² = 0.43). It also shows a comparison between the actual HS assigned in this study and the "ideal" post hoc HS. In this analysis, 51.2% of scores remained unchanged, 41.6% differed by one HS point, and the remaining 7.2% differed by two points. In summary, habitus scores accurately represented differences in body composition, especially changes in FM. Ideal and actual habitus score assignments were similar except at extremes of obesity.
Panel A: the actual habitus score assignments. Orange markers represent HS1 and HS2; green markers represent HS3; yellow markers represent HS4 and red markers represent HS5 and above.

Age and length as predictors of TBW, IBW, FFM, and FM
The relationships between age and weight and between length and weight are fundamental to the ability of these variables to predict weight. Length was clearly more closely associated than age with TBW, as has been shown previously [21], as well as with IBW and FFM. Once length was removed as a confounding variable, the effect size of the correlation between age and weight was seen to be small. In terms of length, the association with FFM was much closer than with FM, which further explains why one-dimensional length-based systems (such as the Broselow tape) were able to predict IBW well but did not predict TBW well because of the potentially large variations in FM at any given length [22]. Although length was demonstrably superior to age as a weight-estimation variable, some method of assessing habitus is clearly also required to account for differences in FM between children.
A note of caution: although FFM, lean body weight (LBW), and IBW are frequently used interchangeably as an appropriate scalar for drug dosing in obese children, they are not identical. IBW has no true biological validity but has been shown to be similar to FFM in older children [23]. The relationship between LBW and FFM is also unclear in children, but LBW is probably 5% to 10% higher than FFM and is the true "pharmacological scalar" that is desired for calculating doses for hydrophilic medications in obese children [24]. How this should best be translated into clinical practice is, as yet, undetermined.

Interpreting the Hattori chart analysis of weight estimation systems
For the unmodified age-based formulas, the EPLS formula was reasonably accurate only in children with an FFMI and FMI at the lower end ranges; the opposite was true of the Best Guess formula, with the APLS formula falling somewhat between the two. Differences between the age formulas were, essentially, whether the major errors were greater in lower BMI children (APLS and Best Guess formulas) or in higher BMI children (EPLS formula). The degree of weight estimation error for the formulas was only poorly associated with body composition, supporting the findings that age is much less predictive of body weight than length. The habitus-modified Erker formula, although more accurate than the other formulas, showed the most random association between body composition and accuracy. This strongly suggests that, despite a good theoretical basis for this system, it will not be able to achieve adequate accuracy.
The Broselow tape: the most accurate weight estimations were in the central zone (the "normal weight" child), with critical inaccuracies at increasing FFMI and FMI and decreasing FFMI. This was typical of what might be expected: a good relationship between length and weight, but with variations in FFM/FM not accounted for [22]. Most of the inaccurate estimations were underestimations (from higher than average FMI, and to a lesser extent, FFMI). When considering what the body composition analysis revealed in terms of how the various methods failed, it was clear that the age formulas and Broselow tape were unable to produce accurate weight estimations outside of their narrow area of calibration. Unless these methods are restricted to patient populations that fall within these narrow limitations, they should not be used if better methods are available.
The Mercy method: this method's performance showed almost no association with body composition. Almost all error categories had FFMI/FMI medians within the central region, with the exceptions of the few children with > 10% weight overestimation who were in the high FFMI/FMI sector. This pattern indicated that some other factor, unrelated to habitus, was the root cause of weight estimation error, such as measurement errors, regional aberrations in body composition, or unknown factors. While the Mercy method was able to accurately predict weight in children of all different body compositions, its failures bore no relation to extremes of body composition. This was, therefore, not a "calibration" error, which would make it difficult to identify specific vulnerabilities and devise methods to improve accuracy.
The PAWPER XL tape: this method showed poor accuracy only at extremes of FFMI and FMI. This was in contrast to the Broselow tape, which showed poor estimation at smaller deviations from the medians. This indicated that the PAWPER XL tape was better "calibrated" as it accounted for a greater degree of variation in body composition. The pattern of inaccuracies further indicated that the tape could potentially be improved with better weight estimation at extremes of body habitus: "fatness" (high FMI) was underestimated in the higher habitus scores, but "slimness" (low FFMI) was under-recognised in the lower habitus scores.

Age-based Formulas Failed Badly Because of Large Variability of Weight-for-age (FFM and FM)
In this study, the strength of the association between age and TBW or IBW was intermediate at best. This was because the association of age with body composition (FFMI and FMI) was very weak, which was evident once length was removed as a confounding variable. This strongly suggests that age is unlikely ever to allow accurate weight estimation and will always be inferior to length-based systems [25]. The association of age with both FFMI and FMI was weak enough to suggest that neither TBW nor IBW could be accurately predicted. Although the formulas were most accurate in the 10 to 25 kg category (as has previously been found), this accuracy never achieved acceptable levels (PW10 > 70% and PW20 > 95%) as the variability of FFMI and FMI remained consistent across the age range of the population [26]. No age formula has been shown to perform satisfactorily in any previous study (best performances: PW10 45% to 55%) [2]. The new Erker formula, which is habitus-modified, failed to deliver on the potential shown in its theoretical development [16]. This error was predominantly one of modest overestimation of weight in children with normal habitus and modest underestimation of weight in fat children. There were, however, relatively fewer critical errors (estimation error > 20%) than with the other formulas. The biggest weakness with this system was that, at any age, children could have the same habitus but very different lengths (and, therefore, weights) that cannot be accounted for by this system. It is unlikely that further calibration will improve this system significantly.

The Broselow Tape Fails Because of Large Variability of FM for Length
Numerous studies from across the world have confirmed that the tape overestimates weight in underweight populations (often to a dangerous degree) and underestimates weight in populations with a high prevalence of obesity [2,26]. Although the relationship between length and weight was far stronger than that between age and weight, the relationship between length and FM was far more inconsistent and increasingly variable with increasing length. This variability was sufficient to account for the relatively poor performance of the tape. It also explains why the tape predicted TBW well in children of "normal" habitus, but poorly in all others. This is why researchers have been able to show that alternative methods of habitus assessment can improve the accuracy of the Broselow tape [27,28].
The present study found that the relationship between length and FFM was strong, which explained the excellent association between length and IBW for the Broselow tape. Importantly, however, this is only relevant for obese children (with respect to drug dosing for hydrophilic drugs), but the Broselow tape as yet has no validated mechanism for identifying obese children. This needs to be considered if the Broselow tape is used in overweight or obese children.

The Mercy Method Fails Because of Non-habitus-related Factors
Generally, the Mercy method predicted TBW well across the age and habitus spectrum but was weakest in infants and obese children. Critical errors were very uncommon. It predicted IBW poorly, especially in obese children in whom IBW would be required, but this was not surprising, as the method was not designed to predict IBW. The system also had no mechanism to identify obese children for whom an alternate weight descriptor (such as IBW) might be required. The inaccurate estimations incurred by this method were not specifically attributable to habitus, although the poorest accuracy was in obese children (with a mixture of under- and overestimation of weight, however). Part of the explanation might be that, in this study, children were measured in the supine position (simulating how it would be used in medical emergencies), unlike previous studies, which might have led to less accurate measurements [29]. Other studies, not by the developers of the system, have also shown a substantially poorer accuracy than in the original studies [2,7,9]. Nonetheless, the Mercy method remains one of the most accurate systems available today, although not ideally suited for use in the Emergency Department.

The PAWPER XL Tape Fails Because of Incorrect Assignment of Habitus Scores
Generally, the PAWPER XL tape predicted TBW, IBW, and FFM extremely well but was weakest in infants (overestimating weight) and morbidly obese children (underestimating weight). The rate of critical estimation errors for all weight descriptors was the lowest amongst all the weight estimation systems. The errors of estimation appeared to be directly related to body habitus, which suggested that an improvement in the assessment of habitus might further improve the tape's "calibration" and, therefore, accuracy. Studies in obese populations in the United States have also demonstrated this vulnerability to inaccurate HS assignment [11]. The "ideal" HS assignments showed that the habitus estimations were frequently more moderate than the ideal scores and that it was theoretically possible to achieve virtually perfect weight estimation with ideal habitus assessment. However, it is unclear whether it is possible in practice to differentiate between the relatively minor differences in external indicators of body composition that might be required to achieve this accuracy. Validated reference figural images will likely be required to enable the habitus assessment to be standardised and generalised across different populations. This technique has been validated in preliminary studies for the PAWPER XL tape, but further research is needed [30].
The ability of the PAWPER XL tape to accurately estimate FFM and IBW, in addition to being able to identify the obese children for whom this would be required for dosing calculations, was an interesting finding which might prove useful in the future. This requires further research.

Conclusions
The association between predictive variables (age and length) and TBW, IBW, FM, and FFM showed the underlying biological limitations of using age or length alone to predict weight. None of the age-based formulas achieved a satisfactory degree of accuracy, including the age-based habitus-modified formulas of Erker. Differences between the age formulas were essentially whether the major errors were greater in lower-BMI children (APLS and Best Guess formulas) or in higher-BMI children (EPLS formula). The Broselow tape performed better than the age-based formulas but it did not achieve satisfactory accuracy in estimating TBW. It was accurate in predicting IBW, however, and could be used for this purpose for drug dose calculations as long as TBW was known or estimated using another technique. The Mercy method (a dual length- and habitus-based method) demonstrated a higher accuracy in estimating TBW than the univariate methods, with fewer critical errors. Its performance was completely independent of body composition. This suggested that other patient factors or user errors might be critical determinants of its functioning accurately. It was unable to estimate IBW. The performance of the dual length- and habitus-based PAWPER XL tape was the best of all the systems. It was also the only system that had a specific mechanism to produce estimations of IBW (the HS3 weight) and FFM (the HS1 weight). The overall accuracy of estimation of TBW, IBW, and FFM was very good, even for children outside of the restrictions of other methods. Its weakest performance was in children with extreme habitus types, which was probably mostly due to errors or aberrations in the assessment of body habitus. This needs to be addressed in future research.

Additional Information
Disclosures
Human subjects: Consent was obtained by all participants in this study. Human Research Ethics Committee, University of the Witwatersrand issued approval M120486. Approved unconditionally. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.