Patient-Reported Outcomes Measurement Information System Validation in Hip Arthroscopy: A Shift Towards Reducing Survey Burden

Background: The Patient-Reported Outcomes Measurement Information System (PROMIS) was developed to provide measures of patient-reported symptoms and healthcare outcomes across a variety of conditions in an easily accessible manner. The purpose of this study was to validate PROMIS against traditional legacy measures in patients undergoing hip arthroscopy for femoral acetabular impingement (FAI).
Methodology: Outcome measures collected pre- and post-operatively included PROMIS Pain Interference (PI) and Physical Function (PF), modified Harris Hip Score (mHHS), Hip Outcome Score-Activities of Daily Living and Sport (HOS-ADL and HOS-Sport), Nonarthritic Hip Score (NAHS), and Visual Analog Scale (VAS). Pearson’s correlation coefficients were calculated between each outcome measure.
Results: Strong correlations were observed between the PROMIS PF T-Score and the mHHS (r = 0.64-0.83, p < 0.0001), HOS-ADL (r = 0.54-0.81, p < 0.0001), HOS-Sport (r = 0.55-0.74, p < 0.0001), and NAHS (r = 0.61-0.78, p < 0.0001) measurement tools. PROMIS Computer Adaptive Testing PI T-Score and VAS also demonstrated a strong correlation (r = 0.64-0.80, p < 0.0001).
Conclusions: PROMIS PF scores correlate strongly with mHHS, HOS-ADL, HOS-Sport, and NAHS scores at all time points. Likewise, PROMIS PI scores correlate strongly with VAS pain scores. On average, patients completing PROMIS need to fill out only four or five questions. This study supports the use of PROMIS as an efficient, valid outcome tool for patients with FAI undergoing hip arthroscopy.


Introduction
The definition of value in the United States healthcare system has recently undergone a paradigm shift with increasing focus placed on patient-reported outcomes (PROs) [1,2]. PROs offer a quantitative analysis of patients' perceptions of their own health, function, and quality of life. Clinicians, researchers, administrators, and policy-makers are interested in PROs as an aid in determining which interventions provide value. However, many PROs are inconsistently utilized across healthcare systems because individual measures vary in their applicability to specific conditions. For example, a patient undergoing a total hip arthroplasty in their later years will likely have lower functional demands than a young athlete undergoing hip arthroscopy for femoral acetabular impingement (FAI). As a result, some measures may be susceptible to floor and ceiling effects when applied to different conditions [3-8].
In an effort to improve and standardize PROs, the National Institutes of Health developed the Patient-Reported Outcomes Measurement Information System (PROMIS). To improve the ease, efficiency, and accuracy of data collection, Computer Adaptive Testing (CAT) was developed as a delivery system for PROMIS [9]. The CAT method addresses the inherent limitations of collecting PROs through traditional long and short forms. Compared to other collection methods, CAT has demonstrated reduced completion times and has been shown to limit floor and ceiling effects [8,10]. PROMIS CAT is used routinely in many healthcare systems, and its applicability has been validated for many disease processes and clinical scenarios. To validate PROMIS in hip arthroscopy, it must be correlated with known legacy measures. The Hip Outcome Score-Activities of Daily Living and Sport subscales (HOS-ADL and HOS-Sport), Nonarthritic Hip Score (NAHS), modified Harris Hip Score (mHHS), and Visual Analog Scale (VAS) have served as traditional legacy measures in this patient population. Other studies with small sample sizes have compared PROMIS scores in FAI syndrome to the mHHS, the International Hip Outcome Tool-33, and the Hip Disability and Osteoarthritis Outcome Score [11-14]. The purpose of this study was to validate PROMIS against traditional legacy measures in patients undergoing hip arthroscopy for FAI. Further, we aimed to establish whether floor or ceiling effects exist for PROMIS, determine the average number of questions required to complete PROMIS, and establish minimal clinically important difference (MCID) values in this setting.

Materials And Methods
Institutional review board approval was obtained for this study (STU00205020). A power analysis was performed and an estimated sample size of 62 patients was determined to be necessary to demonstrate a significant correlation (r > 0.4) with an alpha level of 0.05 at 90% power. Ultimately, we elected to enroll 100 patients to account for potential loss to follow-up throughout the study. All English-speaking patients aged 18-60 presenting for hip arthroscopy for a diagnosis of FAI were approached for inclusion in the study. Exclusion criteria included patients unable to consent, minors, and those undergoing revision surgery. Age, sex, date of surgery, type of surgery, and diagnosis associated with the surgical encounter were recorded. The enrollment period occurred from April 2017 to July 2018.
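The sample size calculation above can be illustrated with a short sketch. The text does not state which formula was used, so the Fisher z-transformation approximation below is an assumption, and the function name is hypothetical. With r = 0.4, a two-sided alpha of 0.05, and 90% power, this approximation gives 62 patients, which agrees with the reported estimate.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r, alpha=0.05, power=0.90):
    """Approximate n needed to detect a correlation of size r.

    Assumption: two-sided test of rho = 0 using the Fisher z-transformation,
    n = ((z_alpha/2 + z_beta) / atanh(r))^2 + 3. The study's actual method
    is not described in the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    fisher_z = math.atanh(r)                       # Fisher transform of r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


print(sample_size_for_correlation(0.4))  # 62
```

Enrolling 100 patients, as described, leaves a margin above this estimate for loss to follow-up.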
Legacy measures for hip arthroscopy patients included the mHHS, HOS-ADL, HOS-Sport, NAHS, and VAS pain scores. These measures, in addition to PROMIS-CAT Physical Function (PF) and Pain Interference (PI) domains, were collected electronically through the secure Research Electronic Data Capture portal prior to surgery and at two, six, and 12 weeks and six months post-operatively [15]. PROMIS and legacy scoring questionnaires have been described previously in patients undergoing operations in the hip, with higher PF scores indicating better function and lower PI scores indicating less pain [16].
Continuous variables were reported as mean ± standard deviation (SD) and compared using Student's t-test.
Correlations between PROMIS and legacy measures were reported as Spearman rank correlation coefficients. Floor and ceiling effects for each score were considered present if >15% of the respondents achieved either the lowest or highest score, respectively. MCID was calculated as half of the SD of the sample, as previously described [17]. Statistical significance was set at p < 0.05. All data and statistical analyses were performed using JMP Pro (version 13.0, SAS, Cary, NC, USA).
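The three calculations described above (rank correlation, the 15% floor and ceiling rule, and the half-SD MCID) can be sketched in a few lines. This is an illustrative sketch only: the analyses in this study were performed in JMP Pro, the function names are hypothetical, and the scores in the usage lines are made-up values, not study data.

```python
from statistics import mean, stdev


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Return (floor_present, ceiling_present) using the >15% rule."""
    n = len(scores)
    floor_frac = sum(s == min_score for s in scores) / n
    ceiling_frac = sum(s == max_score for s in scores) / n
    return floor_frac > threshold, ceiling_frac > threshold


def mcid_half_sd(scores):
    """Distribution-based MCID: half of the sample standard deviation."""
    return 0.5 * stdev(scores)


# Hypothetical example values (not study data)
promis_pf = [38.2, 41.5, 44.0, 47.3, 52.1, 55.6]
mhhs = [48.0, 55.0, 62.0, 70.0, 84.0, 91.0]
print(spearman(promis_pf, mhhs))
print(floor_ceiling(mhhs, min_score=0, max_score=100))
print(mcid_half_sd(promis_pf))
```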

Discussion
This study indicates that the PROMIS PI and PF scales demonstrate convergent validity and responsiveness with legacy measures for patients undergoing hip arthroscopy for FAI. Further, these scales distinguish clinical improvements in an efficient manner without notable floor or ceiling effects.
PROMIS CAT utilizes item response theory to minimize redundancy and maximize precision in PRO measurement collection by customizing future questions based on prior responses [18]. Unlike standard legacy measures, this eliminates the need for patients to complete all the questions to attain a valid test score. In this series, the PROMIS scores were obtained with fewer than five questions on average. By comparison, the mHHS, NAHS, and HOS measures ask eight, 20, and 26 questions, respectively. Longer questionnaire length has been linked to decreased rate and quality of responses, though to our knowledge, this has not been assessed for traditional hip legacy measures [19]. Although our study did not directly assess time to completion, other PROMIS studies have indicated that a reduced number of questions correlates with a shorter time to completion [20,21].
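The adaptive logic described above can be sketched as a loop that picks the most informative remaining question at the current score estimate, updates the estimate, and stops once the estimate is precise enough. This is a simplified illustration and not the PROMIS implementation: it assumes a dichotomous two-parameter logistic model (PROMIS items are polytomous), a standard normal prior, a hypothetical item bank, and an assumed stopping rule (standard error below 0.3 or 12 items). All function and parameter names are invented for the example.

```python
import math

# Grid of candidate latent trait values (theta) from -4 to 4
GRID = [g / 10.0 for g in range(-40, 41)]


def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item with discrimination a, location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def posterior_estimate(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for (a, b), endorsed in responses:
            p = prob_endorse(theta, a, b)
            w *= p if endorsed else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    est = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - est) ** 2 * w for t, w in zip(GRID, weights)) / total
    return est, math.sqrt(var)


def run_cat(item_bank, answer, se_target=0.3, max_items=12):
    """Administer items adaptively; return (theta, se, items_used)."""
    remaining = list(item_bank)
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean and SD
    while remaining and len(responses) < max_items and se > se_target:
        # Choose the unused item that is most informative at the current theta
        item = max(remaining, key=lambda ab: item_information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = posterior_estimate(responses)
    return theta, se, len(responses)


# Hypothetical bank of (a, b) items and a simulated respondent
bank = [(2.5, b / 4.0) for b in range(-12, 13)]
theta, se, used = run_cat(bank, answer=lambda item: item[1] < 0.8)
t_score = 50 + 10 * theta  # T-score metric: mean 50, SD 10
print(used, round(t_score, 1), round(se, 2))
```

Because each question is chosen where it is most informative, the loop can stop well before the item bank is exhausted, which is the mechanism behind the short questionnaires described above.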
Previously described hip legacy measures have demonstrated floor and ceiling effects in hip preservation surgery. Kemp et al. reported notable ceiling effects for both mHHS (24%) and HOS-ADL (16%) between 12 and 24 months after surgery [22]. PROMIS PI and PF were able to accurately assess physical function and pain status without notable floor or ceiling effects. This is consistent with other reports in the literature of pre-operative PROMIS values for FAI patients undergoing hip arthroscopy [14]. This indicates that the PROMIS item response question selection accurately and precisely quantifies patient pain and function in the FAI cohort. The efficiency of the PROMIS CAT system allows data to be collected before a clinical encounter, either at home or in a waiting room, with immediate scoring that is available during the encounter. Further, these scores can be utilized by physicians or other providers to quantify baseline functional or pain levels and change over time with either operative or non-operative interventions. Additionally, patients can use this information to track their own personal health progress.
This study also sought to quantify MCID values using a distribution-based method, as described in previous orthopedic surgery studies [23,24]. PROMIS MCID PF and PI values were 4.4 and -4.25 points, respectively. The mean improvement from baseline to the six-month time point was both statistically significant (p < 0.0001) and clinically significant, exceeding the above MCID thresholds for PROMIS PF (9.3 ± 1) and PROMIS PI (-8.9 ± 1).
Despite the benefits of PROMIS testing, our study has several limitations. Even though adequate power for clinical correlation was obtained, the sample size is too small for more thorough subgroup analysis. As such, many of the baseline demographics and specifics of the surgical procedure were not included in this study. Further, while we are still collecting additional data at one and two years after surgery, we felt that the five time points would be more than sufficient for validation of the metric as well as for determining the average number of questions needed to complete PROMIS metrics. We also report MCID values using a distribution-based method, which has been criticized for its limited generalizability, though to date, there is no generally accepted method for calculating MCID. Lastly, our study included only adults and English-speaking patients and may not be generalizable to non-English speakers or patients under the age of