Deep Learning-Based Functional Independence Measure Score Prediction After Stroke in Kaifukuki (Convalescent) Rehabilitation Ward Annexed to Acute Care Hospital

Introduction: Models that predict the Functional Independence Measure (FIM) score after a stay in a kaifukuki (convalescent) rehabilitation ward (KRW) are needed to guide treatment strategies and save medical resources. Statistical models have been reported, but their accuracies were not satisfactory. We built such prediction models using the deep learning (DL) framework Prediction One (Sony Network Communications Inc., Tokyo, Japan). Methods: Of 559 consecutive stroke patients, 122 were transferred to our KRW. We randomly divided the data of these 122 patients into halves, a training dataset and a validation dataset. Prediction One built three prediction models from the training dataset using (1) variables at acute care ward admission, (2) variables at KRW admission, and (3) the combination of (1) and (2). The models' determination coefficients (R2), correlation coefficients (r), and residuals were calculated using the validation dataset. Results: Among the 122 patients, the median age was 71 years, the median length of stay (LOS) in the acute care ward was 23 (17-30) days, the median LOS in the KRW was 53 days, and the median total FIM scores at KRW admission and discharge were 85 and 108, respectively. The mean FIM gain and FIM efficiency were 19 and 0.417, respectively. All patients were discharged home. The R2 values of models (1), (2), and (3) were 0.794, 0.970, and 0.972, respectively. The mean residuals between the predicted and actual total FIM scores were -1.56±24.6, -4.49±17.1, and -2.69±15.7, respectively. Conclusion: Our FIM gain and efficiency were better than the national averages (FIM gain 17.1, FIM efficiency 0.187). We built DL-based total FIM score prediction models whose accuracies were superior to those of previously reported statistical models. DL-based FIM score prediction models could help save medical costs and support efficient stroke and rehabilitation medicine.


Introduction
In Japan, stroke is the third leading cause of death and the second leading cause of long-term disability. In 2000, Japan introduced a distinctive rehabilitation system called kaifukuki (convalescent) rehabilitation wards (KRWs). KRWs are incorporated into the Japanese medical insurance system, and the Japanese Ministry of Health, Labour, and Welfare defines them as the essential inpatient rehabilitation system. Stroke patients are eligible for KRW admission, where they can undergo up to 3 hours of rehabilitation per day, including physical, occupational, and speech therapy, for up to 150 days. There are over 60 KRW beds per 100,000 individuals, comprising 4.6% of the total Japanese hospital beds, and the number of KRWs is increasing [1]. The Functional Independence Measure (FIM) score has been used as a quality indicator of KRWs, and hospitals with KRWs must achieve certain FIM-related standards [1]. Therefore, we try to increase and predict the total FIM score, FIM gain, and FIM efficiency (FIM gain divided by length of stay (LOS) in the KRW) of each patient. Previously, several reports used multiple regression to predict the total FIM score [4][5][6][7][8][9][10][11][12][13][14][15], but they have some questionable points: (1) validation was not performed, and (2) the normal distribution of variables and residuals, which is required for multiple regression, was not confirmed [16].
Generally, building a prediction model statistically requires thousands of samples, so such studies tend to be initiated by governments or academic associations. However, the larger the sample size, the less detailed information, such as comorbidities or laboratory test data, is available, and the more data are missing. In addition, treatment strategies vary from hospital to hospital, and patient backgrounds differ between countries and regions. Therefore, building a universal statistical prediction model for the FIM score is very difficult [16].
Recently, deep learning (DL), a branch of artificial intelligence, has attracted attention. DL is gradually coming into use for neurosurgical conditions, for example in decision making for spinal canal stenosis [17], predicting outcomes after stroke [18,19], predicting the occurrence of stroke from meteorological information [20], automated diagnosis of primary headaches from a Japanese medical questionnaire [21], pathological diagnosis [22], and radiomics studies of brain tumours [23]. However, there are no reports of a DL-based prediction model for the total FIM score after KRW admission. We hypothesized that we could build a good prediction model for our hospital using a DL framework, even with a small dataset. Therefore, we built prediction models using the DL framework Prediction One (Sony Network Communications Inc., Tokyo, Japan, https://predictionone.sony.biz/) [19][20][21][24][25] with our dataset and compared the utility of the DL-based models with that of previously reported multiple regression models. We also investigated the characteristics of our patients admitted to the KRW because our hospital is unique in that the KRW is annexed to an acute care hospital.

Study population
We retrospectively retrieved data from the medical records of all 559 consecutive stroke patients admitted to and treated at our institution between 2017 and 2019. The stroke diagnosis was based on the clinical history and the findings of computed tomography (CT) or magnetic resonance imaging.
General management of stroke, including cerebral infarction (CI), intracerebral haemorrhage (ICH), and subarachnoid haemorrhage (SAH), followed the Japanese Guidelines for the Management of Stroke 2015 [2]. We first provided acute care, including appropriate medical treatment and surgery. Rehabilitation was limited to 60 min in the acute phase. After 1-2 weeks of acute-phase rehabilitation, we discussed the patients' final destination with their families: home, KRW, nursing facilities, or long-term hospitals. Patients who could live independently or with sufficient support from their families were discharged home in 2-4 weeks. Patients who needed more rehabilitation and had the potential to live independently or with their families' support could be transferred to the KRW. Patients who were not expected to live independently or who were bedridden, but whose families committed to providing in-home care, could also be transferred to the KRW on the condition that they would be discharged home. Patients who needed care and whose families could not support them were discharged to nursing facilities. Patients with severe neurological deficits, such as a vegetative state, were transferred to long-term hospitals.
In the KRW, we could provide rehabilitation in line with real life. Daily rehabilitation was up to 3 hours, and the LOS was up to 150 days. Doctors, nurses, and therapists held monthly meetings with the patients and their families to decide how long the rehabilitation in the KRW would continue. This study retrospectively investigated the patients' characteristics, including age, stroke type, and discharge destination.

Clinical variables for making prediction models
We collected the following data at admission: date of admission, age, sex, height, weight, body mass index, history of smoking and habitual heavy drinking (ethanol intake over 450 g/week), and comorbidities (history of or present treatment by a clinician for hypertension, dyslipidemia, diabetes mellitus, chronic kidney disease, orthopaedic disease, or cancer). Glasgow Coma Scale (GCS) and National Institutes of Health Stroke Scale scores on admission were also investigated. We also measured triglycerides, total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, albumin, C-reactive protein, blood glucose, haemoglobin A1c, and haemoglobin levels, and white blood cell and lymphocyte counts at admission to the acute neurosurgical ward. Albumin, lymphocyte count, and total cholesterol are the components of the Controlling Nutritional Status score, which is used to assess patients' nutritional status [26]. As a radiological finding, we investigated temporal muscle thickness (TMT) (mm) as an indicator of systemic skeletal muscle mass [27][28][29][30][31][32][33][34] based on the CT at admission. SYNAPSE V 4.1.5 imaging software (Fujifilm Medical, Tokyo, Japan) was used, following the methods described previously [32,33].
We also investigated the modified Rankin Scale scores at the time of transfer to the KRW, as well as the LOS in the acute care ward and in the KRW. The Barthel Index (BI) at admission to the acute care ward, the FIM scores at admission to and discharge from the KRW, FIM gain, and FIM efficiency were also collected. The total FIM score at discharge was defined as the outcome in this study. There were no missing values.
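As a minimal sketch of how the two derived measures are computed, the following Python snippet uses the definition of FIM efficiency given in the Introduction (FIM gain divided by LOS in the KRW) and the usual definition of FIM gain (discharge score minus admission score). The function names and the example patient are hypothetical and serve only as an illustration; they are not taken from our dataset.

```python
def fim_gain(fim_admission: int, fim_discharge: int) -> int:
    """FIM gain: total FIM score at discharge minus total FIM score at admission."""
    for score in (fim_admission, fim_discharge):
        # The total FIM score ranges from 18 to 126.
        if not 18 <= score <= 126:
            raise ValueError("total FIM score must be between 18 and 126")
    return fim_discharge - fim_admission


def fim_efficiency(fim_admission: int, fim_discharge: int, los_days: int) -> float:
    """FIM efficiency: FIM gain divided by length of stay (days) in the KRW."""
    if los_days <= 0:
        raise ValueError("length of stay must be positive")
    return fim_gain(fim_admission, fim_discharge) / los_days


# Hypothetical patient: FIM 85 at KRW admission, 108 at discharge, 53 days in the KRW.
gain = fim_gain(85, 108)
efficiency = fim_efficiency(85, 108, 53)
print(f"FIM gain = {gain}, FIM efficiency = {efficiency:.3f}")
```

Because the efficiency is a per-day rate, a shorter stay with the same gain yields a higher value.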

Making prediction model by Prediction One
We used the Prediction One framework to build the prediction models. We randomly divided the data of our 122 patients into a training dataset of 61 patients and a validation dataset of 61 patients. Prediction One read the training data and automatically performed 5-fold cross-validation. Prediction One automatically adjusted and optimized the variables statistically and mathematically so that they were easy to process, and selected appropriate algorithms with ensemble learning. Prediction One then built the best prediction model using an artificial neural network with internal cross-validation. The details are trade secrets and cannot be provided.
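The data partitioning described above can be sketched generically as follows. This is only an illustration of a random half split followed by 5-fold cross-validation on the training half; it does not reproduce the internal procedure of Prediction One, which is not disclosed. The function names, the seed, and the integer patient identifiers are assumptions made for the example.

```python
import random


def split_half(ids, seed=0):
    """Randomly divide identifiers into two halves (training, validation)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def k_folds(ids, k=5):
    """Yield (fit_ids, held_out_ids) pairs for k-fold cross-validation."""
    folds = [ids[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, held_out in enumerate(folds):
        fit = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield fit, held_out


# Hypothetical identifiers for 122 patients.
patient_ids = list(range(122))
train_ids, validation_ids = split_half(patient_ids, seed=42)

# Each training patient is held out exactly once across the five folds.
fold_sizes = [len(held_out) for _, held_out in k_folds(train_ids, k=5)]
print(len(train_ids), len(validation_ids), fold_sizes)
```

The validation half is never touched during cross-validation, so it remains available for an independent check of the final model.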
We had the Prediction One framework build three prediction models from the training dataset: (1) using 49 variables acquired only at admission to the acute care ward, (2) using 40 variables, including FIM scores, which could be known only at admission to the KRW, and (3) using all 70 variables acquired at admission to both the acute care ward and the KRW. The determination coefficient (R2) and the strongly contributing variables of each model were calculated automatically. Then, we performed validation using the validation dataset. Correlation coefficients (r) and residuals between the predicted and actual total FIM scores were calculated to evaluate the models' accuracy.
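For readers unfamiliar with these accuracy measures, the sketch below shows one generic way to compute them from paired predicted and actual scores. It assumes that r is the Pearson correlation coefficient, that R2 is computed as 1 minus the ratio of the residual to the total sum of squares, and that a residual is the predicted minus the actual score; Prediction One and SPSS may use different conventions. The scores in the example are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(actual, predicted):
    """Determination coefficient: 1 - SS_residual / SS_total."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def residual_summary(actual, predicted):
    """Mean and sample SD of residuals (assumed here: predicted minus actual)."""
    residuals = [p - a for a, p in zip(actual, predicted)]
    return mean(residuals), stdev(residuals)


# Hypothetical total FIM scores at discharge for five validation patients.
actual = [60, 85, 108, 120, 95]
predicted = [66, 80, 110, 118, 90]

r = pearson_r(actual, predicted)
r2 = r_squared(actual, predicted)
mean_res, sd_res = residual_summary(actual, predicted)
print(f"r = {r:.3f}, R2 = {r2:.3f}, residual = {mean_res:.2f} ± {sd_res:.2f}")
```

A mean residual close to zero indicates little systematic over- or underestimation, while the standard deviation reflects how widely individual predictions scatter around the actual scores.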

Statistical analysis
Results are shown as median (interquartile range) unless otherwise stated. Differences between the training dataset and the validation dataset were tested using the Mann-Whitney U test, Fisher's exact test, or Pearson's chi-square test, as appropriate. Univariate analysis of the association between each variable and the total FIM score at discharge from the KRW was also performed. We could not perform multiple regression because of the small sample size and the non-normal distribution of variables. A two-tailed p < 0.05 was considered statistically significant. We calculated r and the p values using SPSS software version 24.0.0 (IBM, New York, USA).
Prediction One produced each prediction model in less than two minutes. The R2 of each model is shown in Table 4. Model (1), using 49 variables acquired only at acute care ward admission, had an R2 of 0.794. Its r and the mean ± standard deviation of the residuals between the predicted and actual total FIM scores were 0.372 (95% confidence interval 0.120-0.578) and -1.56 ± 24.6. Model (2)   The more robust variables of each model are listed in Table 5. In model (1), TMT, lymphocyte count, low-density lipoprotein cholesterol level, total BI score, and haemoglobin level had large effects on the outcome. In model (2)

Discussion
We built prediction models for the total FIM score at discharge from the KRW in our hospital. We built three models: (1) using information obtained at admission to the acute care ward, (2) using information obtained at admission to the KRW, and (3) using (1) and (2) combined. DL-based models (2) and (3) had good accuracies, and we also performed validation despite our small dataset. This is the first report on creating DL-based prediction models of the total FIM score at discharge from a KRW.

KRW annexed to acute care hospital
Our hospital was rebuilt in 2017, and a KRW was newly annexed. Until then, patients had been transferred to other hospitals' KRWs, and the transfer took more than a month from stroke onset. Since the new hospital opened, the time from stroke onset to transfer to our KRW has been shortened to 23 days. Furthermore, the mean FIM gain of 19 and the mean FIM efficiency of 0.417 are higher than the national averages of 17.1 and 0.187 [35]. This may be because the staff of the acute care ward and those of the KRW work closely together and frequently hold meetings with patients and their families. Sharing information on patients' comorbidities, treatment status, and rehabilitation progress allows smooth transfer. This openness is one of the advantages of our hospital.
In our hospital, all patients were discharged home because we permitted transfer to the KRW only for patients who had the potential to live independently, with or without their families' support, or for those who were not expected to live independently or were bedridden but whose families committed to providing in-home care. We made these decisions around 1-2 weeks after onset. Whether it is socially and medically right to make such an early decision needs further discussion. However, we are forced to make such early decisions to use the limited medical resources in this rural area effectively. In this situation, predicting the total FIM score at discharge from the KRW is essential both for the effective use of resources and for decision-making by patients and their families.

Advantages of DL
Conventional statistical analysis is costly and time-consuming. It needs variable optimization, such as a logarithmic transformation to bring variables closer to a normal distribution, to increase the prediction model's accuracy. It also requires the arbitrary selection of variables based on previous studies, and multivariate analysis needs 10-15 times as many samples as variables [36]. Therefore, there is a risk that potentially important variables cannot be used in the statistical analysis, and multivariate analysis may not be feasible at all in a small hospital with little data. Furthermore, missing data must be handled by multiple imputation or listwise deletion, which leads to inaccuracy. Finally, the models' accuracy should be validated, but several previous reports did not perform validation [16].
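As an illustration of the kind of manual variable optimization mentioned above, the following sketch applies a natural logarithmic transformation to a right-skewed variable and compares the skewness before and after. The values are hypothetical (they are not patient data), and the moment-based skewness formula is one common choice among several.

```python
from math import log
from statistics import mean


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5 (0 for a symmetric sample)."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical right-skewed laboratory values (all positive, so log is defined).
raw = [0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 8.0, 15.0]
log_transformed = [log(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(log_transformed):.2f}")
```

In conventional analysis, this step has to be judged and repeated by hand for each skewed variable, which is part of the effort described in this section.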
DL has the potential to overcome these problems [37]. DL develops useful models from a small dataset with less effort and time, without time-consuming variable optimization or arbitrary selection of variables, because the DL framework performs these processes automatically. The optimal number of variables for the DL framework is not known, and the framework sometimes identifies as important variables that have not been taken into account in previously reported statistical models. Furthermore, the DL framework automatically substitutes appropriate values for missing ones and calculates the best prediction model.
We now review these benefits of DL in our study. Conventionally, we could have used only six variables for statistical analysis because of the small sample size of the training dataset (n = 61). However, we could use 70 variables to build model (3) and obtained a good prediction model from the small dataset. We did not need to perform variable optimization ourselves. Furthermore, some unexpected variables, such as TMT, lymphocyte count, and blood glucose level, were judged to be important in the DL models. These findings are notable because they suggest that data from the acute phase affect the total FIM score after KRW admission. Variables that were not statistically significant in univariate analyses were deemed important. In addition, the time needed to create each model was less than 2 minutes. Finally, the models achieved high accuracies in both the training and validation datasets. Because many reports did not conduct validation, we believe that our report is important in drawing attention to the need for validation when presenting accuracy.
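The figure of six variables follows from the rule of thumb cited above, namely 10-15 times as many samples as variables [36]. A minimal sketch of this arithmetic is shown below; the function names are hypothetical.

```python
def max_predictors(n_samples: int, samples_per_variable: int = 10) -> int:
    """Largest number of variables allowed by the samples-per-variable rule of thumb."""
    return n_samples // samples_per_variable


def required_samples(n_variables: int, samples_per_variable: int = 10) -> int:
    """Number of samples the same rule of thumb would require for n_variables."""
    return n_variables * samples_per_variable


n_train = 61  # size of our training dataset
print(max_predictors(n_train, 10), max_predictors(n_train, 15))
print(required_samples(70, 10), required_samples(70, 15))
```

With 61 training samples, the rule allows six variables at 10 samples per variable and four at 15. Conversely, the 70 variables used in model (3) would call for 700 to 1,050 samples under the same rule.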

Limitations of DL
First, the prediction model derived from our data cannot be applied to other institutions. Second, the training and validation datasets must be updated to keep up with advances in medical science and surgical techniques and with changes in medication. Creating a DL-based prediction model that can be used universally at any hospital will still require collaborative research at many institutions, initiated by governments or academic associations. It may eventually require the same amount of effort as traditional statistical model creation.

Limitation of this study
First, the sample size was small, and it is unknown how many samples are needed for DL analysis. Second, we did not investigate the details of the rehabilitation training or how long rehabilitation was actually performed per day for each patient. Further study is needed.