Wednesday, June 14, 2017

PROMIS - does it introduce selection bias?

Performance of PROMIS for Healthy Patients Undergoing Meniscal Surgery

The Patient-Reported Outcomes Measurement Information System (PROMIS) was developed as an extensive question bank with multiple health domains that could be utilized for computerized adaptive testing (CAT). These authors investigated the use of the PROMIS Physical Function CAT (PROMIS PF CAT) in an otherwise healthy population scheduled to undergo surgery for meniscal injury with the hypotheses that (1) the PROMIS PF CAT would correlate strongly with patient-reported outcome instruments that measure physical function and would not correlate strongly with those that measure other health domains, (2) there would be no ceiling effects, and (3) the test burden would be significantly less than that of the traditional measures. 

Patients scheduled to undergo meniscal surgery completed the PROMIS PF CAT, Knee injury and Osteoarthritis Outcome Score (KOOS), Marx Knee Activity Rating Scale, Short Form-36 (SF-36), and EuroQol-5 Dimension (EQ-5D) questionnaires. Correlations were defined as high (≥0.7), high-moderate (0.61 to 0.69), moderate (0.4 to 0.6), moderate-weak (0.31 to 0.39), or weak (≤0.3). If ≥15% of respondents to a patient-reported outcome measure obtained the highest or lowest possible score, the instrument was determined to have a significant ceiling or floor effect.
A total of 107 participants were analyzed. The PROMIS PF CAT had a high correlation with the SF-36 Physical Functioning (PF) (r = 0.82, p < 0.01) and KOOS Sport (r = 0.76, p < 0.01) scores; a high-moderate correlation with the KOOS Quality-of-Life (QOL) (r = 0.63, p < 0.01) and EQ-5D (r = 0.62, p < 0.01) instruments; and a moderate correlation with the SF-36 Pain (r = 0.60, p < 0.01), KOOS Symptoms (r = 0.57, p < 0.01), KOOS Activities of Daily Living (ADL) (r = 0.60, p < 0.01), and KOOS Pain (r = 0.60, p < 0.01) scores. The majority (89%) of the patients completed the PROMIS PF CAT after answering only 4 items. The PROMIS PF CAT had no floor or ceiling effects: none of the participants achieved either the lowest or the highest possible score.
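
For readers who want to apply the same cut-offs to their own data, here is a minimal sketch of the correlation bands and the 15% floor/ceiling rule described in the study summary. The function names and the example score list are hypothetical. The handling of values that fall between the quoted bands (for example 0.605) is our assumption, since the summary does not say how the authors treated them.

```python
def classify_correlation(r):
    """Label a correlation coefficient using the bands quoted in the summary.

    Assumption: values falling between the quoted bands (e.g. 0.605)
    are assigned to the stronger neighboring band.
    """
    a = abs(r)
    if a >= 0.7:
        return "high"
    if a > 0.6:
        return "high-moderate"
    if a >= 0.4:
        return "moderate"
    if a > 0.3:
        return "moderate-weak"
    return "weak"


def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Return (floor_effect, ceiling_effect) as booleans.

    An effect is flagged when at least `threshold` (15% by default) of
    respondents obtained the lowest or highest possible score.
    """
    n = len(scores)
    floor_fraction = sum(s == min_score for s in scores) / n
    ceiling_fraction = sum(s == max_score for s in scores) / n
    return floor_fraction >= threshold, ceiling_fraction >= threshold


# Correlations reported in the summary above
for name, r in [("SF-36 PF", 0.82), ("KOOS QOL", 0.63), ("KOOS Pain", 0.60)]:
    print(name, classify_correlation(r))

# Hypothetical scores on a 0-12 scale, for illustration only
example_scores = [0, 0, 0, 5, 6, 7, 8, 9, 10, 12]
print(floor_ceiling_effects(example_scores, 0, 12))
```

With the reported values, 0.82 falls in the high band, 0.63 in the high-moderate band, and 0.60 in the moderate band, matching the labels in the summary.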

Comment: This study does not provide data on the cost and training necessary to implement the PROMIS CAT system. Basically it shows that the PROMIS CAT correlates well with free, universally available tools. While the authors conclude that "It may be a reasonable alternative to more burdensome patient-reported outcome measures," it may actually be the case that currently available and generally accepted inexpensive tools provide an alternative to the more burdensome PROMIS CAT.

Implementing this system requires the provider to have access to a PROMIS CAT program for each diagnostic condition she or he manages and for the provider's patients to have access to the system each time they provide an update. As a result, the providers and the patients using PROMIS CAT are likely to be highly selected and not representative of the general practice of the specialty.  

We rely on 'inclusive' approaches to outcome assessment that put valid follow-up within the reach of the maximal number of providers and patients so that the information gained is as generalizable as possible.

See the related posts below.

Psychometric evaluation of the PROMIS Physical Function Computerized Adaptive Test in comparison to the American Shoulder and Elbow Surgeons score and Simple Shoulder Test in patients with rotator cuff disease

The National Institutes of Health has recently developed the Patient-Reported Outcomes Measurement Information System (PROMIS) Computer Adaptive Test (CAT), which applies the computerized adaptive testing technology used in examinations like the Graduate Record Examinations.
With CAT, questions are sequentially administered from a large item "bank" until predetermined reliability criteria are met. Each question response produces a probability curve of the respondent’s estimated ability. For example, a patient who can throw a ball with ease has a high probability of having upper-end physical function. Subsequent questions can then be chosen by the CAT "engine" that will further discriminate the respondent’s estimated ability while uninformative and repetitive questions are omitted.
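
To make the adaptive loop described above concrete, here is a rough sketch of how such an engine might work. This is not the PROMIS engine: it assumes a simplified yes/no (Rasch) item model, a made-up item bank with made-up difficulties, and an arbitrary stopping threshold. It only illustrates the idea of asking the most informative remaining question and stopping once the ability estimate is precise enough.

```python
import math


def p_yes(theta, difficulty):
    """Rasch (one-parameter logistic) probability of answering 'yes'."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def run_cat(item_bank, answer, sd_target=0.6, max_items=12):
    """Administer items adaptively.

    item_bank: dict mapping item name -> difficulty (hypothetical values)
    answer:    callable taking an item name and returning True/False
    Returns (ability estimate, posterior SD, list of items asked).
    """
    grid = [g / 10.0 for g in range(-40, 41)]        # ability grid, -4 to 4
    posterior = [math.exp(-0.5 * t * t) for t in grid]  # unnormalized normal prior
    remaining = dict(item_bank)
    asked = []
    while True:
        total = sum(posterior)
        mean = sum(t * w for t, w in zip(grid, posterior)) / total
        var = sum((t - mean) ** 2 * w for t, w in zip(grid, posterior)) / total
        sd = math.sqrt(var)
        # Stop when precise enough, or when items or the item budget run out
        if sd <= sd_target or not remaining or len(asked) >= max_items:
            return mean, sd, asked
        # Under the Rasch model the most informative item is the one whose
        # difficulty is closest to the current ability estimate
        name = min(remaining, key=lambda k: abs(remaining[k] - mean))
        difficulty = remaining.pop(name)
        said_yes = answer(name)
        asked.append(name)
        # Bayesian update: multiply each grid weight by the response likelihood
        posterior = [
            w * (p_yes(t, difficulty) if said_yes else 1.0 - p_yes(t, difficulty))
            for t, w in zip(grid, posterior)
        ]


# Hypothetical bank of nine items with difficulties from -2.0 to 2.0
demo_bank = {f"item_{i}": -2.0 + 0.5 * i for i in range(9)}
estimate, sd, asked = run_cat(demo_bank, lambda name: True)
print(estimate, sd, asked)
```

A respondent who answers 'yes' to everything is steered toward progressively harder items and ends with a high estimate; one who answers 'no' is steered toward easier items. Easy items that would tell the engine nothing new about a high-functioning respondent are left for last or never asked.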

These authors studied 187 patients with a clinical diagnosis of rotator cuff disease who completed the American Shoulder and Elbow Surgeons (ASES) score, Simple Shoulder Test (SST), and PF CAT.

Responses from 187 patients were analyzed. The PF CAT required fewer questions than the ASES or SST (PF CAT, 4.3; ASES, 11; SST, 12). Correlation between all instruments was moderately high. Item reliability was excellent for all instruments, but person reliability of the PF CAT was superior (0.93, excellent) to the SST (0.71, moderate) and ASES (0.48, fair). Ceiling effects were similar among all instruments (PF CAT, 0.53%; SST, 6.1%; ASES, 2.3%). Floor effects were found in 21% of respondents to the SST but in only 3.2% of PF CAT and 2.3% of ASES respondents.

Comment: Unfortunately the CAT requires the patient to be at a computer or tablet that carries the program. It cannot be completed on paper and thus is not amenable to follow-up mailings. The authors did not measure the time to complete the PROMIS nor its relative convenience or user-friendliness. They did not study the ability of the PROMIS responses to be translated into terms that patients can easily grasp. It is 'the new kid on the block' so that its results cannot be compared to data collected in the past.

By contrast, the user-friendly Simple Shoulder Test can be completed on paper anywhere in the world in under two minutes and requires nothing other than a pencil. The SST has been utilized in over 650 publications according to a recent PubMed search. As early as ten years ago, it was recognized that this simple 12-item questionnaire had the ability to characterize (1) the function of normal shoulders, (2) the functional deficits for many different diagnoses, and (3) the different responses of male and female patients. Here are some figures from that article, which was based on 2674 patients.

Self-assessed outcome at two to four years after shoulder hemiarthroplasty with concentric glenoid reaming.

Shoulder scoring scales for the evaluation of rotator cuff repair.

It can be easily used to track patients' recovery over time.

It can also be used in multiple languages, for example:

Validation and reliability of a Spanish version of Simple Shoulder Test (SST-Sp)

Finally, and perhaps most importantly, while the PROMIS score is a number without particular meaning to a patient, the results of the SST can be easily communicated:

Is the Simple Shoulder Test a valid outcome instrument for shoulder arthroplasty?

It is in the interest of all shoulder surgeons to document the functional status of their patients before and sequentially after treatment to learn what treatments are working for which diagnoses in which patients. Observing one's personal outcomes is a great way for a surgeon to continue to improve. The issue is that many of the existing instruments are time consuming, at risk for observer bias, unvalidated, and/or inflexible in terms of the patient's need to return to the office for evaluation.

These authors point out that the Simple Shoulder Test (SST, a 12-item 'yes' or 'no' questionnaire) is a brief, inexpensive, convenient, and widely used patient-reported outcome tool. Since it is the patient who should be served by our treatment, it makes sense to base the outcome assessment on the patient's self-assessed ability to perform shoulder functions.

The results are easy to record in the patient's medical record so that the trend over time can be easily followed. The 12 individual questions of the SST allow a more detailed discussion of each patient's functional deficits and the likelihood that the desired function might be restored.

The SST does not require any potentially biasing participation by physicians, nurses, or therapists; it can be completed without a computer and without the patient’s having to return to the surgeon’s office. It has been adapted and validated in multiple languages, including Persian, Spanish, Portuguese, Dutch, and Turkish.

However, the SST has not been rigorously evaluated for patients treated with shoulder arthroplasty. The goal of their study was to rigorously evaluate the validity of the SST for outcome assessment in shoulder arthroplasty using a systematic review of the literature and an analysis of its properties in a series of 408 surgical cases. For these patients the authors documented SST scores, 36-Item Short Form Health Survey scores, and satisfaction scores collected preoperatively and 2 years postoperatively. The responsiveness of the SST was assessed by comparing preoperative and 2-year postoperative scores. The criterion validity of the SST was determined by correlating the SST with the 36-Item Short Form Health Survey. The construct validity of the SST was tested through 5 clinical hypotheses regarding satisfaction, comorbidities, insurance status, previous failed surgery, and narcotic use.

They found that scores improved from 3.9 ± 2.8 before arthroplasty to 10.2 ± 2.3 after arthroplasty (P < .001). The change in SST correlated strongly with patient satisfaction (P < .001). The SST had large Cohen’s d effect sizes and standardized response means. Construct validity was supported by significant differences between satisfied and unsatisfied patients, those with more severe and less severe comorbidities, those with workers’ compensation or Medicaid and other types of insurance, those with and without previous failed shoulder surgery, and those taking and those not taking narcotic pain medication before surgery (P < .005).
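
As a rough check on what a "large" effect size means here, Cohen's d can be estimated from the reported means and standard deviations. The sketch below assumes the common pooled-SD form of Cohen's d; the summary does not state which formula the authors used, so the result is illustrative rather than a reproduction of their figure. The standardized response mean cannot be computed this way because it needs the standard deviation of the individual change scores, which is not given here.

```python
import math


def cohens_d(mean_pre, sd_pre, mean_post, sd_post):
    """Cohen's d using the pooled SD of the pre and post scores.

    Assumption: equal weighting of the two SDs; the paper's exact
    formula is not stated in the summary.
    """
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2.0)
    return (mean_post - mean_pre) / pooled_sd


# Reported SST scores: 3.9 ± 2.8 before and 10.2 ± 2.3 after arthroplasty
d = cohens_d(3.9, 2.8, 10.2, 2.3)
print(round(d, 2))  # 2.46
```

By the usual convention, a d of 0.8 or more is considered large, so an improvement of 6.3 SST points against a pooled SD of about 2.6 is well beyond that mark.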

They concluded that these data combined with a systematic review of the literature demonstrated that the SST is a valid and responsive patient-reported outcome measure for assessing the outcomes of shoulder arthroplasty.

Comment: A figure from this paper helps to make a point. The black arrows in the figure below indicate patients who did not fare so well after shoulder arthroplasty. These are the ones on which the surgeon needs to focus, asking - was the type of arthroplasty the right treatment, was it done well, was the rehab properly organized, were there other factors interfering with the desired outcome?

The SST can also be useful in discussing with the patient "when is it the right time for a shoulder replacement for arthritis?" This question comes up often.

Of course, the answer depends on many things, including the degree to which the quality of life of the individual is impaired by the shoulder condition, the condition of the muscles, tendons, bone and nerves around the shoulder, the expectations of the patient, the overall health of the individual, the individual's willingness to accept the risks of surgery, and the degree of comfort the individual has with the surgeon.

We recently summarized the SST scores of over 2800 of our patients at the point where they had decided to have a shoulder joint replacement for their arthritis. The average preoperative SST score was 3.9. The number of patients with each possible SST score (0 to 12) is shown below. Basically, this graph shows that 62% of patients having joint replacement had preoperative SST scores of 4 or below; 30% had SST scores from 5-8; and 8% had scores from 9-12.

We find this graph useful for allowing the patient to place their current self-assessed comfort and function in the context of other patients having shoulder arthroplasty.

One of the operations we offer to patients with shoulder arthritis is the ream and run procedure, a method of shoulder joint replacement arthroplasty that avoids the potential risks associated with a plastic socket.
The chart above shows how the Simple Shoulder Test (SST) is used to document the recovery of patient self-assessed comfort and function after a ream and run procedure for shoulder arthritis. The data represent the average recovery from a consecutive series of over 100 patients with at least two years of post surgical follow-up. The vertical axis represents the total number of SST questions answered 'yes', while the horizontal axis represents the years after surgery. The dots show individual data points and the lines show the average (plus or minus one standard deviation) for all the patients.
Since, on average, patients having the ream and run live over 500 miles from our center, routine office visits are impractical for them. Because patients mail or email their results to us, we have been able to keep close tabs on their recovery using the Simple Shoulder Test. The full article was recently published in the JBJS and discussed in this post.

In conclusion, if "value" is defined as the benefit divided by the cost, the SST appears to be a high value assessment tool.
We are not sure that the PROMIS is progress.
