UW Shoulder and Elbow Academy: I have a new arthroplasty implant and want to show it's better than what's out there.

There is tremendous interest in innovating new arthroplasty approaches in the hope of improving clinical outcomes for our patients. See Does Knee Prosthesis Survivorship Improve When Implant Designs Change? which concludes "It is difficult to predict whether a new system will demonstrate better survival than a previous one, and widespread uptake of a new design before a benefit is shown in robust clinical studies is unwise." (Thanks, Seth Leopold, for this reference.)

Because a lot of time, energy, and money are required to bring an innovation to the marketplace, how do we find out whether this investment of resources has resulted in a clinically significant improvement in outcome?

The bottom line is that it's difficult.

There are some key elements to such an analysis: sample size, control of confounding variables, and length of followup.

Sample Size
We know that such a study requires prospectively collected data, but how many patients with two-year followup do we need to meaningfully compare the outcomes for the new implant to those for the traditionally used system?

The answer is that it depends - among other things - on our selection of the primary outcome variable: is it revision rate, complication rate, patient reported outcomes (PRO) or range of motion (ROM)?

The authors of Systematic review of shoulder arthroplasty outcomes: what sample size is meaningful? reviewed published studies of anatomic or reverse TSA with prospectively collected data and a minimum two-year follow-up. They analyzed common clinical outcomes reported for shoulder arthroplasty and reported the sample sizes necessary to confirm a clinically meaningful and statistically significant difference for these outcome measures, assuming a significance level of α = 0.05 and power of 80% (β = 0.20). They found that the sample size needed to detect meaningful differences was much lower for continuous measures (ROM and PROs) than for dichotomous outcomes (revisions and complications). Specifically, they found the required sample sizes to be up to 100,000 patients to detect differences in complication rates, up to 10,000 for differences in revision rates, and several hundred for patient reported outcomes and range of motion. Note the orders of magnitude greater numbers required for the dichotomous variables of complication rate and revision rate in contrast to variables with scaled measures (ROM, PRO). This is related in part to the low rates of complications and revisions: when events are rare, large sample sizes are needed to detect a significant difference. Some of the details of their findings are shown graphically at the end of this post.
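To make the contrast between dichotomous and continuous outcomes concrete, here is a minimal sketch of the standard normal-approximation sample size formulas for comparing two proportions and two means. It is not the method used in the systematic review; the event rates, the 5-point PRO difference, and the standard deviation of 15 are hypothetical values chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def z_total(alpha=0.05, power=0.80):
    """Sum of the critical values for a two-sided test at `alpha` and the given power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Patients per group to detect event rates p1 vs p2 (normal approximation)."""
    z = z_total(alpha, power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p1 - p2) ** 2)


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Patients per group to detect a mean difference `delta` with common SD `sd`."""
    z = z_total(alpha, power)
    return ceil(2 * (z * sd / delta) ** 2)


# Hypothetical inputs, not taken from the review:
# revision rate of 3% with the standard implant vs 2% with the new one
n_revision = n_per_group_proportions(0.03, 0.02)
# a 5-point difference in a PRO score whose standard deviation is 15 points
n_pro = n_per_group_means(delta=5, sd=15)

print("per-group n, revision rate:", n_revision)
print("per-group n, PRO score:", n_pro)
```

With these made-up inputs the revision-rate comparison needs many times more patients per group than the PRO comparison, because the difference between two small proportions is tiny relative to their variability.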

Controlling for Covariates
When comparing two treatment groups in clinical research, surgeons often attempt to match important characteristics (covariates) so that the outcomes of the two treatments can be compared without confounding related to differences in the patients. The authors of What are We Matching On and Why?: A Systematic Review of Matched Study Designs in Shoulder Arthroplasty examined the variety and inconsistency of matching patient cohorts in studies assessing outcomes following shoulder arthroplasty. They found 110 studies encompassing 483,738 shoulder arthroplasties; 82 (74.6%) studies employed direct matching and 28 (25.5%) employed propensity score matching. [Direct matching matches the two treatment groups based on near-exact agreement of the covariates (e.g., age = 65 +/- 5 years, sex = male, BMI = 25 +/- 5). In propensity matching, a logistic regression model is first built that assigns each study participant a propensity score representing the conditional probability of that patient being in one group versus the other as a function of baseline factors. Rather than being matched on individual variables, patients are then matched on their propensity score. Propensity matching has an advantage when there are a large number of covariates under consideration.]
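As an illustration of direct matching, the sketch below pairs each patient in a new-implant group with a standard-implant patient of the same sex whose age and BMI fall within +/- 5 of their own. The greedy one-to-one strategy, the `Patient` class, and the patient data are all assumptions made for this example; published studies use a variety of matching algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pid: str
    age: float
    sex: str
    bmi: float


def is_match(a, b, age_caliper=5.0, bmi_caliper=5.0):
    """True when sex agrees exactly and age and BMI agree within the calipers."""
    return (
        a.sex == b.sex
        and abs(a.age - b.age) <= age_caliper
        and abs(a.bmi - b.bmi) <= bmi_caliper
    )


def direct_match(new_implant, standard, **calipers):
    """Greedy 1:1 matching without replacement; returns (pairs, unmatched)."""
    available = list(standard)
    pairs, unmatched = [], []
    for patient in new_implant:
        candidates = [c for c in available if is_match(patient, c, **calipers)]
        if not candidates:
            unmatched.append(patient)
            continue
        # Among eligible controls, take the one closest in age.
        best = min(candidates, key=lambda c: abs(c.age - patient.age))
        pairs.append((patient, best))
        available.remove(best)
    return pairs, unmatched


# Made-up patients for illustration only.
new_group = [
    Patient("N1", 65, "M", 25),
    Patient("N2", 72, "F", 31),
    Patient("N3", 48, "M", 38),
]
standard_group = [
    Patient("S1", 67, "M", 27),
    Patient("S2", 70, "F", 29),
    Patient("S3", 66, "F", 24),
    Patient("S4", 80, "M", 26),
]

pairs, unmatched = direct_match(new_group, standard_group)
for treated, control in pairs:
    print(treated.pid, "matched to", control.pid)
print("excluded for lack of a match:", [p.pid for p in unmatched])
```

Note that the young patient with a high BMI has no suitable control and drops out of the analysis, which is exactly how a matched cohort can drift away from the population actually being treated.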

Seventy-four distinct covariates were used in at least one study, with 86 unique combinations of covariates employed. Studies used a median of 4 covariates (range 1-27). The most common covariates were age (94.5%), sex (89.1%), body mass index (26.4%), smoking (19.1%), and follow-up duration (19.1%). Only 16 (14.6%) studies reported justification for the covariates included. Sample sizes ranged from 115 to 106,114 shoulders. Thus their analysis of 110 "matched" outcome studies in shoulder arthroplasty "revealed marked discrepancies in the number and type of matched variables and in how patient matching is performed and reported. This variability may have negative implications for the reproducibility, generalizability, and transparency of matched cohort studies within shoulder arthroplasty."

Of course, in addition to age, sex, body mass index, smoking, and follow-up duration, there are many other clinically important covariates that will confound the analysis if they're not controlled for: diagnosis, cuff status, glenoid type, social determinants of health, preoperative function, implant size and position, and the surgeon performing the arthroplasty. The full list may be quite a bit longer. However, matching for each additional covariate reduces the number of shoulders that qualify for inclusion in the matched sample (because those without suitable matches are excluded), to the extent that the final cohort may no longer be representative of patients being treated with the two types of arthroplasty. This effect can be assessed by comparing the characteristics and numbers of all patients prior to matching to the characteristics and numbers of those still in the study after matching.
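The shrinkage of the matched sample can be sketched with synthetic data. The example below generates two randomly drawn cohorts, matches them exactly on a growing list of categorical covariates, and reports the fraction of new-implant shoulders that still have a match. The covariate names come from the lists above, but the number of levels for each, the cohort sizes, and the uniform random assignment are assumptions for illustration, not clinical data.

```python
import random
from collections import Counter

# Covariates added one at a time; the level counts are hypothetical.
COVARIATES = ["sex", "smoker", "diagnosis", "cuff_intact", "glenoid_type", "age_decade"]
LEVELS = {"sex": 2, "smoker": 2, "diagnosis": 4, "cuff_intact": 2,
          "glenoid_type": 5, "age_decade": 4}


def synthetic_cohort(n, rng):
    """n patients, each covariate drawn uniformly at random from its levels."""
    return [{name: rng.randrange(LEVELS[name]) for name in COVARIATES}
            for _ in range(n)]


def exact_match_pairs(treated, control, covariates):
    """Number of 1:1 pairs available when matching exactly on `covariates`."""
    def key(patient):
        return tuple(patient[c] for c in covariates)

    treated_counts = Counter(key(p) for p in treated)
    control_counts = Counter(key(p) for p in control)
    # Within each stratum, pairs are limited by the smaller group.
    return sum(min(n, control_counts[k]) for k, n in treated_counts.items())


rng = random.Random(0)  # fixed seed so the run is repeatable
new_implant = synthetic_cohort(200, rng)
standard = synthetic_cohort(400, rng)

retention = []
for k in range(1, len(COVARIATES) + 1):
    matched = exact_match_pairs(new_implant, standard, COVARIATES[:k])
    retention.append(matched / len(new_implant))
    print(f"matching on {k} covariate(s): {matched} of {len(new_implant)} retained")
```

Because every added covariate splits the strata more finely, the number of available pairs can only stay the same or fall, never rise.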

Length of Followup

Many studies list "minimum two year followup" as an inclusion criterion. Is it reasonable to combine followups at two years with followups at 5 years, 7 years, or longer? There are also issues when we extend the period of followup, as reviewed in The Challenge of Long Term Followup, including that the methods used a while back may no longer be what we're using today and that a progressively larger percent of the original cohort is lost as the period of followup is extended.

So, is there a better option? The answer is both yes and no.

In Is pyrocarbon better than a ream and run? - a randomized controlled trial, we propose a hypothetical randomized controlled trial and point out its strengths (e.g. it does not require ex ante identification of the potentially confounding covariates or trying to match the patients in each treatment group) and its weaknesses (e.g. some patients' unwillingness to have their treatment decided at random, which risks an unrepresentative sample, and the cost).

A final thought is that, overall, the current results of shoulder arthroplasty are quite good, with low complication and revision rates. The scores on the commonly used PROs are within the minimal clinically important difference of a perfect score. As a result, it may be difficult for any innovation to improve the current outcomes by an amount that is clinically significant. The reader will have no problem finding articles that conclude that "while there was a statistically significant improvement, the improvement was not clinically significant."

This does not mean that innovation is without value, but it does suggest that innovation should be directed at a specific problem, one that occurs sufficiently often and has a measurable effect so that it will be straightforward to determine whether the innovation fixes it. As Einstein is reported to have said, “If I had an hour to solve a problem, I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.”

We need to keep pecking at the problems

Pileated Woodpecker

Shot this week in Interlaken Park, Seattle
