UW Shoulder and Elbow Academy: Surgical failures: what causes them and how can we do better for our patients. Warning: this post is lengthy but informative!

The Book of Why is transforming our understanding of the causation of surgical outcomes and how to optimize them for our patients. The book is both terrific and dense. Here I try to provide some "CliffsNotes" that relate to our surgical practices.

Chapter 1: The Ladder of Causation

Level 1: Association

Most of the publications relating to surgical outcomes are observational, reporting associations of various factors with the result ("12% of patients having an anatomic total shoulder had a surgical revision, 36% had rotator cuff failure, 60% had cementless glenoid components, older patients have a lower hazard ratio for revision ...."). Comparative tools - such as p values, hazard ratios, and Kaplan-Meier curves - are commonly used in these reports to show associations, but not which factors determine the outcome. While these tools can identify the factors associated with surgical failure, they don't tell us how we can avoid the causes of adverse outcomes for our future patients.

Level 2: Intervention

When we do surgery for an individual patient, we have to select a procedure from among the alternatives (for an irreparable cuff tear, we select debridement, partial repair, subacromial balloon, superior capsular reconstruction, bioactive graft, tendon transfer, or reverse total shoulder). If I chose one rather than the others, how would the outcome have been different for the patient? Even the best attempt at a randomized controlled trial of these seven different procedures would not be able to guide our patient management. Is it realistic for our surgical actions to be objectively "data driven" or does our choice need to be informed by a subjective analysis as in Level 3?

Level 3: Counterfactual thinking

Learning from failure. Fortunately most orthopaedic surgeries turn out well for the patient; thus the greatest opportunity for learning comes from our failures. When a patient experiences a complication, we need to ask retrospectively, "if I had chosen a different procedure or different implant, might the outcome have been better for the patient?" This question cannot be answered objectively, but asking it forces the surgeon to try to assess the root cause of each failure and to ask the subjective question, "in hindsight, how might it have been addressed?" Progress will be accelerated when we treat every failure as a causal case study, not just a statistic: "For a specific patient with a loose anatomic glenoid component, is it likely that revision could have been avoided if we had used an augmented glenoid?"

Chapter 2: The Genesis of Causal Inference

Every time we adopt a new implant or technique our patients become unwitting participants in uncontrolled experiments. Retrospective case series can only provide observations (e.g. patients with osteoarthritis treated by reverse total shoulders had a complication rate of 20%) that are confounded by inter-patient variability and inter-surgeon variability, leaving us with no information on causation or prevention.

What about RCTs?

1. Prospective randomized controlled trials (RCTs): the good and the bad.

The good: Only by deliberately assigning treatments randomly to patients from a defined population can we separate correlation from causation.

The bad: surgical RCTs are rare, often underpowered and limited by

(a) those patients with diagnosis X who consent to a randomized study of treatment by procedure A or B may not represent the typical patient with diagnosis X ("I don't want my treatment to be decided by the flip of a coin, I want to decide my treatment in partnership with my surgeon")

(b) ethics (is there really equipoise?)

(c) surgeon skill variation (surgeon A is really good at fixation of fractures while surgeon B is really good at endoprosthesis treatment of fractures)

(d) challenge of obtaining long term followup on a sufficiently large number of patients.

(e) only answering "does procedure A work better than procedure B on average", but not "for which patient, with which anatomy, and in which surgeon's hands?"

Chapter 3: From Evidence to Causes

and

Chapter 4 Confounding and Deconfounding

The problem of confounding:

A third variable can create a spurious association between two other variables. For example, older, lower demand patients may preferentially be offered a reverse total shoulder. If they have lower revision rates than the patients receiving anatomic total shoulders, it may look like reverse total shoulder arthroplasty causes fewer failures, when age may be the primary factor influencing the revision rate. Similarly, surgeon case volume may influence both implant choice and complication rate (obscuring the relationship between implant and complication). Cuff quality may influence both implant choice and functional outcome.

We need to ask, "what factors could plausibly influence both the procedure and the outcome?" These confounders need to be controlled for either by randomization or stratification.
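To make the age example concrete, here is a minimal Python sketch of a simulated world in which the implant has no effect at all on revision, yet the crude comparison makes the reverse look safer. Every probability in it is hypothetical and chosen only to make the pattern visible; none comes from a registry or a study.

```python
import random

def simulate(n=100_000, seed=0):
    """Simulate patients in a world where implant choice has NO effect on revision.

    All probabilities are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    # tallies[(age_group, implant)] = [patients, revisions]
    tallies = {(a, i): [0, 0] for a in ("younger", "older") for i in ("aTSA", "rTSA")}
    for _ in range(n):
        age = "older" if rng.random() < 0.5 else "younger"
        # Age drives implant choice: older patients are mostly offered a reverse.
        p_reverse = 0.8 if age == "older" else 0.2
        implant = "rTSA" if rng.random() < p_reverse else "aTSA"
        # Age alone drives revision risk; the implant plays no role here.
        p_revision = 0.04 if age == "older" else 0.10
        revised = rng.random() < p_revision
        tallies[(age, implant)][0] += 1
        tallies[(age, implant)][1] += revised
    return tallies

def rate(tallies, implant, age=None):
    """Revision rate for an implant, pooled over age or within one age group."""
    cells = [v for (a, i), v in tallies.items() if i == implant and age in (None, a)]
    patients = sum(c[0] for c in cells)
    revisions = sum(c[1] for c in cells)
    return revisions / patients

tallies = simulate()
crude_gap = rate(tallies, "aTSA") - rate(tallies, "rTSA")
print(f"crude aTSA minus rTSA revision rate: {crude_gap:+.3f}")
for age in ("younger", "older"):
    gap = rate(tallies, "aTSA", age) - rate(tallies, "rTSA", age)
    print(f"{age}: aTSA minus rTSA revision rate: {gap:+.3f}")
```

In this sketch the crude gap is clearly positive (anatomic looks worse), while the gap within each age group is close to zero. The apparent implant effect is produced entirely by age.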

Such confounders may be difficult to identify; here are some possible approaches:

1. From the surgeon's intuition, experience and domain knowledge, does the variable:

(a) influence treatment choice?

(b) independently influence outcome?

(Note that bone quality influences both).

2. Are there empirical clues such as imbalances between treatment groups (are patients receiving one procedure older, sicker, or lower demand)?

(Note that patient age influences procedure and outcome).

Examples of confounders

(a) Age: older patients are more likely to get a reverse AND more likely to have a lower revision rate

(b) Surgeon volume: high volume surgeons prefer certain implants AND have better outcomes

(d) Youth and male sex: determines procedure choice (e.g. ream and run vs total shoulder) AND outcome

(e) Healthier patients with less deformity: determines implant choice (e.g. stemless humeral component) AND revision rate

Each of these must be considered in analysis to avoid misleading conclusions

Some important confounders are rarely measured:

(a) frailty

(b) social determinants of health

When planning a study, we need to explicitly list potential confounders and decide how to measure and account for them. Once identified, confounders can be handled by randomization, restriction, matching, regression, stratification, and/or propensity methods. However, each of these methods carries its own risks; for example, propensity matching risks loss of the cases that cannot be matched (which reduces the generalizability of the result). The same can be said for randomized controlled trials. In any event, the conclusion of the study needs to acknowledge the identified confounders, the potential for other confounders, and how the authors endeavored to manage confounding.
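The loss of unmatched cases is easy to see in a small sketch. The Python below uses 1:1 exact matching within an age group as a simple stand-in for propensity matching (a real propensity analysis would match on an estimated score, not a single stratum). The cohort sizes and the function name `match_within_strata` are hypothetical, invented for illustration.

```python
from collections import defaultdict

def match_within_strata(patients):
    """1:1 exact matching of aTSA and rTSA patients within each stratum.

    `patients` is a list of (patient_id, stratum, implant) tuples.
    Returns (matched_pairs, unmatched_ids).
    """
    by_stratum = defaultdict(lambda: {"aTSA": [], "rTSA": []})
    for pid, stratum, implant in patients:
        by_stratum[stratum][implant].append(pid)

    pairs, unmatched = [], []
    for groups in by_stratum.values():
        a, r = groups["aTSA"], groups["rTSA"]
        k = min(len(a), len(r))          # only k pairs can be formed in this stratum
        pairs.extend(zip(a[:k], r[:k]))
        unmatched.extend(a[k:] + r[k:])  # everyone else is dropped from the analysis
    return pairs, unmatched

# Hypothetical cohort: younger patients mostly get aTSA, older mostly rTSA.
cohort = []
for stratum, implant, count in [
    ("younger", "aTSA", 800), ("younger", "rTSA", 200),
    ("older", "aTSA", 200), ("older", "rTSA", 800),
]:
    start = len(cohort)
    cohort.extend((start + j, stratum, implant) for j in range(count))

pairs, unmatched = match_within_strata(cohort)
retained = 2 * len(pairs) / len(cohort)
print(f"matched pairs: {len(pairs)}, patients retained: {retained:.0%}")
# prints "matched pairs: 400, patients retained: 40%"
```

In this made-up cohort the patients who are dropped are the most typical ones (younger patients with an anatomic, older patients with a reverse), which is why a matched result may not generalize to everyday practice.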

Chapter 5 Colliders

Not every variable should be controlled for. Ask "is this variable a cause or an effect?"

A confounder (e.g. surgical volume) is a cause of both the exposure (implant choice) and the outcome (revision). We should control for this variable in deciding the relationship between implant choice and revision.

A collider (e.g. revision) is a common effect of two otherwise unrelated factors (e.g. patient comorbidities and poor surgeon skill). If we control for revision, for example by analyzing only patients who were revised, we create a spurious connection between patient comorbidities and poor surgeon skill.
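A short calculation shows how conditioning on a collider manufactures an association. In the Python sketch below, comorbidity and poor technique are independent by construction, and both raise the risk of revision. The probabilities and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical probability, chosen only for illustration.
P_POOR_TECHNIQUE = 0.3   # independent of comorbidity by construction

def p_revision(comorbid, poor_technique):
    """Revision risk rises with either cause; both feed the same outcome (a collider)."""
    return 0.02 + 0.10 * comorbid + 0.10 * poor_technique

def p_poor_technique_given(comorbid, revised_only):
    """P(poor technique | comorbidity status), optionally restricted to revised patients."""
    weight = {}
    for poor in (0, 1):
        w = P_POOR_TECHNIQUE if poor else 1 - P_POOR_TECHNIQUE
        if revised_only:
            w *= p_revision(comorbid, poor)   # keep only the share that gets revised
        weight[poor] = w
    return weight[1] / (weight[0] + weight[1])

for label, revised_only in (("all patients", False), ("revised patients only", True)):
    with_c = p_poor_technique_given(1, revised_only)
    without_c = p_poor_technique_given(0, revised_only)
    print(f"{label}: P(poor technique | comorbid) = {with_c:.2f}, "
          f"P(poor technique | not comorbid) = {without_c:.2f}")
```

Across all patients the two probabilities are identical (0.30), as they must be. Among revised patients only, they become 0.44 and 0.72: a revised patient without comorbidities is more likely to have had poor technique, simply because something had to explain the revision. That negative association exists only because we selected on the collider.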

Chapter 6 Causal Paradoxes

Suppose revision rates appear higher for anatomic TSA compared with reverse TSA when looking at raw totals. But when stratified by age group, we find: (1) younger patients (who are more likely to get anatomic TSA) have higher revision rates overall but (2) within each age group, anatomic TSA actually performs better than reverse TSA. The paradox occurs because age (a confounder) wasn’t adjusted for in the aggregate data. This is an example of Simpson's paradox.

Simpson's paradox is a statistical phenomenon where a consistent trend appears in different groups (below right), but disappears or reverses when the groups are combined (below left) for the same data. This occurs because a hidden confounding variable (in this case patient age) distorts the relationship between the main variables being studied.
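The reversal can be reproduced with a few lines of arithmetic. The Python sketch below uses counts that are entirely hypothetical (invented to show the pattern, not taken from any registry) and compares pooled rates, within-age rates, and age-standardized rates.

```python
# Hypothetical counts, invented for illustration: (patients, revisions)
COUNTS = {
    "younger": {"aTSA": (800, 80), "rTSA": (200, 30)},
    "older":   {"aTSA": (200, 6),  "rTSA": (800, 40)},
}
IMPLANTS = ("aTSA", "rTSA")

def stratum_rate(age, implant):
    """Revision rate for one implant within one age group."""
    patients, revisions = COUNTS[age][implant]
    return revisions / patients

def pooled_rate(implant):
    """Revision rate for one implant, ignoring age."""
    patients = sum(COUNTS[age][implant][0] for age in COUNTS)
    revisions = sum(COUNTS[age][implant][1] for age in COUNTS)
    return revisions / patients

def age_standardized_rate(implant):
    """Weight each age group's rate by that group's share of ALL patients."""
    total = sum(n for by_implant in COUNTS.values() for n, _ in by_implant.values())
    rate = 0.0
    for age, by_implant in COUNTS.items():
        share = sum(n for n, _ in by_implant.values()) / total
        rate += share * stratum_rate(age, implant)
    return rate

for implant in IMPLANTS:
    by_age = ", ".join(f"{age} {stratum_rate(age, implant):.1%}" for age in COUNTS)
    print(f"{implant}: pooled {pooled_rate(implant):.1%}; {by_age}; "
          f"age-standardized {age_standardized_rate(implant):.1%}")
```

With these made-up counts the pooled revision rate is higher for anatomic (8.6% vs 7.0%), yet anatomic is lower in both age groups (10.0% vs 15.0% and 3.0% vs 5.0%). Standardizing both implants to the same age mix (6.5% vs 10.0%) removes the reversal.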

We need to be aware of paradoxes when considering:

Registry data: Revision risk comparisons between aTSA and rTSA can show Simpson’s paradox if patient factors (e.g., rotator cuff status, age, bone quality) are not stratified.

Center outcomes: High-volume centers may appear to have “worse” outcomes overall because they take on more complex patients. Within complexity strata, they may actually have better results.

Implant comparisons: Stemless vs stemmed TSA may look different in revision risk until stratified by deformity or bone quality.

Chapter 7 Intervention

Why Traditional Statistics Fall Short.

When we ask: "If we do a reverse total shoulder instead of an anatomic in this patient what is the likelihood of revision?" we are asking a causal question, not simply observing what factors have been reported for revision.

Observation (association): "In the registry, patients with a reverse TSA had more revisions"

Causal question (intervention): "If I do a reverse total shoulder rather than an anatomic on this 72-year old man with poor cuff integrity, how would that change the probability of a revision?"

Traditional statistical tools such as regression and stratification can balance measured variables (age, cuff status, glenoid type) but they fall short when two fundamental challenges are present:

(1) Complex Causal Structures

Surgical decisions involve networks of influences where confounders and colliders play critical roles

Example of confounder: Bone quality. Bone quality influences both implant choice and revision. Implant choice also influences revision. If we fail to account for bone quality, the apparent relationship between implant type and revision is biased.
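In the notation used in The Book of Why, the difference between observing and intervening can be written down explicitly. Under the strong assumption that bone quality is the only confounder and that it is measured, the back-door adjustment formula gives the effect of choosing implant x as an average of the within-stratum revision rates, weighted by how common each level of bone quality is in the whole population:

```latex
P(\text{revision} \mid do(\text{implant} = x))
  = \sum_{b} P(\text{revision} \mid \text{implant} = x,\ \text{bone quality} = b)\, P(\text{bone quality} = b)
```

The unadjusted registry rate, P(revision | implant = x), instead weights each stratum by how often surgeons chose implant x in patients with that bone quality, which is where the bias enters.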

Example of a collider: Revision. Comorbidities and surgical technique are not directly related, but both influence revision. If we analyze only patients having a revision, we are likely to find a spurious correlation between comorbidities and surgical technique.

Note that adjusting for a confounder is essential, whereas adjusting for a collider is misleading. Traditional regression models often cannot distinguish between the two.

(2) Unmeasured confounders.

Even the best registries do not document important factors that influence surgical decisions and outcomes. Examples are:

(a) Surgeon philosophy (preference for reverse or anatomic)

(b) Patient motivation (adherence to rehabilitation, pain tolerance)

(d) Social determinants of health

(e) System accessibility

Because these variables are unmeasured, statistical adjustment cannot account for their influence, leaving residual bias in comparisons.

To translate evidence into surgical choices a surgeon needs to

(1) Identify all confounders that could affect the outcome (patient related, surgeon related, system-related)

(2) Estimate the weight (influence) of each confounder based on surgical experience and the literature - which are the most important?

(3) Determine which of the important confounders are known for the case in question.

(4) Integrate these insights to answer the question, "If I do a reverse total shoulder rather than an anatomic on this 72-year old man with poor cuff integrity, how does that change the probability of a revision?"

Chapter 8 Counterfactuals

Association (observation) : "Patients with a reverse TSA had more revisions"

Intervention (choosing between options): "If we do a reverse TSA instead of an anatomic in this 72-year old man with poor cuff integrity, how does that change the risk of revision?"

Counterfactual ("what if"): "For this patient who had a reverse TSA and required revision, what would have happened if I had done an anatomic instead?"

Counterfactuals cannot be answered with traditional statistics alone. We need (1) a causal model describing how patient, surgeon, implant and systems interact, (2) evidence (from registries, trials, case studies) to estimate probabilities of different paths, and (3) a way to "re-run" history under different choices.

Example: our 72-year old man with poor cuff integrity had a reverse TSA and required a revision. If this patient had instead received an anatomic TSA, would revision have been avoided? Answering this question requires a causal model that considers both measured (age, sex, glenoid type, bone quality, cuff status, comorbidities) and unmeasured confounders (motivation, pain threshold, compliance, change, social determinants of health).

Counterfactuals are central to learning from failure: given what we could not control (characteristics of the patient, shoulder, and environment), what could we have done differently that may have reduced the risk of failure (surgeon-controllable variables), including:

glenoid implant: choice, placement, sizing, bone preparation, fixation

humeral implant: choice, placement, sizing, bone preparation, fixation

glenohumeral relationships: lateralization, distalization, centering, compression

soft tissue management: cuff, subscapularis, capsule

infection prophylaxis

rehabilitation

Such an analysis can be informed by classifying the type(s) of failure leading to revision:

dislocation

acromial fracture

glenoid fracture

humeral component breakage

glenoid component breakage

polyethylene wear

subscapularis failure

rotator cuff failure

stiffness

infection

Each of these failure modes drives a different set of counterfactuals: would a different implant, surgical technique, or postoperative treatment have avoided a particular type of failure?
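For readers who like to see the mechanics, here is a toy Python sketch of the three steps of counterfactual reasoning described in The Book of Why (abduction, action, prediction), applied to the 72-year old man with poor cuff integrity. It is a deliberately oversimplified model with only three failure modes. The probabilities, the variable names, and the structural rule in `revised` are all hypothetical assumptions made for illustration, not estimates from any data.

```python
from itertools import product

# Hypothetical prior probabilities of three latent, patient-specific vulnerabilities
# for a patient with poor cuff integrity. Illustrative values only.
PRIORS = {"acromial_fracture_prone": 0.05, "cuff_will_fail": 0.40, "infection_prone": 0.02}

def revised(implant, u):
    """Toy structural equation: does this patient, with latent state u, need revision?"""
    if u["infection_prone"]:
        return True                            # infection threatens either implant
    if implant == "rTSA":
        return u["acromial_fracture_prone"]    # failure mode specific to reverse
    return u["cuff_will_fail"]                 # failure mode specific to anatomic

def latent_states():
    """Yield every latent state u with its prior probability (factors independent)."""
    names = list(PRIORS)
    for values in product([False, True], repeat=len(names)):
        u = dict(zip(names, values))
        p = 1.0
        for name in names:
            p *= PRIORS[name] if u[name] else 1 - PRIORS[name]
        yield u, p

def p_revision_do(implant):
    """Level 2: revision probability if we assign `implant` to a new patient."""
    return sum(p for u, p in latent_states() if revised(implant, u))

def p_counterfactual(actual, alternative):
    """Level 3: P(revision under `alternative` | revision was observed under `actual`)."""
    # Abduction: keep only latent states consistent with the observed failure.
    consistent = [(u, p) for u, p in latent_states() if revised(actual, u)]
    evidence = sum(p for _, p in consistent)
    # Action + prediction: re-run the same patients under the alternative implant.
    return sum(p for u, p in consistent if revised(alternative, u)) / evidence

print(f"P(revision | do(rTSA)) = {p_revision_do('rTSA'):.3f}")
print(f"P(revision | do(aTSA)) = {p_revision_do('aTSA'):.3f}")
print(f"P(revision had we done aTSA | rTSA was revised) = "
      f"{p_counterfactual('rTSA', 'aTSA'):.3f}")
```

In this toy model the counterfactual answer (about 0.57) differs from the Level 2 answer for a new patient (about 0.41), because the observed failure tells us something about this particular patient: for example, it raises the chance that infection was the cause, and infection would have threatened an anatomic as well. Knowing the actual failure mode would narrow the abduction step further.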

If you've read this far, congratulations. I've tried to take concepts that are very important and make them accessible.

Comments welcome!

Finding the meat (Western Tanager, Seattle, 2022)


Monday, September 8, 2025


Jason Hsu, M.D.

Corey Schiffman, M.D.

Frederick Matsen, M.D.