This is part 2 of my series looking at the Cass review into gender identity services for children in the UK. You can find the other parts here (I will update as I add sections):
I’m going through some of the main sections of the Cass review into gender identity services and discussing the evidence and findings. But one of the things that I think is really important, before I point out what I see as errors, is to note that much of the work is unquestionably high-quality. In particular, the commissioned evidence from the University of York team is, as far as I can tell, very good.
This is not to say that I agree with everything the UYork studies say. I think in places the evidence is more complex, and sometimes more definitive, than the reviews suggest. To be honest, that's what you'd expect from systematic reviews with a narrative synthesis - at the end of the day, some of the findings come down to the opinions of the authors, which leaves a lot of space to disagree.
That being said, there are a lot of extremely passionate people online who dislike the main conclusions of the Cass review, and many of them are making arguments against the document that are simply incorrect. The problem with this is twofold - firstly, we should always be accurate when we criticize something. It’s just fundamentally important. Secondly, by making false critiques of the review, you weaken the real criticisms of the document. The serious issues with the review and its conclusions are hidden behind the noise being made about things that aren’t really problems.
Here are some of the criticisms of the review, and why they are not accurate. I will be updating this list as I see new theories pop up online.
Personal Attacks On Dr. Cass
I’ve seen a lot of hate directed at Dr. Cass online, much of it unfounded. Various news sources have reported that she’s been advised to stop using public transport due to risks to her safety, and there’s quite a bit of vitriol against her personally in any discussion of the review.
Hating the public figure behind a review like this is understandable, in my opinion. People are angry, and Dr. Cass is the head of the review which is the subject of that anger. I’m not going to tell anyone not to be angry, but I will say that by all reports and public information, Dr. Cass is a well-qualified, highly-respected pediatrician. Attacking her personally doesn’t address the review’s findings or conclusions, and it’s just generally a bad look for all involved.
They Discarded 98% Of The Evidence!
There is a false theory going around that the Cass review excluded 98% of the studies it identified because they were not considered high-quality evidence. The claim comes from the fact that, in the two systematic reviews conducted by the University of York into puberty blockers and hormones for children, just 2 of the 103 studies identified (about 2%) were rated as high quality.
What's happening here? The systematic reviews that looked at interventions - i.e. giving children drugs or psychological help - rated the studies they identified using a fairly standard scale called the Newcastle-Ottawa Scale. This scale asks some very basic questions - did the study follow up all participants, and if not, why not? - which give the reviewers some insight into the biases that an observational study might have. This provides a somewhat objective rating of how useful a study is as evidence. In the systematic reviews in question, the authors divided studies into low, moderate, and high quality brackets based on how well they did on this scale.
The reviews then discarded all studies rated as low quality, and included the moderate and high quality papers in their narrative synthesis. So, firstly, the claim that the Cass review discarded 98% of the literature is simply incorrect - the reviews included 60 of the 103 studies, excluding a total of 43 (about 42%) due to low quality.
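For the record, here's a minimal sketch of that filtering step in Python. The star cutoffs are hypothetical (the NOS scores studies out of 9 stars, but I'm not reproducing the York team's exact bracketing rules), and the per-study scores are made up; only the totals - 103 studies identified, 2 rated high quality, 60 included - come from the reviews themselves.

```python
# Illustrative sketch of quality-based filtering in a systematic review.
# The bracket cutoffs below are hypothetical, not the York team's actual rules.

def quality_bracket(nos_stars: int) -> str:
    """Map a Newcastle-Ottawa score (0-9 stars) to a quality bracket."""
    if nos_stars >= 7:
        return "high"
    if nos_stars >= 4:
        return "moderate"
    return "low"

# Made-up scores for 103 studies, chosen so the bracket counts match the
# reported totals: 2 high, 58 moderate, 43 low.
scores = [8] * 2 + [5] * 58 + [2] * 43

brackets = [quality_bracket(s) for s in scores]
included = [b for b in brackets if b != "low"]  # only low quality is excluded

print(f"Identified: {len(scores)}")                                  # 103
print(f"Included (moderate + high): {len(included)}")                # 60
print(f"Excluded as low quality: {len(scores) - len(included)}")     # 43
print(f"Share excluded: {(len(scores) - len(included)) / len(scores):.0%}")  # 42%
```

The sketch makes the confusion obvious: "rated high quality" (2/103, the source of the 98% figure) and "included in the synthesis" (60/103) are two different things.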
It’s important to note that quality is mostly about causal inference - it’s not that the studies are bad, per se, it’s that they don’t provide sufficient information to know if treatments cause the outcomes that the study measures. It’s not a moral judgement. No one is a bad person for doing low quality research - I’ve published low quality papers myself in the past. All this designation tells you is that the data is not sufficiently robust to draw useful causal links from, nothing more.
In terms of the ratings of studies as low quality, it's hard to argue with the decisions of the reviewers. This study, for example, was rated as low quality. The paper looks at self-reported outcomes for people who had cross-gender hormones and surgery at a single clinic. The authors don't explain why some people dropped out, they use only self-reported scales, there's no control group, and the data they do present is quite limited. Whether or not you see this paper as useful, it's undeniable that you can't really draw causal conclusions from this sort of research. At best, it tells us that the people who went to this clinic, and who responded when the authors asked, said that they felt better after transition-related care. But there are many possible biases that make it hard to say that the medical and surgical treatments caused the decreases in psychological issues and dysphoria.
There’s a common belief that more studies means better data, but if anything quite the opposite is true. A single high quality piece of research can often outweigh dozens or even hundreds of low quality pieces of work.
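If you want an intuition for why, here's a toy simulation in Python - the numbers are entirely invented and have nothing to do with the actual literature. It imagines fifty small studies that all share the same bias (say, because people lost to follow-up differ from those who remain) and one large, well-controlled study that doesn't.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.0   # in this toy world, the treatment does nothing
bias = 0.5          # shared bias, e.g. selective loss to follow-up

# Fifty small, uncontrolled studies: each estimates effect + bias + noise
small_studies = true_effect + bias + rng.normal(0, 0.3, size=50)

# One large, well-controlled study: unbiased, with modest noise
big_study = true_effect + rng.normal(0, 0.05)

print(f"Pooled estimate from 50 low-quality studies: {small_studies.mean():.2f}")
print(f"Estimate from 1 high-quality study:          {big_study:.2f}")
# Averaging many biased studies gives a precise but wrong answer (~0.5);
# the single unbiased study lands near the truth (~0.0).
```

Averaging the fifty biased estimates gives you a very precise answer that's simply wrong; the single unbiased study gets you close to the truth. More low quality studies just make you more confident in the same mistake.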
In my opinion, the research team was mostly justified in excluding low quality studies, for the simple reason that such evidence doesn't really change the conclusions. If the team had added all of these papers into the narrative review, they would've come to the same conclusions, because these studies don't shift the needle very much. I've personally done similar things in systematic reviews, because the whole problem with low quality research is that it adds very little to the evidence base either way.
They Changed Methodology Halfway Through!
This is a more interesting criticism. The UYork team pre-registered their systematic review on PROSPERO, which is a database of review registrations, in 2021. They then updated this registration in 2023 to broaden their search criteria and questions being asked.
In the registration, the authors said that they were going to use a tool called the Mixed Methods Appraisal Tool (MMAT). The MMAT was designed to allow people to rate a variety of qualitative and quantitative methodologies at the same time. It's mostly used when people run systematic reviews of literature that includes a large quantity of qualitative work, which generally means interview and focus group studies, as opposed to quantitative studies such as pre/post evaluations on depression scales and the like.
At some point, the reviewers switched to the Newcastle-Ottawa Scale for rating the literature included in their reviews. No reason for this is given in the reviews that I can see, which is certainly not best practice. There's a belief going around online that the reviewers switched scales because the MMAT recommends against excluding low quality work, while the NOS has no such recommendation.
As someone who does systematic reviews professionally, I find that this argument makes no sense. All rating of bias is to some extent subjective. While both the MMAT and NOS attempt to create some measure of objective ranking for research, they are both ultimately up to the judgement of the reviewers using the tools. Changing your rating scale isn't going to magically change the conclusions of a systematic review, especially when the review uses a narrative (i.e. subjective) synthesis method anyway.
In addition, as I noted above, including low quality studies in these reviews probably wouldn't change much, because the low quality of the papers reduces their usefulness anyway. I very much doubt that anyone cared enough about the rating scale to switch it for nefarious reasons - the most likely explanation is that the team simply didn't find many qualitative studies in their searches. The reviews should have explained the differences between their registration and the final publication, but on the scale of academic crimes this barely rates as a misdemeanor.
They Only Care About RCTs
A common critique of the Cass review is that the team involved only cares about randomized controlled trials (RCTs), the gold standard of medical evidence, and discarded everything else. As I've discussed above, this simply isn't true. The reviewers intentionally used a rating scale appropriate for observational studies, and even against this relatively low bar there were vanishingly few examples of what the reviewers considered high quality research.
The review DOES emphasize doing RCTs to test the treatments, but that’s not quite as simple as you might imagine. Had there been a plethora of high quality observational research, it’s quite likely that the need for randomized trials would be, if not entirely gone, at least substantially diminished.
Sticking With The Facts
As I said, I’ll add to this list as I see arguments pop up that don’t make sense to me. I think it’s important to stick to the facts when critiquing a review such as the Cass report. There are significant issues in the document, which I have begun to outline, but some of the arguments being made to discredit the review don’t make much sense.
I think the review made some serious mistakes in both science and interpretation, but they didn’t simply discard most of the evidence, or sneakily change their methodology to get rid of important research. The real story of the Cass review is much more complex than a single weakness that entirely discredits the work.