Does Gender Affirming Care Reduce Suicide Rates For Young People With Gender Dysphoria?
A new study has the internet afire.
If you’re a science writer online, there are roughly 5 topics that you should studiously avoid, because engaging with them in any way will fill up your inbox for months with angry people regardless of your position. Vaccines are one. Vaping. Oddly enough, breastfeeding. And, of course, the ongoing war about anything related to transgender people.
This week, a new study has come out that has caused a huge amount of controversy in the discussion about healthcare for young transgender people. If you read the reports from anti-treatment groups, the study is proof that young adults experiencing gender dysphoria - the feeling of (often extreme) discomfort caused by differences in self-perceived gender and a person’s sex assigned at birth - are mentally ill and need counselling rather than hormones and gender-affirming surgery. However, if you look at the actual data from the study, a rather different picture emerges about what it means for transgender people.
If you look at the results the authors report, it seems like gender-affirming hormones and surgery are associated with a massive two-thirds reduction in suicide risk for people with gender dysphoria. That’s…quite different to the public messages from anti-treatment advocates.
In a sane world, most of us would have little interest in the medical treatment of transgender individuals. About 1% of the population in most countries identifies as trans, and not all of those people even opt for medication or surgery for their dysphoria. As a cis man, it’s rather bizarre that I’m even writing about issues with scientific interpretation in the transgender health space.
But apparently every person online is desperately invested in the genitals of other people. So let’s have a look at the evidence here and why this study seems to, if anything, support gender-affirming treatment rather than providing evidence against it.
The Study
The study in question is a very boring retrospective epidemiological analysis from Finland. The authors looked at people who had been referred to gender clinics using the Finnish national database of healthcare records, which is linked to other healthcare data such as the national suicide register. They identified everyone who had been 23 or younger at the time of referral to a gender clinic for treatment, and compared their risk of suicide with that of a fairly arbitrary group of individuals who were matched by year of birth and sex assigned at birth.
The median age of people referred to gender clinics in this group was 19, so most of the people here are younger adults. The authors used what’s called a Cox regression model, which is a statistical technique that looks at the risk of something happening over time, to compare people referred to the gender clinic for gender dysphoria with their controls. This model allows you to control for potentially confounding factors, and in this case the authors ran an uncorrected basic model as well as one controlling for age, sex, and total lifetime contacts with psychiatric care. The original model showed a very substantially increased risk of suicide in trans people - about 430% - but in this corrected model the difference was reduced to an 80% increase that was no longer statistically significant.
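To make those percentages concrete: a hazard ratio (HR) is a multiplier on risk, so "an X% increase" corresponds to HR = 1 + X/100. Here is a minimal sketch of that conversion using the rounded figures quoted above. The function names are mine, purely for illustration, and are not anything from the paper.

```python
def percent_increase_to_hazard_ratio(percent_increase: float) -> float:
    """Turn an 'X% increased risk' statement into a hazard ratio."""
    return 1 + percent_increase / 100


def hazard_ratio_to_percent_change(hazard_ratio: float) -> float:
    """Turn a hazard ratio back into a percentage change in risk."""
    return (hazard_ratio - 1) * 100


# Rounded figures quoted in the text above.
crude_hr = percent_increase_to_hazard_ratio(430)    # uncorrected model
adjusted_hr = percent_increase_to_hazard_ratio(80)  # corrected model

print(f"Uncorrected model: HR of about {crude_hr:.1f}")
print(f"Corrected model:   HR of about {adjusted_hr:.1f}")
```

An HR of 1 means no difference from controls, which is why the corrected estimate is still an elevated risk even though it is no longer statistically significant.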
The authors go on to claim that this lack of statistical significance means that gender dysphoria doesn’t require treatment, because the suicide risk that is associated with gender dysphoria is simply due to other psychiatric illnesses. They then argue that this makes hormones, surgery, and other gender-affirming care unnecessary, and that young people with gender dysphoria likely need psychiatric and psychotherapeutic interventions rather than drugs to help them transition genders.
Now, I called this analysis boring for a reason. For an epidemiological study, this is completely humdrum. One model shows one thing, adding some additional terms into the regression shows something slightly different, and this advances our understanding of some condition by a tiny fraction. If this was anything other than transgender medicine, this study would be read by a few dozen people at most, and it would be largely forgotten thereafter. The authors seem to believe that their results are incredibly meaningful, but even if you take these findings entirely at face value there really isn’t much you can read into two statistical models with wide confidence intervals. You’d need much more research to make any strong claim about how gender dysphoria impacts suicide risk, ideally with more robust methodology and a larger number of events (suicides) to run through statistical models.
But this is a politically-charged issue, and the authors are quite vocal advocates - they initially presented their findings to SEGM, an anti-hormone/surgery advocacy group who believe that transgender care should be composed primarily (if not entirely) of psychotherapy - which means that everyone cares deeply about the numbers here. SEGM has implied that these results mean that gender dysphoric youth are not actually transgender, and that the dysphoria is a result of psychiatric illness that should be treated before any gender-related care is trialed.
So what do the numbers actually show?
A Study Filled With Issues
The first problem with interpreting the analyses that the authors have conducted is that the model itself is weak. Without getting too into the weeds about the statistics here, it’s important to note that the Cox proportional hazards model that the authors have run may simply not be the correct way to analyze this data. Specifically, it’s possible that issues with the model in the study may have caused the lack of association that the authors note. Using another type of statistical model it’s quite likely that the authors would have found a statistically significant association. The model the authors run also has some signs of overfitting - in this case, putting too many additional terms into the regression with too few events to rely on, causing the model to break down. This is potentially indicated by the very wide confidence intervals that the authors report.
Basically, the fact that the model didn’t find anything is largely uninterpretable.
The way the model was run is also problematic. As I mentioned above, the authors ran a corrected analysis. What this means statistically is that they added terms into the regression model to account for the impact of age, sex, and previous psychiatric visits on risk of suicidality.
The problem here is that previous psychiatric visits have a complex interrelationship with gender dysphoria. In this case, the authors identified gender dysphoria using diagnostic codes - specifically F64.0, F64.2, F64.8 or F64.9 from the International Classification of Diseases Version 10 (ICD-10). To get an ICD-10 code like this, you have to have a diagnosis by a doctor. That means that previous psychiatric appointments directly cause a diagnosis of gender dysphoria. In addition, for some people entering the gender clinic service results in a referral for psychiatric services, meaning that a diagnosis of gender dysphoria may also directly cause more psychiatric appointments. These issues are broadly called collider stratification bias in epidemiological research - one way to avoid them is by only looking at psychiatric appointments prior to the expression of gender dysphoria, but in this study the authors did not do that.
In addition, adding psychiatric services into a regression model looking at suicide is potentially problematic. Psychiatric appointments are a marker for severe mental health issues, and by taking the impact of such issues out of the equation by controlling for them in a statistical model you may simply be removing any associations with suicide regardless of whether they are meaningful. This is another form of collider stratification bias. It’s a bit like if you wanted to look at the impact of speeding on the risk of a car crash but only looked at people going 100mph. You might find that, in this population of people who were going incredibly fast, there was no relationship between additional speed and risk of crash, even though increased speed definitely makes you more likely to crash.
By removing the impact of psychiatric visits on suicide risk in their statistical model, the authors are essentially looking at whether gender dysphoria predicts suicide in people with perfect mental health. But this only makes sense if there is no way that gender dysphoria is causally related to mental health issues. If, for example, untreated gender dysphoria caused people to get depressed and anxious, then this sort of analysis would be very problematic.
There are other problems with drawing conclusions from the study. The study design was very simple, and missed a large number of potentially confounding factors. This means that you can’t really assume that any of the results are meaningful, because unmeasured variables may completely overturn the authors’ results.
However, the other issues in the paper are much less important than the big elephant in the room that many people have ignored. In this study, hormones and surgery were associated with a very large reduction in the risk of suicide.
Specifically, the authors report in the last sentence of their discussion a sensitivity analysis that they ran comparing people with gender dysphoria who got treatment to those who didn’t (bolding added for emphasis):
“To explore the role of GR, models accounting for sex, year of birth and psychiatric treatment were repeated by dividing the GR group into those who had and those who had not proceeded to GR. Adjusted HRs for all-cause mortality were 1.4 (95% CI 0.6 to 3.3; p=0.5) in the GR− group and 0.7 (95% CI 0.2 to 2.0; p=0.5) in the GR+ group, as compared with the controls. Adjusted HRs for suicide mortality were 3.2 (95% CI 1.0 to 10.2; p=0.05) and 0.8 (95% CI 0.2 to 4.0; p=0.8), respectively.”
What this shows is that if you compare the people who were referred to the clinic for treatment and actually got it to the people who were referred and were never treated, there is a massive difference in suicide rates. The authors don’t directly compare these two groups, but with some simple maths I’d estimate that treated patients had a roughly 70% reduced risk of suicide when compared to untreated patients. In other words, for young trans people, hormones and surgery may be lifesaving.
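The "simple maths" here is nothing fancier than dividing one hazard ratio by the other. A rough sketch, using the two adjusted suicide HRs from the quote above; the big assumption (mine, not the authors') is that the ratio of two HRs measured against the same control group is a fair stand-in for a direct treated-vs-untreated comparison:

```python
# Adjusted suicide-mortality hazard ratios quoted from the paper,
# each measured against the matched controls.
hr_untreated = 3.2  # GR- group (referred, did not proceed to treatment)
hr_treated = 0.8    # GR+ group (referred and treated)

# Assumption: dividing the two gives an approximate hazard ratio for
# treated vs untreated patients. The authors did not report this directly.
relative_hazard = hr_treated / hr_untreated
reduction_percent = (1 - relative_hazard) * 100

print(f"Treated vs untreated: HR of about {relative_hazard:.2f}")
print(f"Implied reduction in suicide risk: about {reduction_percent:.0f}%")
```

This back-of-the-envelope version lands at about 75%, in the same ballpark as the roughly 70% figure above. It says nothing about how uncertain that number is, which is a separate and much bigger problem.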
The authors completely dismiss this finding, however, stating that:
“when psychiatric treatment needs, sex, birth year and differences in follow-up times were accounted for, the suicide mortality of both those who proceeded and did not proceed to GR did not statistically significantly differ from that of controls.”
What they mean by this is complex, and the next bit of this post will be quite finicky, but let’s think about this statement. It’s ridiculous for two reasons. Firstly, the authors are comparing treated and untreated gender dysphoric people with the matched control cohort. But we don’t really care about that comparison all that much - we want to know if treatment is likely to cause a reduced risk of suicide, not if there’s some difference between people attending a gender clinic who are/aren’t treated and some other arbitrarily-selected young people. If you look at the comparison between people who had medical assistance with their transition vs people who didn’t, the difference is large and likely statistically significant*.
The other big issue here is the idea of statistical significance. Statistical significance is a hotly-debated subject - generally, statisticians hate it while everyone else just uses it anyway. Basically, when we run scientific studies, we are looking for numbers that are different from a null hypothesis. In this case, the null hypothesis is that there is no difference in suicide rates between the two groups we’re looking at. The statistical test takes a look at the numbers and gives us a probability estimate - a p-value - which denotes how likely we would be to see results at least this extreme if the null hypothesis were true and we re-ran the study in the same population a very large number of times.
In this case, the p-value is 0.05. That means that if there were truly no difference in suicide risk, and we could somehow replicate Finland in its entirety and re-run the gender clinic from 1996-2019, we would expect to see a difference this big or bigger between the untreated dysphoric patients and matched controls about 5% of the time. This doesn’t mean that the association is “true” or “false”, but it gives us some indication of how likely the results are, given the numbers in question.
Traditionally, a p-value of 0.05 is actually the threshold for statistical significance - most authors would say that this result meets our entirely arbitrary bar for importance. However, the authors of this study argued that we should reduce our bar for statistical significance from the arbitrary threshold of 0.05 to a lower, also arbitrary threshold of 0.01 instead. They say this is:
“In order to avoid type 1 error due to multiple testing and the large data size”
In this context, a type 1 error is when you inappropriately conclude there is a relationship between two things. This happens when you run lots of statistical tests, because there is always a chance that two entirely unrelated things will have a statistically significant relationship simply due to chance.
However, this argument doesn’t really apply to the paper that the authors have published. They’ve only really run three models - one uncorrected, and two corrected - and those models largely overlap in what they test. Even if you used the most conservative control for multiple comparisons here - which is called a Bonferroni adjustment - you wouldn’t get to a p-value threshold of 0.01.
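The Bonferroni arithmetic is easy to check: you divide the usual threshold by the number of tests. A minimal sketch, assuming the three models counted above are the tests being corrected for (the function name is just for illustration):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after a Bonferroni adjustment."""
    return alpha / n_tests


# Assumption: three tests, matching the three models counted above.
threshold = bonferroni_threshold(alpha=0.05, n_tests=3)

print(f"Bonferroni-adjusted threshold for 3 tests: {threshold:.4f}")
print(f"Stricter than 0.01? {threshold < 0.01}")
```

With three tests the adjusted threshold is about 0.017, which is still a more lenient bar than the 0.01 the authors chose.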
In addition, the argument about a large sample size makes no sense at all. While there were a reasonably large number of people in this study, with 2k clinic patients and 16k controls, what we actually care about for the type of statistical test the authors were running is the number of events (suicides). In this case, we’re looking at just 7 suicides in the clinic patient cohort, split up between people who were and weren’t treated. In such situations, as I noted above, Cox proportional hazards models are notorious for misfiring and producing spurious results.
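You can see how little information 7 events give you by looking at how wide the reported intervals are. The sketch below takes the two suicide-mortality confidence intervals quoted earlier and works out how many times larger the upper bound is than the lower, along with the standard error they imply on the log scale. It assumes these are standard 95% intervals that are symmetric on the log scale, which is the usual convention for hazard ratios but is my assumption rather than something the paper states.

```python
import math


def log_scale_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(HR) implied by a 95% confidence interval.

    Assumes the interval is symmetric on the log scale.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# 95% confidence intervals for suicide mortality quoted from the paper.
intervals = {
    "GR- (untreated)": (1.0, 10.2),
    "GR+ (treated)": (0.2, 4.0),
}

fold_widths = {}
standard_errors = {}
for label, (lower, upper) in intervals.items():
    fold_widths[label] = upper / lower
    standard_errors[label] = log_scale_standard_error(lower, upper)
    print(
        f"{label}: upper bound is {fold_widths[label]:.1f}x the lower bound, "
        f"implied SE of log(HR) is about {standard_errors[label]:.2f}"
    )
```

Intervals where the top end is 10 or 20 times the bottom end are compatible with almost anything, which is the whole point about small numbers of events.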
In reality, the appropriate conclusion from this study is that hormones and surgery are associated with a roughly 70% reduction in risk of suicide for gender dysphoric youth, but that this result is very uncertain. It’s plausible, based on the data here, that the true reduction is anywhere from <1% to >90%.
Bottom Line
The thing to remember here, however, is that none of this is necessarily accurate because the study itself is just not very strong. The authors did not do any of the careful work that you need to do to make inferences about causality, such as producing Directed Acyclic Graphs or considering the impact of residual confounding, so they are in no position to make any particularly important claims about dysphoric young people.
As I said right at the start, this paper is boring. Humdrum. I do many of these analyses each month, because they’re an everyday activity for most epidemiologists. They’re useful as background information, but you need to do a great deal more work to make any important inferences. There is simply no reasonable way that you can conclude anything about how to treat dysphoric youth based on this paper. In addition, the limited analyses that the authors did had some fairly major holes, which undercuts the conclusions of their paper even further.
That being said, to the extent that this paper is useful, the main finding seems to be that hormones and surgery are associated with a drastic reduction in suicide risk for transgender young people. It’s true that this finding is incredibly uncertain, and that residual confounding is a big issue, but that’s true of every model the authors ran. Given that both of the statistical models the authors ran are essentially identical aside from one minor change, there’s no reasonable way to ignore the findings of one and accept the findings of the other.
Ultimately, it would be a mistake to read too much into this research. But the biggest finding here seems to be that treating young people who have gender dysphoria with hormones and surgery is associated with a drastic reduction in suicide risk.
*Specifically, the hazard ratios of 3.2 and 0.8 for the GR- and GR+ groups imply that, of the 7 suicides that happened in the gender dysphoria cohort, only 1 happened in the treated group while 6 happened in the untreated group. This returns a crude risk ratio of 0.27 (95% CI 0.25-0.29) for treatment. That’s a risk reduction of 73%, and it’s very statistically significant. Of course, this is largely conjecture on my part, because the authors don’t give the exact numbers and back-calculating from a corrected model is…not robust, but it’s still quite interesting to note.