Mental health problems are comorbid, which means that they are positively intercorrelated and don’t tend to occur in isolation. Of all people diagnosed with major depression, for example, about half have at least one more comorbid mental health problem, such as generalized anxiety disorder or posttraumatic stress disorder. Now, the same holds for medical problems, and on top of that, there is comorbidity between mental and physical health problems: cancer and depression, for example, often go together. Many have speculated as to the causal mechanism that governs this comorbidity, and today’s blog post is about one particular theory: the d (disease) factor.
The d factor
A few days ago, a new paper came out on this topic in World Psychiatry, one of the most renowned journals in psychiatry. The authors aim to answer the question of why mental and physical health problems often co-occur. The short paper is entitled “First evidence of a general disease (“d”) factor, a common factor underlying physical and mental illness”. In the paper, the d factor is defined as an “underlying disease dimension [..] that accounts for the individuals’ propensity to develop mental as well as physical conditions”, and as “a general vulnerability to develop any of the included conditions”. This builds on similar work in the mental health literature, where authors have identified a p (for psychopathology) factor that is thought to explain the comorbidity among mental health problems.
In the paper on the d factor, authors fit a particular statistical model to a large dataset, and claim that they have “discovered” the d factor: “the results support the assumption of the existence of a general ‘d’ factor in adults”. The authors also claim that their discovery has “highly relevant research and clinical implications regarding our understanding and management of mental and physical conditions, as well as for service organizations”, “relevant implications for the conceptualization and classification of mental and physical conditions”, and “important implications for clinical practice and policy”.
These are sweeping conclusions that, in my view, are not supported by evidence.
Shortcomings of the paper
The paper repeats mistakes that have been made in the p factor literature; I’ll summarize 3 in some detail below.
1. Bifactor Schmifactor
First, and most importantly, the authors fit 3 statistical models to the data: a) a correlated factors model, b) a unifactor model, and c) a bifactor model. It doesn’t really matter what these models are or do in particular. What is important here is that when you fit a statistical model to data, you are doing something similar to trying to find the right lid (statistical model) to your pot (data). If the lid fits well, which is determined by so-called “fit indices” in statistics, it indicates that you have a good match between statistical model and data. This may then allow you to conclude that your statistical model represents or describes your data well. For example, a few decades ago scientists found that a statistical model “smoking causes lung cancer” fit data well, corroborating the theory that smoking causes lung cancer.
In the current paper, the authors find that the bifactor model fits the data better than the other two models they fit. Unfortunately, the bifactor model has an exceptionally high fit propensity, meaning that it is a lid that fits all sorts of pots really well. For the smoking and cancer example, this would mean that no matter what the data look like, the statistical model would tell you that smoking causes cancer, because this statistical model tends to fit all sorts of data well.
Worse, it is well established in statistics that even if you buy a pot and lid together (i.e. they are the perfect match!), and you now swap out your newly purchased, precious lid for a lid from the bifactor model, the bifactor model lid will fit better, although we know for a fact it is the wrong lid for the pot! Statistically speaking, this means that even if you simulate data from e.g. a correlated factors model, the bifactor model will fit these simulated data better than the correlated factors model that generated them. This raises serious concerns about using the fit of the bifactor model to determine which model is best.
Unfortunately, the authors’ whole argument rests on this one point, the fit of the bifactor model:
“We found that the bifactor model fitted the data best (CFI=0.98, TLI=0.98, RMSEA=0.016) […]. Therefore, our results support the assumption of the existence of a general “d” factor in adults.”
This conclusion is false, and it is a little disheartening that this was not caught in peer review, given how well known the problem is in the statistical literature. There are two more problems I would like to explain.
2. Statistical equivalence
First, there are hundreds of models that represent hundreds of different causal processes, but the authors didn’t fit those hundreds of models to their data: they only fit three. This makes it difficult to conclude that they really found support for their particular theory. For example, one could have the theory that …
- … having depression makes one more vulnerable to developing an anxiety disorder
- … having a physical health problem such as a chronically weak immune system makes one more vulnerable to developing other physical health problems due to this weak immune system
- … the comorbidity of physical and mental health problems, such as between cancer and depression, is not explained by some underlying disease factor, but comes from the fact that people with cancer are more likely to develop depression.
This “systems” theory is highly plausible given what we know about mental and physical health comorbidities, but the authors didn’t test this theory, because they didn’t fit an appropriate statistical model to the data. If you were to fit such a model to the data, it is widely known in statistics that the model would have very good fit to the data as well! This is due to something known as statistical equivalence, or in other words, that there are multiple competing models that can describe the authors’ data equally well (there are several lids for the pot). Philosophers call this situation one where a theory is “underdetermined by data”, i.e. that data and statistical model together are insufficient to provide strong evidence for a theory. This, of course, is another big problem for the authors’ interpretation that the underlying d factor theory is supported because the bifactor model fits their data: other models they didn’t fit have at least equal fit, and there may be models with even better fit. How can they then claim they discovered the lid?
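To make the idea of several lids for one pot concrete, here is a minimal sketch of a toy case that is not taken from the paper. It assumes a hypothetical "systems" model with three problems A, B and C that influence each other directly (the path values are made up), computes the correlations that model implies, and then solves for the loadings of a one-factor model. With only three variables a one-factor model is just-identified, so it reproduces these correlations exactly even though no common factor was involved in producing them.

```python
import numpy as np

# Hypothetical "systems" model with no common cause:
# A -> B, A -> C, B -> C (path values are made up for illustration).
paths = np.zeros((3, 3))
paths[1, 0] = 0.5  # A -> B
paths[2, 0] = 0.3  # A -> C
paths[2, 1] = 0.4  # B -> C

# Covariance implied by the path model, assuming independent
# residuals with unit variance: (I - B)^-1 (I - B)^-T.
total = np.linalg.inv(np.eye(3) - paths)
cov = total @ total.T
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

# Loadings of a one-factor model, solved from the three correlations.
r_ab, r_ac, r_bc = corr[0, 1], corr[0, 2], corr[1, 2]
loadings = np.sqrt([
    r_ab * r_ac / r_bc,  # squared loading of A
    r_ab * r_bc / r_ac,  # squared loading of B
    r_ac * r_bc / r_ab,  # squared loading of C
])

# Correlations implied by the one-factor model.
implied = np.outer(loadings, loadings)
np.fill_diagonal(implied, 1.0)

print(np.round(loadings, 3))
print("one-factor model reproduces the correlations:", np.allclose(implied, corr))
```

Both models describe the same correlation matrix perfectly, so fit alone cannot tell us which causal story is true. With more variables, exact equivalence is no longer guaranteed, so this is only an illustration of the logic, not a reanalysis of the authors’ data.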
3. Discovering vs generating latent variables
Finally, I would argue that the authors did not “discover” the d factor: they “created” it. The d factor is not in the data, but one can fit a bifactor model to make a new variable, and then one can call it the d factor. But calling this process “discovery” is odd, because in any situation where a set of variables is positively correlated (no matter the causal process that leads to these correlations), I can create a variable that summarizes these variables. For example, I can simulate data in which 10 variables are correlated with each other because every variable causes one other variable: A causes B, B causes C, C causes D, and so on. Now I have 10 intercorrelated variables, and I can fit a latent variable model that summarizes these correlations as latent variable m (the matrix factor). But that has nothing to do with discovery, and concluding that the m factor “underlies” my data, or “accounts” for the correlations among my 10 variables, is not defensible given the evidence I have: a latent variable I myself created.
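The thought experiment above can be sketched in a few lines. This is a rough illustration under assumptions of my own choosing: the sample size, the effect size of 0.8 and the random seed are arbitrary, and the first principal component of the correlation matrix stands in for a fitted latent variable model, which is a simplification rather than the bifactor model used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_vars, effect = 20000, 10, 0.8  # arbitrary illustration values

# Causal chain with no common cause: X0 -> X1 -> ... -> X9.
data = np.empty((n_people, n_vars))
data[:, 0] = rng.standard_normal(n_people)
for j in range(1, n_vars):
    # Scale the noise so every variable keeps unit variance.
    noise = rng.standard_normal(n_people) * np.sqrt(1 - effect**2)
    data[:, j] = effect * data[:, j - 1] + noise

corr = np.corrcoef(data, rowvar=False)

# First principal component as a simple stand-in for a "general factor".
eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
weights = eigvecs[:, -1]
weights = weights * np.sign(weights.sum())  # fix the arbitrary sign
share = eigvals[-1] / n_vars

print("all correlations positive:", bool((corr > 0).all()))
print("all variables load positively:", bool((weights > 0).all()))
print("share of variance summarized:", round(float(share), 2))
```

Every variable loads positively on the summary variable and it captures a large share of the variance, yet the data were generated by a chain in which nothing resembling a general factor exists.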
Conclusion
There are other challenges with the paper, but these are just your typical challenges with measures we use in psychology and psychiatry, or with the fact that the data the authors have do not lend themselves to the type of causal inference that the authors engage in (i.e. that the d factor “accounts for” the data or “underlies” the data). Ashley Watts and colleagues recently submitted a paper on the topic (I will link to it here once it is available online) that summarizes all these issues for the p factor literature, and these challenges apply to the current paper as well.
The main concern I have here is, to briefly reiterate, that the conclusion that there is evidence for the existence of an underlying dimension — the d factor — does not follow from demonstrating that a bifactor model has good fit for data. A lot more work would need to be done to support this conclusion.