What’s in a name (2): domains, scales, scores, factors & dimensions

The original commissioning specification for the CORE system required that the items in the measure covered domains of wellbeing (or “well being”, or “well-being”: there’s another naming issue!), problems/symptoms, functioning and risk.  The questions were supposed, where possible, to include both intrapsychic and interpersonal items; functioning was to cover both more personal/intimate and more social functioning; and risk was to cover intrapunitive and extrapunitive risk, i.e. risk to self and risk to others.  We liked this framework and noted that the first three domains had some links to the phase model of change in therapy, which suggests that well-being change comes first, then symptom/problem improvement, then functioning improvement (Howard, Lueger, Maling & Martinovich (1993) A phase model of psychotherapy outcome: causal mediation of change. J Consult Clin Psychol. 61(4):678–85).

We thought the commissioning specification was right that these were fairly conceptually distinct domains of experience that should be covered by a measure of change in therapy.  That view was supported by extensive surveys of therapists/practitioners, managers, commissioners (“purchasers” in the jargon of the time), end users and lay people, so we thought we should say which items we saw as belonging most strongly to which domains and offer the opportunity to study scores, and changes in scores, on each domain.  However, we never imagined that these would form clear “factors” or principal components in cross-sectional psychometric studies, nor that the chronological relationships between them over time in cohorts, or even within a single person in therapy, would be neat.  If you feel lousy (low wellbeing) it’s likely that you will have or develop problems and even symptoms, and vice versa.  Similarly, struggling to function well in personal interactions, at work or in caring duties will dent a sense of wellbeing and lead to problems: these simply aren’t independent factors or dimensions.

With the advantage of hindsight it’s easy to see that we should have been clearer about that.  We tried to use the terms “domains” and “domain scores” in preference to “factors”, “dimensions” and “scales”, but slipped from time to time.  We thought we were sufficiently explicit that our use of exploratory factor analysis was exactly that: exploratory, and mainly to check that there was a large main factor and a good collection of smaller factors.  We were unsurprised in our early work (Evans, Connell, Barkham, Margison, McGrath, Mellor-Clark & Audin (2002) Towards a standardised brief outcome measure: psychometric properties and utility of the CORE-OM. British Journal of Psychiatry, 180, 51–60) to find a structure that didn’t reflect the domains but which seemed, to some extent, to separate positively cued from negatively cued items and to separate the risk items from the other items.  We never expected that this structure would replicate strongly in different cultures and samples, and we only used confirmatory factor analysis to show just how poor the fit to a simple factor structure was (Lyne, Barrett, Evans & Barkham (2006) Dimensions of variation on the CORE-OM. British Journal of Clinical Psychology, 45, 185–203).  That paper was intended to be a definitive statement about the expected psychometric structure, at least in British clinical samples.  Here’s the statement from the abstract:

The CORE-OM has a complex factor structure and may be best scored as 2 scales for risk and psychological distress. The distinct measurement of psychological problems and functioning is problematic, partly because many patients receiving out-patient psychological therapies and counselling services function relatively well in comparison with patients receiving general psychiatric services. In addition, a clear distinction between self-report scales for these variables is overshadowed by their common variance with a general factor for psychological distress.

And the end of the discussion:

These considerations with respect to the CORE-OM domains are of importance for future research and scale development, but the utility of CORE-OM has already been demonstrated as a widely used benchmarking measure and reliable indicator of change in psychotherapy research and practice. The scoring method that has proved most useful in this regard is that in which all 28 non-risk items are scored as one scale and the risk items as the other. This research confirms that the scale quality of CORE-OM when scored in this way is satisfactory.

So some suggestions/pleas:

  1. by all means report change on specific domain scores if they are pertinent for the work that went on with the client/patient, but don’t imply that the specific scores are well-defined factor analytic scales;
  2. the risk and non-risk items are sufficiently distinct in cross-sectional psychometric studies that it may be wise to report the non-risk and risk scores as well as the total scores in almost any study;
  3. if you possibly can, talk about the scores from the CORE-OM and CORE-SF/A and SF/B as “domain scores” not “dimensions” or “factors”.

What’s in a name (1): scoring CORE measures

We may have caused a bit of confusion by introducing the term “Clinical Score”.  Perhaps it’s not on the scale of the Capulet/Montague name tragedy (Shakespeare, 1591–1595?) but it may be worth explaining the scoring here as I do see mistakes and do get asked about this.

History

We started out scoring using the mean of the items and recommending pro-rating if not more than 10% of items were missing, i.e. using the mean of the remaining items.  That meant you could get a pro-rated mean overall score for the CORE-OM if as many as three items were missing, for the “non-risk” score if two of the non-risk items were missing, for the functioning and problems scores if one of their items was missing, and you couldn’t pro-rate at all if any items were missing for the well-being or risk scores.  You could get overall scores for the CORE-SF/A or CORE-SF/B if one of their items was missing (but not for domain scores, as any missing item there means more than 10% of the items are missing).  Similarly, you could use a pro-rated score for the GP-CORE, the LD-CORE, the YP-CORE and the CORE-10 if one item was missing, but pro-rating the CORE-5 was clearly impossible.
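The pro-rating rule above can be sketched in a few lines of code. This is a minimal illustration, not official CORE software; the function name and interface are my own.

```python
# Sketch of the pro-rating rule: a score is valid only if no more than 10% of
# the measure's (or domain's) items are missing; it is then the mean of the
# items that *were* completed. Each item is scored 0-4.

def prorated_mean(item_scores, n_items):
    """item_scores: the completed item scores only; n_items: the measure's full item count."""
    n_missing = n_items - len(item_scores)
    if n_missing > 0.10 * n_items:
        return None  # too many items missing: no valid score
    return sum(item_scores) / len(item_scores)

# CORE-OM (34 items): up to three missing items can be pro-rated (3 <= 3.4),
# CORE-10: one missing item is allowed (1 <= 1.0),
# CORE-5: none (one missing item is 20% of the items, which exceeds 10%).
```

So, for example, `prorated_mean` returns `None` for a CORE-5 with any item missing, but still yields a mean for a CORE-OM with three items blank.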

All those scores had to lie between 0 and 4 by definition but they could be awkward-looking numbers like 0.84, and over the early years we got feedback that many clinicians and managers didn’t like these “less than one and fractional” scores.

“Clinical Scores”

With mixed feelings in the team, the idea of “Clinical Scores” came in: the item mean as above, but multiplied by 10 to give a score ranging between 0 and 40 that in clinical samples would pretty much always be an x.y sort of number with x >= 1.  The same rules about pro-rating were retained.  This “×10 = Clinical Score” gives rather easy scoring for a complete CORE-10 or complete YP-CORE: the “Clinical Score” is just the sum of the 10 items completed (but if one item is omitted you still have to find the mean of the nine completed items and multiply that by 10).  For a complete CORE-5 the route to the “Clinical Score” is almost equally easy: the Clinical Score is twice the sum of the five items’ scores.
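The arithmetic above is easy to check for yourself. Here is a minimal worked sketch (the variable names and example item values are my own, purely illustrative):

```python
# "Clinical Score" = item mean x 10, so for a *complete* CORE-10 it equals the
# plain sum of the ten items, and for a complete CORE-5 it is twice the item sum.

core10_items = [1, 2, 3, 4, 0, 1, 2, 3, 4, 0]  # a hypothetical complete CORE-10

mean_item_score = sum(core10_items) / len(core10_items)  # the original scoring
clinical = 10 * mean_item_score                          # the "Clinical Score"
assert clinical == sum(core10_items)  # x10 mean == item sum for a complete CORE-10

core5_items = [1, 2, 0, 3, 4]  # a hypothetical complete CORE-5
assert 10 * sum(core5_items) / 5 == 2 * sum(core5_items)  # twice the item sum
```

With one item missing the shortcut no longer applies: you must take the mean of the completed items and multiply that by 10, as the text says.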

We sometimes see people reporting the plain sum of the items: please don’t do that, we’ve never recommended it anywhere.  We also see people not saying explicitly whether they’re using the original “mean item score” or the “Clinical Score”: please do say which you used, even if it seems very obvious.  Finally, we encourage people always to be explicit about having used pro-rating (if you have) and to be explicit about the numbers of incomplete questionnaires and numbers of items missed.  This all maximises comparability of reports.  Non-comparable scoring may not be as lethal as the Montague/Capulet feud was to Romeo, Juliet and Mercutio, but it’s definitely to be avoided!

Reference

Shakespeare, W. (1591-1595, exact date uncertain) “Romeo and Juliet” available in many versions as the peer-reviewed format hadn’t been invented: quarto 1, quarto 2, first folio and later versions.  However, the fatal name issue is consistent in all.