Are you struggling with time limitations with your clients that may cause you to withhold testing? Or do your clients suffer from fatigue and either quit before finishing the test or just start answering haphazardly? We offer an alternative to our clients by providing each of our Narrative Reports with a complete profile graph, critical items, and subscales through our prorate system.

The first 370 items on the MMPI-2 Test contain all of the items that are used in the 10 basic scales and all of the L, F, and K scales. This is sufficient to generate a comprehensive profile, fully justified code types, and the entire list of the clinical subscales. We prorate all of the other scales as needed. If your client answers items beyond 370, our system takes this into account. However, with this short format for the test there should be no more than 5 unanswered items before item 370.
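The arithmetic of a simple proportional proration can be sketched as follows. This is only an illustration of the general idea, not our actual scoring algorithm; the function name and the example numbers are hypothetical.

```python
# Hypothetical sketch of proportional proration: scale a raw score up
# as if all of a scale's items had been answered. This is NOT the
# actual scoring algorithm, only an illustration of the general idea.
def prorate_raw_score(raw: float, items_answered: int, items_total: int) -> float:
    """Estimate the full-scale raw score from the answered subset."""
    if items_answered <= 0:
        raise ValueError("no items on this scale were answered")
    return raw * items_total / items_answered

# Example: 12 items endorsed on a hypothetical 40-item scale when only
# 30 of its items fell within the first 370 administered.
print(prorate_raw_score(12, 30, 40))  # 16.0
```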

By choosing this alternate way of administering the MMPI-2 Test to your clients, you spend less time testing and still receive a full personality description, motivations and behavioral dispositions, our material on the origins of problematic behaviors, and the unique depth of our validity analysis.


One of the touchstones of psychotherapy is the quality of empathy, the capacity to feel the emotions that envelop and motivate the client. The more we understand our clients, the more effective our interventions will be. Comprehending what experiences in our clients’ lives have strongly influenced their present suffering can strengthen our connection with them.

To conceptualize all behavior as adaptive is a pivotal step in this direction: the Adaptation and Attachment Supplements are my present effort to facilitate this understanding. To have guidelines toward the recent as well as more remote past origins of the person’s current frustrations and suffering is to have a new set of hypotheses to consider in exploring what has led the person into his or her present state and circumstances. To be able to help the person understand his or her points of greatest sensitivity, how they have come about, how he or she is protecting himself or herself, and what prices he or she is paying to do this is to clarify the person’s choices and thus to enable change. The clinician is a guide or facilitator in assisting the changes that can gain the person a more gratifying life, i.e., a “facilitator of more effective adaptation.”


It is easy to become habituated to how judgmental our clinical terms are. “You are passive-aggressive,” “dependency manipulative,” or “borderline” are hardly less pejorative than (respectively) obnoxious, conniving, and crazy. The moment we start forming negative judgments of our clients we start to lose them – they will start having to defend themselves against us, we who are supposed to be their allies and protectors. The more fully we can understand what conditioning experiences and biologic vulnerabilities have operated to shape the client as he or she presently is, the easier it is to keep our own good-bad perceptions out of the way.


Note that the words “maladjustment” and “maladaptation” do not appear anywhere else in this website other than in this one paragraph. Maladaptation is an observer’s judgment as to how people’s ways of reacting are not gaining them the gratifications and goals that they reasonably might pursue or perhaps explicitly desire. Given the strength of private self-justifications, “I am maladjusted” or “I am a dysfunctional individual” is rarely, or at most to a limited extent, a part of most people’s private self-perceptions, even if defensively proclaimed in order to blunt someone else’s criticisms and judgments (“You may be right, I guess I am.”). The private self-statement is far more likely to be, “But I had to do that,” than, “I just did that because I am so badly maladjusted.”


As people with elevated scores on the Pa-3 subscale so consistently demonstrate, the judging person knows that he or she is right. Thinking of all behavior as adaptive can dilute and help relax what seems a natural if not nearly universal vulnerability to becoming judgmental when frustrated or threatened. The challenge, as we see it, is not never to be judgmental but rather to become aware that we are protecting ourselves against our own discomforts when we become judgmental.


Empathy is a major bridge to compassion; the path we are proposing is from etiology to empathy to compassion. Note how strongly the originators of the major religions have advocated compassion. For example, Christ spoke of caring for the most vulnerable: “Inasmuch as ye have done it unto one of the least of these my brethren, ye have done it unto me” (Matthew 25:40). Buddha made compassion a core focus of his thought: not to cause suffering to any sentient being. The Dalai Lama emphasizes this: if you truly appreciate another’s pain and hurt, you become aware of whatever pains your own actions are causing them.


Our clients so often have fears and other reactions they do not want; seeking to change themselves is typically why they seek help. For the clients to understand that their distress is the natural outcome of what they have endured and survived is to see more clearly what it is that they want to change. A natural benefit of this conceptualization is less judgmental negativity and increasing compassion for themselves as well as others. Our world is one of ever-increasing population demands on finite resources, with all the frustration, aggression, and violence that can ensue from that. The quality of life – if not the chances for the survival of the species itself – may be improved by whatever increases in compassion that we can contribute to society.


Whatever fresh insights may arise from the collaboration of Buddhists and neuroscientists, it is my hope that these may lead us to become more and more “warm-hearted persons.” I would like to conclude this essay with the Dalai Lama’s own concluding words:

Assorted topics: brief comments and opinions

1. What clinical contributions can the RC scales make? Are there potential uses? What validation do they need? How can we respond to misleading RC testimony? I do not believe these are the “cores” of the clinical scales as claimed (for detailed discussions of the deficiencies of the RC scales, see Butcher et al., 2006; Caldwell, 2006; and Nichols, 2006). The RC scales (Tellegen et al., 2003) depend heavily on the face validity of the items, and the items are so transparently obvious that the “scored direction” usually is readily discernible (just as with the content scales). The one unique and clinically interesting contribution is the RC8 scale, the restructured version of scale 8-Sc. RC8 collects the reality-disturbed but non-paranoid content in 8-Sc (feelings of unreality, peculiar experiences, hearing strange things, etc.). Their subjective realness to the individual may at times bypass the often pervasive suppression of persecutory content when L is much elevated. These RC scales need to be studied by neutral researchers who have no investment in the outcome, especially in circumstances that include incentives to bias one’s responses, both self-favorably and self-negatively in different samples.


Suppose that in a custody case, for example, the opposing parent has by history a problematic conscience, e.g., financial deceptions, abusiveness, dishonesty, etc. The opposing expert utilizes a very low score on the RC4 scale to argue that that parent does not test as psychopathic, i.e., is not a scoundrel, can be trusted to reform as promised, etc. Your MMPI-2 profile has an at least somewhat elevated score on scale 4-Pd as well as an L and/or K score that is mid-60s or higher, etc.: the parent was clearly defensive in responding to the test. The opposing expert can be presented with the RC4 items together with the parent’s corresponding responses and be asked to read the RC4 items to the court. (Note that the RC4 items are obvious as to what is the desirable response: trouble with the law, truancy in school, past petty thievery, engaging in fights, etc.) After reading the items to the court, the expert can be asked such questions as, “This client has completed at least three years of college, hasn’t he or she, doctor?” “Would these items not be quite obvious as to what is the more favorable response?” “Has this parent not already been shown to have been quite defensive toward the test?” Note the face-valid interpretation of the RC4 items: such presentations as this can help to expose the shallowness of depending on face-valid uses of item responses.


2. Where does the Ho scale come from? What can it tell us? The Cook-Medley Hostility scale was originally developed as a screening scale for schoolteacher applicants. Those who tended to see many of the kids as potentially ill-behaving brats often needing restraint tended to fail as teachers; those who mostly saw them as generally interested and wanting to learn much more often succeeded. The scale unexpectedly showed up many years later as the best predictor of atherosclerosis in the 25-year follow-up of physicians who had been University of North Carolina medical students. Such cynicism was seen as making an interesting contribution to atherosclerosis. The content scale CYN seems directly fashioned after Ho, the latter of which I believe is a bit more subtle and a somewhat better scale.


3. What are the utilities of the newer addiction scales AAS and APS? AAS (Addiction Admission Scale) is basically a set of critical items regarding chemical abuse, extensively overlapping the corresponding critical item sets for chemical abuse in our and other computer-generated reports. We provide AAS because it appears useful in order to pick up some test misses from the MAC-R, although it can overweight episodic abuse whereas the MAC-R appears more a predictor of daily dependence. APS (Addiction Potential Scale) is a competitor to the MAC-R. I haven’t seen much research on it since the original derivation of the scale. I think it is a bit less subtle and not as consistently effective as the MAC-R, which is one of the most non-transparent and non-obvious sets of items of which I am aware (liking the work of a forest ranger, liking to cook, etc.). We need a non-obvious set of items because of the frequency if not also the urgency of the motivation to disguise one’s problems with chemicals. I think the MAC-R just works better, and it has a far longer history and breadth of acceptance.


4. How do the GM and GF scales for the MMPI-2 fit into our thinking about gender identity and gender roles? Gender Masculine and Gender Feminine are curious scales. To a considerable degree they are the more stereotypic, expected responses, such as liking hunting vs. liking to play “house” as a child, etc. What is most curious is how low the correlations are (mostly +.20s to +.30s), not only with Mf but also with each other. A person can have high scores on both scales, with a diversity of gender-related interests, as well as being low on both, reporting relatively few such interests in either direction. Mf contains numerous items about feelings of interpersonal sensitivity, avoidance of loud, rough play, etc., that do not appear in either GM or GF. GM and GF appear to me to be more gender-role related, versus Mf as predominantly gender-identity determined.


5. What is the usefulness of the S scale, Superlative adjustment? Scale S seems to me to be confounded with at least three elements: conscious defensiveness (airline pilot training applicants), genuinely good adjustments (creeps and weirdos need not apply to be airline pilots), and an over-control quality akin to or associated with the 34/43 MMPI code that is relatively frequent among pilot applicants. Thus confounded, it contains a quite limited amount of unique information, and I find limited use for it because of the uncertainty as to which psychological aspects are contributing to the elevation. Some clinicians like the subscales of the S scale as illuminating specific attitudes on an item content basis.


6. Where does the F(p), or psychiatric F scale, fit into validity assessments? This is an interesting scale. The original F scale was a set of responses that normals rarely made. F(p) is a set of responses that neither normals nor patients often make. The items carry a distinctly greater degree of implausibility, e.g., all laws should be thrown away, than the other rare-response and malingering scales. F(p) seems to work well with severely disturbed populations, e.g., state hospital and VA patients (the latter often being (1) seriously disturbed, (2) concerned about their pensions, and (3) of lower SES levels, all three of which can simultaneously inflate both their clinical profile elevations and specifically their scale F elevations). Note that the correlation of F and the Ss or socioeconomic status scale (Nelson, 1952) is -.77 in the Caldwell Data Set, n = 52,543, a huge contribution of Ss (SES) to F. The correlation of Ss to F(p) is .59, considerably less of the F(p) variance, 35%, as compared to the 59% contribution (r = -.77) of Ss to F. In addition, the F(p) scale has a serious flaw: four items overlap the L scale, being more extreme and rarely claimed denials of less than proper behaviors. But this subset of items operates in the opposite direction. Thus, an unsophisticated, guarded subject can get an elevation on F(p) from being defensive and perhaps somewhat ill-comprehending in item interpretation, without any effort to fake sick: beware F(p) and L up together. The scale needs to be re-normed without the L items.
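The variance percentages quoted above are simply the squared correlations, which give the proportion of variance shared. A quick arithmetic check:

```python
# Proportion of variance shared is the squared correlation (r squared).
r_ss_f = -0.77    # Ss (socioeconomic status) with F
r_ss_fp = 0.59    # Ss with F(p), as reported above

print(round(r_ss_f ** 2, 2))   # 0.59 -> about 59% of the F variance
print(round(r_ss_fp ** 2, 2))  # 0.35 -> about 35% of the F(p) variance
```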


7. What about non-gendered T scores? Clinically, I think this is psychometric bosh and nonsense. As with the non-K-correction idea, all of the useful interpretive research on the code types has been on the traditional gender-specific and K-corrected original MMPI norms. There is no body of in-depth pattern research on the MMPI-2 where, for about one-third of psychiatric patients and just over one-half of normals, the same set of raw scores will generate two different codes using the MMPI vs. MMPI-2 norms. Also, we need gender-specific norms to filter out – as best we can – gender-determined but non-pathological influences on item responding and scores. For example, averaged across women, do their menstrual cycles and associated physical vulnerabilities tend to heighten their attention to the states of their bodies? Ironically, the consequences of using non-gendered norms for anti-discriminatory reasons end up being more discriminatory against women than are the gender-specific norms. For example, any average heightening of somatic attention tends to raise the raw and T scores on scales 1-Hs, 2-D, and 3-Hy. On the average, women do answer a few more somatic items, but I believe this is primarily on a commonly shared experiential basis rather than on an individually psychopathological basis. In sum, I have yet to perceive any clinical utility for the non-gendered norms.
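The mechanics behind gendered vs. non-gendered norms come down to which reference mean and standard deviation enter the linear T-score formula, T = 50 + 10(raw − mean)/SD. The sketch below shows how the same raw score maps to different T scores under different norm sets; the scale, the raw score, and the normative means and SDs are all invented for illustration and are not the actual MMPI-2 values.

```python
# Linear T score: T = 50 + 10 * (raw - mean) / sd.
# The means and SDs here are invented for illustration only; they are
# NOT the actual MMPI-2 normative values.
def t_score(raw: float, mean: float, sd: float) -> float:
    return 50 + 10 * (raw - mean) / sd

raw = 18  # hypothetical raw score on a somatic scale
print(round(t_score(raw, mean=14.0, sd=4.0)))  # gender-specific norms -> 60
print(round(t_score(raw, mean=11.0, sd=4.0)))  # combined norms -> 68
```

If the gender-specific mean is higher (because, on average, that gender endorses a few more of the items on a commonly shared experiential basis), the same raw score yields a lower, less pathological-looking T score than it would under combined norms.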




Butcher, J. N., Hamilton, C. K., Rouse, S. V., & Cumella, E. J. (2006). The deconstruction of the Hy scale of MMPI-2: Failure of RC3 in measuring somatic symptom expression. Journal of Personality Assessment, 87(2), 186-192.


Caldwell, A. B. (2006). Maximal measurement or meaningful measurement: The interpretive challenges of the MMPI-2 restructured clinical (RC) scales. Journal of Personality Assessment, 87(2), 193-201.


Nelson, S. (1952). The development of an indirect, objective measure of social status and its relationship to certain psychiatric syndromes (Doctoral dissertation, University of Minnesota). Dissertation Abstracts, 12, 782. Discussed in Caldwell, A. B. (1997). Whither goest our redoubtable mentor, the MMPI/MMPI-2? Journal of Personality Assessment, 68(1), 47-68.


Nichols, D. S. (2006). The trials of separating bath water from baby: A review and critique of the MMPI-2 restructured clinical scales. Journal of Personality Assessment, 87(2), 121-138.


Tellegen, A., Ben-Porath, Y. S., McNulty, J., Arbisi, P., Graham, J. R., & Kaemmer, B. (2003). The MMPI-2 Restructured Clinical scales: Development, validation, and interpretation. Minneapolis: University of Minnesota Press.

What are the arguments regarding the use of the MMPI-2 vs. MMPI-A with adolescents?

For the first 24 years of developing my MMPI interpretation system, there was only the original MMPI. There was extensive research showing good, validly interpretable profiles down to ninth grade/age 15, including an extensive research program (books by Hathaway & Monachesi, 1953, 1961, 1963) based on the testing of over 15,000 ninth grade students in Minnesota around 1950. That program included a major exploration of the true longitudinal prediction (rather than concurrent prediction or postdiction) of subsequent juvenile delinquency. There was the work of Marks, Seeman, and Haller (1974), with special adolescent norms, and there was a large number of published individual studies. The score curves by age level on the F scale show a small but gradual increase from 18 down to 15 but an increasingly steep slope below 15. My system contains a large number of internal adjustments for adolescence (especially regarding psychoticism, lest it be too readily presumed to be chronic given the still-developing age level). I actually ended up having to start the age adjustment process from less than 22 on down (except for the 49/94 code, where age 21 is already adult).


With the revisions, what are the consequences of using one form or the other? When the MMPI-2 revision committee (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) developed the MMPI-2 revision, they decided to separate the adolescents, setting the cutoffs as follows: through age 17, or age 18 and living at home, counts as adolescent (use the MMPI-A); age 19 and up, or age 18 and not living at home, counts as adult (use the MMPI-2). The main issue that the proponents of the MMPI-A (e.g., Archer, 1997, and others) have argued is that the adult form seriously over-pathologizes the teenager – so many get substantially elevated profiles. I have argued that the MMPI-A seriously under-pathologizes the adolescents – a quite sizeable proportion of adolescent psychiatric inpatients get normal-range profiles. In truth I think each of us is partly right. My feeling is that the psychological and hormonal turbulence of adolescence really shows in the MMPI-2 scores, but comparing turbulent inpatients to an also relatively turbulent normative teen sample on the MMPI-A makes everyone look comparatively normal. In the end I feel that, with appropriate age allowances, the adult MMPI-2 codetype gives us a better basic understanding of the adolescents’ issues and behaviors as well as considerations as to their origins.


With adults we see long-term qualities of behavior in the MMPI profiles, including childhood etiologic factors. What about adolescents? I do think that an adolescent profile should be thought of as much closer to a high-speed photograph of a rapidly moving target, whereas an age 22+ adult profile can be much more of a slow-speed, in-depth portrait. Such an upper age level is consistent with the early- to mid-20s completion of the myelination of affective cerebral neurons; the inhibitory affective control systems are the last part of the brain to fully mature. This means we must be careful not to project undue stability or fixity over an extended future time to the behavioral implications of a teenage profile. Such parental assertions as, “Sometimes she’s so remarkably responsible, she’s like 11 going on 21,” or “Today I swear he’s 18 going on 12,” reflect how different an adolescent can be from one occasion to another. I think that as events in the adolescent’s life correspond emotionally to prior events, both ugly and beautiful, those re-stimulations then determine the emotional state of that day (or whatever relatively brief time interval). In any case, my emphasis is that the goal is the most accurate and useful prediction of behavior we can generate, possibly with cues as to what prior emotional episodes are being re-aroused.


Which source of information, then, is more clinically workable? The original MMPI items feel very dated to today’s adolescents, but the MMPI-2 edits make it much more comfortable for them. The MMPI-A benefits from being a bit shorter (which they usually like) as well as having yet a little more comfortable wording for adolescents. Nevertheless, I believe that the very large amount of pattern or code-type research on both adults and adolescents using the original version – versus so comparatively little configural data on the MMPI-A – favors the predictive power of the MMPI-2. This gives us the most useful information from the broadest research base and most extensive accumulation of clinical experience. This is not to say that this is an ideal solution, because one still has to remind oneself to allow for the turbulence of adolescence and the greater potential for future change in the results obtained. Considering all this, in my interpretation service I continue to interpret MMPI-2 protocols from adolescents with all the internal age adjustments included. I do not encourage testing below age 15, although bright and mature 14-year-olds can often get profiles that are valid by all available criteria as well as being good “clinical fits.”




Archer, R. P. (1997). The MMPI-A: Assessing adolescent psychopathology (2nd ed.). Mahwah, NJ: Erlbaum.


Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). MMPI-2: Manual for administration and scoring. Minneapolis: University of Minnesota Press.


Hathaway, S. R., & Monachesi, E. D. (1953). Analyzing and predicting juvenile delinquency with the MMPI. Minneapolis: University of Minnesota Press.


Hathaway, S. R., & Monachesi, E. D. (1961). An atlas of juvenile MMPI profiles. Minneapolis: University of Minnesota Press.


Hathaway, S. R., & Monachesi, E. D. (1963). Adolescent personality and behavior: MMPI patterns of normal, delinquent, dropout, and other outcomes. Minneapolis: University of Minnesota Press.


Marks, P. A., Seeman, W., & Haller, D. L. (1974). The actuarial use of the MMPI with adolescents and adults. Baltimore: Williams and Wilkins.

What does scale 5-Mf measure? Hasn’t it changed over time? What do more extreme scores mean?

Hathaway (see Dahlstrom & Dahlstrom, 1980, pp. 73-75) primarily used gay vs. straight males to develop the scale. Secondarily he used the old Terman-Miles (1936) test, and thirdly (least weight) he looked at adult male vs. adult female item response differences. He sought to measure what he saw as important psychological differences between gay and straight males (in contrast to object-choice prediction), and recent neuroscience research looks potentially supportive of his perception.


He did a separate Fm scale for lesbians vs. straight women, but the scale did not work very well – in hindsight, possibly because of a greater heterogeneity of gender identities in the sample of women. For example, Kinsey et al. (1953) showed that gay men strongly tended to remain gay, and Hooker studied how difficult it is for gay men to shift to (at least outwardly) straight gender roles. Women can shift from sexual relationships with men to women and back – or to being sexually inactive – much more easily than gay males. Hathaway et al. felt it would be less confusing to have a single scale, although four items that report being concerned about sex are scored in the opposite direction. Note that the norms run in opposite directions (high is resemblance to the opposite gender, so a high score is feminine for men but a low score is feminine for women).


In understanding scale 5-Mf, I find it helpful to distinguish between gender identity, gender role, and object choice. Object choice is highly idiosyncratic and I think often susceptible to chance circumstances in terms of the person’s earliest and often acutely intense adolescent encounters. Self-consciousness and socially problematic behaviors often lead to object choices being concealed. This can also drive discrepancies between one’s role and identity; gender role can be a mask, and it often is. Regarding Mf and possible changes over the years, a follow-up study showed a high correspondence of the same descriptors showing up for high- and low-scoring male and female college students after a 30-year period (Todd & Gynther, 1988): there was a remarkable absence of change. Lists of descriptors reflect the expression of both role and identity, but I believe that if there were changes in the underlying gender identities, the behavioral expressions would show at least some changes. I like to think that there has been some increment in the general acceptance of cross-role behavior over time (e.g., unisex clothing, gay marriages, etc.). But to me, the Mf scale is primarily about gender identity, and I see little if any change in this basic underlying dimension.


Regarding interpretation, I propose that the underlying dimension is best summed up as being defined by an orientation toward actions on the masculine end and an orientation toward feelings on the feminine end, construing both “actions” and “feelings” broadly. This helps get us off the hook of such pejorative terms as fag, queer, bull dyke, etc. They hardly follow from the scale anyhow, especially since the use of such words is largely in response to gender role and object choice more than to identity. The lowest scoring males I have seen or known often have strong exploratory urges and individual needs for mastery over nature; they may not understand how women can apparently sit and talk all day. This does not, however, presume the abusive attitudes often attributed to machismo; other scales assess aggressive potentials. Some are well aware of the vulnerability of women, and they can be protective and gentle in a nevertheless very masculine way. High T-scoring males are much more interested in what is happening in the personal (emotional feeling) lives of the people around them as well as their own feelings; they may find hunting, boxing, etc., distasteful if not repugnant.


Adjectives more often descriptive of lower scoring women have included approachable, charitable, emotional, lazy, and tolerant; they have also been seen as worldly, sensitive, and self-dissatisfied. Many seem protective of their rights to the expression of their feminine identities (especially if they have felt exploited or oppressed). Higher scoring women often value physical strength and endurance and may be seen as adventurous, calculating, self-assured, exploitative, and self-confident. They often have tomboy elements in their histories and subtly masculine traits (largely independent of object choice). In one study, by far the strongest behavioral association with scale 5 in women was their answer to the question, “If you had to choose between your mate versus your work which would you choose?” The “action” end chose work and the “feeling” end chose mate.


I see the feminine end of the scale (both genders) as being more strongly associated with verbal expressions of anger and aggression (almost “verbally only” toward the extremes), whereas physical expressions of aggression are somehow more a part of the nature of the world for those scoring at the masculine end (lower-5 men and higher-5 women). That would not be a specific prediction of behavior in an immediate sense but rather a shift of potential thresholds. I would add that I think that when low-5 women do strike out physically, it is apt to be more pain- and distress-inflicting (e.g., in an old French movie with low-SES women working in a laundry, a fight broke out and they were tearing out each other’s earrings and scratching each other’s faces), and that when high-5 women fight, it is more like male fighting over territoriality and prerogatives. Masculine aggression may often be seen as just enough for an efficient control of a child’s or another person’s behavior, and not necessarily as deliberately mean or cruel or tormenting; the latter, like machismo above, depends on other scale elevations such as 4-Pd, 6-Pa, 8-Sc, and 9-Ma.


Let me close with observations on the highest scale 5 score (female) I have ever seen. A woman came to the psychiatry unit at the University of Minnesota (1950’s) wanting help in order to get married to another woman who also wanted the marriage. At a T just over 90, she reportedly had never worn a dress in her life, she was expert in using tractors, she went hunting alone, and she loved to go fishing alone. Psychologically she was “a man with a practical action problem.” Given a satisfying complementarity of roles, their only problem was they could not have children, but they had accepted that. Unfortunately, the issue was a legal one and not psychiatric, so we were not really able to help her.




Dahlstrom, W. G., & Dahlstrom, L. E. (1980). Basic readings on the MMPI: A new selection on personality measurement. Minneapolis: University of Minnesota Press.


Kinsey, A. C., Pomeroy, W. B., Martin, C. E., & Gebhard, P. H. (1953). Sexual behavior in the human female. Philadelphia: W. B. Saunders.


Terman, L. M., & Miles, C. C. (1936). Attitude-interest analysis test. New York: McGraw-Hill.


Todd, A. L., & Gynther, M. D. (1988). Have MMPI Mf correlates changed in the past 30 years? Journal of Clinical Psychology, 44, 505-510.

What is the K scale really about?

Rather immediately in the development of the basic MMPI scales – in the early 1940s – it became apparent that answering such an array of personal items is inevitably subject to biases from the varied attitudes and approaches that subjects take. One alternative was for the clinician to make approximate judgmental efforts to avoid unfortunate over-interpretations of the scores of self-critical respondents as well as serious under-interpretations of the scores of guarded and defensive respondents, but this introduced an unacceptable amount of error in the unreliability of such judgments. They felt, therefore, that the scales they had developed had to have measured adjustments to “correct” for these biases. In a highly regarded and remarkably thought-provoking article, Meehl and Hathaway (1946; republished in Dahlstrom & Dahlstrom, 1980) detailed their efforts to quantify the potentially distorting effects of such biases and attitudes. In that article, “The K factor as a suppressor variable,” they published the development of all three basic “validity” scales: L, F, and K. Here I will discuss the origins of L and F briefly and then K in more detail.


Where did the L scale come from?


The L scale was an explicitly a priori scale influenced in part by prior studies of honesty in grade school students. For this scale the challenge was to create a set of statements that were: (1) usually too good to be true and (2) rarely responded to by normal subjects. In the aggregate, then, it would be extremely rare for anyone to sincerely answer a large proportion of them in the scored direction. (As a marker of their success, there are only 15 items. I can recall having seen a raw score of all 15 items three or four times in my life from tens of thousands of profiles.) This scale immediately became useful in detecting efforts to “look good” in less sophisticated subjects, but it was quite uneven to ineffective with college-educated subjects. In my experience, it is also confounded by having two contrasting sources: (1) deliberate faking good and (2) a naive properness in less educated persons who may have high, rigid, and literal-minded religious beliefs or other strict personal values. (Because the second of those two contingencies is sincere responding, I never use “Lie scale” as the automatic designation of the L scale.)


What is the origin of the F scale?


The F scale was also a non-criterion-group scale. It comprised 64 items (60 in the MMPI-2) answered in the scored direction (true or false) by fewer than 10% of the normal sample, and usually by fewer than 5%. It is an “infrequent response” scale; the letter F could be understood as standing for “frequency/infrequency,” but that seems clumsy. Better, I believe, just to think “the F scale,” as with “the L scale.” The premise was basically that a large number of rarely made responses alerts us that something may be going wrong with either the person’s approach to the items or with our tabulation of the person’s responses. This scale can also be elevated by a variety of biases and sources of distortion. For present purposes, the following are what I believe to be the three main operating elements: (1) deliberate attempts to “look bad,” (2) marginal or overtly psychotic ideation, and (3) socioeconomic status/education (discussed below). Secondarily there are such factors as limited literacy, intoxication when taking the test, non-psychotic idiosyncrasies of thinking, perhaps mistaken understandings of the instructions, etc.
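The infrequent-response logic behind the F scale lends itself to a brief sketch. This is not the original item analysis; the item IDs, endorsement rates, and function name below are hypothetical, illustrating only the selection rule of keeping items endorsed in the scored direction by fewer than 10% of a normative sample.

```python
# Sketch of the infrequent-response selection rule behind the F scale:
# keep only items endorsed in the scored direction by fewer than 10% of
# a normative sample. All item IDs and rates here are hypothetical.

def select_infrequent_items(endorsement_rates: dict, cutoff: float = 0.10) -> list:
    """Return item IDs whose scored-direction endorsement rate in the
    normative sample falls below the cutoff."""
    return sorted(item for item, rate in endorsement_rates.items() if rate < cutoff)

rates = {"item_14": 0.03, "item_27": 0.22, "item_48": 0.08, "item_66": 0.51}
print(select_infrequent_items(rates))  # ['item_14', 'item_48']
```

A respondent's F raw score is then simply the count of such rarely endorsed items answered in the scored direction.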


Besides Potassium, what does K stand for?


The development of the K scale was far more complex. The idea of a non-obvious scale to measure the overall tendency to “look good” or “look bad” seemed reasonably straightforward, but the requirements they put on it made it anything but. These included: (1) They wanted a scale that “worked at both ends,” i.e., effectively discriminated both “looking good” and “looking bad” (many of their preliminary scales worked reasonably well in one direction but less well in the other). (2) They wanted a rigorous empirical selection of the scale items from a large item pool, partly because, “Those items whose significance would not have been guessed by the test-maker will then be equally mysterious to the testee” (see the “K factor article” in Dahlstrom & Dahlstrom, 1980, p. 86). (3) Since their goal was a scale to be used to correct their clinical scales for ever-present “look good” or “look bad” tilts or dispositions, they wanted the correction scale not to be weighted with or biased by psychopathology – ideally not at all. This combination of requirements – especially the last – set up a major set of hurdles.


In over two years of intensive work, they developed an untold number of experimental scales (too many, they wrote, to report in detail). There were conscious – i.e., by instruction – fake-good and fake-bad scales, and they also generated presumptively self-negative scales (functioning normals with disturbed profiles) and presumptively self-favorable scales (psychiatric inpatients with normal-range profiles), the latter two sets making assumptions as to the direction of distortion but not as to the extent of conscious intentionality. They finally settled on a 22 item scale derived from a group of 50 inpatients, mostly with diagnoses of “psychopathic personality, alcoholism, and allied descriptive terms indicating behavior disorders rather than neuroses” (Dahlstrom & Dahlstrom, 1980, p. 99), as the best of the lot, although they adamantly stressed that it was the performance of the set of items that mattered and not the group of origin. This 22 item scale was designated L6, the variant defined by a requirement of scores at or over T-60. (This scale barely beat out scale N, which was Meehl’s own doctoral dissertation, but that scale contained a bit too much loading of psychopathology.)


This preliminary scale L6 still had a serious defect: a subset of (more or less) psychotically disturbed and severely depressed patients consistently got low raw scores reflecting their very low self-esteem. Therefore, L6 still remained undesirably influenced by psychopathology. From their scales for conscious distortion, they set out to identify a set of items that were not influenced by instructions to fake in either direction. From this set they found eight items that nevertheless discriminated the severely disturbed patients from the normals. These eight were scored in the patient response direction. The effect of adding these eight items to the 22 L6 items was to bring the average patient raw score on the 30 items back up to that of the average score of normal subjects. Thus, the final K scale and the K-correction appear to be no more than minimally affected by the presence of psychopathology.


I see the goal of the K-correction, in effect, as identifying an optimal estimate of what the T-score would have been had the person been straightforward. K then operates as a threshold for the reporting of self-negative feelings and socially problematic attitudes. The basic clinical scale items to which the high-K person responded in the scored direction despite a strongly self-favorable bias (whether conscious or not) would then, in effect, carry much more weight per item, that is, reporting distresses and shortcomings despite a strong reluctance to do so. This is compensated for by adding an above-average amount of K. Similarly, the basic scale items responded to by someone with a low K score have a much lower threshold for their admission. These should be given less weight, and this occurs as the consequence of adding a smaller-than-average amount of K. This has a balancing and, I believe, beneficially homogenizing effect on who is included in which codetype: it becomes the optimal estimation as to which is the person’s appropriate codetype. Note that the non-K-corrected codetypes would quite often be different from the K-corrected (Wooten, 1984). There is little or no research on those non-K codetypes, and their test results would be much more confounded by test-taking attitudes than are the K-corrected codetypes.
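The mechanics of the correction itself are simple. The fractions of K added to the five K-corrected scales (0.5K to Hs, 0.4K to Pd, 1.0K to Pt, 1.0K to Sc, 0.2K to Ma) are the standard published MMPI values; the raw scores and the helper function below are hypothetical illustration, assuming the fraction is added to the raw score before T-score conversion.

```python
# Minimal sketch of the K-correction: add a scale-specific fraction of the
# K raw score to each K-corrected clinical scale before T-score conversion.
# The fractions are the standard MMPI values; the scores are hypothetical.

K_FRACTIONS = {
    "Hs": 0.5,  # Scale 1
    "Pd": 0.4,  # Scale 4
    "Pt": 1.0,  # Scale 7
    "Sc": 1.0,  # Scale 8
    "Ma": 0.2,  # Scale 9
}

def k_correct(raw_scores: dict, k_raw: int) -> dict:
    """Add the scale-specific fraction of K to each K-corrected scale;
    scales not in K_FRACTIONS (e.g., D, Hy) pass through unchanged."""
    return {
        scale: raw + round(K_FRACTIONS.get(scale, 0.0) * k_raw)
        for scale, raw in raw_scores.items()
    }

# Hypothetical defensive respondent: high K, modest raw clinical scores.
corrected = k_correct({"Hs": 8, "D": 22, "Pd": 16, "Pt": 12}, k_raw=20)
print(corrected)  # {'Hs': 18, 'D': 22, 'Pd': 24, 'Pt': 32}
```

The per-item weighting effect described above follows from this: the higher the person's K, the larger the constant added, so the fewer scored responses it takes to reach a given corrected elevation.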


It was not factor derived; why was it called the K factor?


In the “K factor” article, Meehl and Hathaway (1946; see Dahlstrom & Dahlstrom, 1980) went on to a factor analysis of a curious group of clinical and arbitrary “variance analyzing” scales. In this analysis the K scale emerged as central to a single factor with negligible residuals. They then went on to argue that there is too much imprecision in our measurement of personality to sacrifice any accuracy for the sake of internal consistency, i.e., factorial purity. Indeed, they argued that, “From both the logical and statistical points of view, the best set of behavior data from which to predict a criterion is the set of data which are among themselves not correlated” (op. cit., p. 117; see also McGrath, 2005). They fundamentally rejected the construction of personality scales on a factor-analytic basis, and they concluded, “Since scales are so very ‘impure’ at best, there does not seem to be any very cogent reason for sacrificing anything in pursuit of the rather illusory purity involved” (op. cit., p. 116). To my awareness, this carefully developed argument has never been refuted; instead it has been ignored for decades of endless factor-analytic (high-alpha) test construction efforts, up to and including the recent Restructured Clinical or “RC” scales. Note also how few tests based on factorial scales have gained extensive clinical usage in personality assessment. I would urge everyone seriously involved with the MMPI-2 – above all if teaching or supervising its use – to study the K factor article and make their own decisions regarding these arguments. I believe this is the most important article ever written for understanding what has made the MMPI so unique (see Dahlstrom & Dahlstrom, 1980, or contact us for a copy).


What correlational properties potentially affect interpretation of the K scale?


The Caldwell clinical data set (1997) is a mixture of clinical cases plus a good scattering of mildly disturbed and relatively normal subjects, a total of 52,543 individual protocols. The sample is significantly overeducated relative to the census, but significantly less so than the MMPI-2 normative sample, 45% of which had graduated from college and 18% of which had done postgraduate work.


In this data set, the K scale correlated .65 with the socioeconomic status scale (Ss; Nelson, 1952). This K-to-Ss correlation suggests that approximately 42% of the variance of K is due to SES and, of course, similarly to education (note the analyses of this data set in Greene, 2000). The correlation of K with Mp (Malingering Positive; Cofer, Chance, & Judson, 1949) was .50, suggesting that about 25% of the K variance can be explained by conscious defensiveness as measured by the Mp scale. Wiggins’ Sd (social desirability, 1959) correlates .28 with K, which might add another 8% except that Mp and Sd correlate .75; their combined contribution to the K variance would be slightly over 25%. The correlations of Mp and Sd with Ss are quite low, and thus SES and conscious defensiveness are essentially independent of each other. Curiously, scale R (Welsh, 1965) correlates .30 with K and is negligibly or even negatively correlated with these other three scales, so it contributes close to 10% of the variance of K, and it is almost entirely independent of both SES and conscious defensiveness.
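The shared-variance figures in this paragraph are simply squared correlations. A minimal check of the arithmetic, using the correlations reported above (the function name is mine):

```python
# Shared variance between two measures is the square of their Pearson
# correlation; the correlations below are those reported in the text.

def shared_variance_pct(r: float) -> int:
    """Percent of variance one variable shares with another, given r."""
    return round(r * r * 100)

print(shared_variance_pct(0.65))   # K with Ss (SES) -> 42, "approximately 42%"
print(shared_variance_pct(0.50))   # K with Mp -> 25, "about 25%"
print(shared_variance_pct(-0.77))  # Ss with F -> 59, "approaching 60%"
```

Note that the sign of the correlation drops out: the strongly negative Ss-to-F correlation accounts for just as much variance as a positive one of the same size would.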

An almost totally unappreciated point is that – without our realization or appreciation – K has been correcting for the impact of SES from the time of its invention. The widespread use of reasonably educated and bias-free samples, together with their usually middle-class or higher SES, in most of the negative studies on the K-correction has operated to conceal this function of K. In addition, this is consistent with the correlation of Ss with the F scale: an almost startling -.77 (approaching 60% of the variance of F!). This shows how lower-SES subjects have not learned what not to say, whereas well-educated subjects are in effect trained in what not to say as well as what to say and just how to say it. I believe that understanding these relationships can considerably expand our understanding of the ways in which, and how broadly, the K factor is crucial to our interpretation of MMPI profiles.


In normal subjects (with no motives to bias their responses) the scores on K are quite stable over time. The longest-term follow-up of which I am aware is by Leon, Gillum, Gillum, and Gouze (1979), as part of a longitudinal cardiovascular research study. The 30-year reliability of K was .434; among the thirteen basic scales this was exceeded only by 0-Si, 5-Mf, 9-Ma, and 2-D, in decreasing order. For the five partial-interval retestings, ranging from 6- to 24-year intervals, the reliabilities of K ranged from .502 to .673, with three of the five over .60. Given that these subjects had no incentive to distort, this stability would be attributable to the effects of socioeconomic status and the emotional reserve of the Welsh R scale; both of these are attributes that one would expect to be reasonably stable over longer periods of time. Circumstantial demands to look good or bad would, of course, fluctuate by the occasion, so their impact – had it been present – would have led to much lower long-term reliabilities, but the absence of bias here led to correlations that are high considering the lengthy time intervals.


In his discussion of the K-correction, Greene (2000) comments on studies in which K was interpreted as a measure of personality integration and healthy adjustment (p. 95). This is directly consistent with the r of .65 for Ss with K, especially given samples of subjects with no incentive to distort or bias their responses. By many lines of evidence higher levels of SES and education would be expected to be associated with better personality integration. The component of consciously biased responding on K in effect was essentially inactive in these studies. Thus the assertions of an association of K with personality integration and healthy adjustment are validated mainly as a function of SES.


Could they validate the K-correction?


In order to test whether this new 30 item K scale was working as a correction scale, they did several experimental sequences with profiles falling between T-65 and T-80. They regarded this as the problematic or critical range since scores over T-80 would almost always be pathological, and scores below T-65 would be too low to be elevated to an assured level of belonging in the psychopathologic range. They then created four mixed sets of patient plus normal profiles and cut each batch at K over versus K at or below the T-50 average. The hypothesis was that those scoring above T-50 K would disproportionately be defensive patients and those below would more often be self-critical normals. Each of the four sets of data supported this hypothesis; recognizing the major difficulties in fully cross validating such a complex scale, they felt this at least showed that the K scale was working in the direction in which it should.


Is the K-correction still working?


There have been objections to the continued use of the K-correction, with statements such as, “The bad news is that the K-correction doesn’t do anything; the good news is that it doesn’t do anything.” The argument then is that it should be abandoned. The main anchoring data of such assertions have mostly come from studies of subjects who responded straightforwardly, with no identifiable or consistent-across-subjects incentives to bias or distort their responses. These studies define circumstances in which the K-correction indeed has little to do – especially among reasonably educated subjects responding straightforwardly. These are precisely not the groups for whom the K-correction was designed.


A problem in at least a few of these studies was that the “criterion” was a rating sheet filled out on the basis of a single session with the subject. The ratings then are based very largely on what the person just said. But the non-K-corrected scale is also what the person just said. This operates as an experimental bias in favor of the non-K-corrected scale. For example, Barthlow et al. (2002) extended this to three hours (three sessions), which is a significant improvement in discovering the person’s self-deceptions, role playing, and other sources of a potential misfit of self-reporting. Three sessions is still much less “tuning in” to the subject’s potential biases and distortions than would be expected from a month’s admission to a university hospital inpatient service, which was modal in the origin of the MMPI. Considering this experimental bias, it is perhaps a bit surprising that in Barthlow et al.’s data some of the correlations with the K-corrected scores exceeded the correlations with the non-K-corrected scores at all.


Putzke, Williams, Daniel, and Boll (1999) tested 61 patients with end-stage lung disease waiting for lung transplantations with considerable uncertainty whether a lung might become available in time to save their lives. The context defined a strong “pull” to appear psychologically healthy and deserving of priority. The 30 patients with higher scores on K (using a median split) obtained significantly lower raw scores on all of the K-corrected scales as well as Si (K-to-scale overlapping items having been deleted). After K-correction there were no significant differences except that Hs appeared possibly to have been a bit over-corrected. In this setting where there was a clear and consistent incentive to “look good,” their data strongly supported the use of the K-correction. This was precisely the sort of group for whom the K-correction was designed. As the Putzke et al. study illustrates, the K-correction is working very well when and where it should: when the person has a strong incentive to bias the test responses in order to look too healthy or too disturbed.


In civil forensic actions such as child custody and denied employment, there is an almost universal incentive to appear healthy. In personal injury and workers compensation as well as criminal trials, there can be strong incentives to look damaged or impaired. Thus in contexts where such biasing is so consistently present, the utility of the MMPI would almost invariably be reduced without the K-correction: defensive profiles would be underinterpreted and exaggerated profiles would be overinterpreted. If K went uncorrected, then in the resulting confusion as to who was distorting in what direction and how much, the court’s trust of the MMPI would soon be severely damaged.




Barthlow, D. L., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., & McNulty, J. L. (2002). The appropriateness of the MMPI-2 K correction. Assessment, 9(3), 219-229.


Caldwell, A. B. (1997). [MMPI-2 data research file for clinical patients.] Unpublished raw data.


Cofer, C. N., Chance, J. E., & Judson, A. J. (1949). A study of malingering on the MMPI. Journal of Psychology, 27, 491-499. See also Greene, R. L. (2000). The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon, pp. 97-98.


Dahlstrom, W. G., & Dahlstrom, L. E. (1980). Basic readings on the MMPI: A new selection on personality measurement. Minneapolis: University of Minnesota Press.


Greene, R. L. (2000). The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon.


Leon, G. R., Gillum, B., Gillum, R., & Gouze, M. (1979). Personality stability and change over a 30-year period – Middle age to old age. Journal of Consulting and Clinical Psychology, 47, 517-524.


McGrath, R. E. (2005). Conceptual complexity and construct validity. Journal of Personality Assessment, 85, 112-124.


Meehl, P. E., & Hathaway, S. R. (1946). The K factor as a suppressor variable in the MMPI. Journal of Applied Psychology, 30, 525-564. Reprinted in Dahlstrom, W. G., & Dahlstrom, L. E. (1980). Basic readings on the MMPI: A new selection on personality measurement. Minneapolis: University of Minnesota Press.


Nelson, S. E. (1952). The development of an indirect, objective measure of social status and its
relationship to certain psychiatric syndromes (Doctoral dissertation, University of Minnesota). Dissertation Abstracts International, 12, 782. See discussion in Caldwell, 1997b.


Putzke, J. D., Williams, M. A., Daniel, F. J., & Boll, T. J. (1999). The utility of K-correction to adjust for a defensive response set on the MMPI. Assessment, 6, 61-70.


Welsh, G. S. (1965). MMPI profiles and factor scales A and R. Journal of Clinical Psychology, 21, 43-47. See also Greene, R. L. (2000). The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon, pp. 243 & 219-225.


Wiggins, J. S. (1959). Interrelationships among MMPI measures of dissimulation under standard and social desirability instructions. Journal of Consulting Psychology, 23, 419-427. See also Greene, R. L. (2000). The MMPI-2: An interpretive manual (2nd ed.). Boston: Allyn & Bacon, pp. 98-100.


Wooten, A. J. (1984). Effectiveness of the K correction in the detection of psychopathology and its impact on profile height and configuration among young adult men. Journal of Consulting and Clinical Psychology, 52, 468-473.

How do computerized reports fit into custody examinations, reports, and testimony?

The computer-generated MMPI-2 report basically describes the patterns of behavior that are characteristic of those who obtain similar profiles. What reactions, what sensitivities, what internal issues, what external interpersonal conflicts, etc., are likely? This is, of course, actuarial hypothesis generation: it alerts the clinician to what to look for, perhaps what to give weight to even if the examinee minimizes the problem. For example, it may alert the clinician as to what may be presented as a relatively superficial problem that could cover over other more uncomfortable issues.


All such statements are probabilistic even though it is not possible to set universal numerical probability values on each statement. Such values would fluctuate far too much when the MMPI-2 is used with quite different populations. My own solution to this is simply to graduate the overall level with such phrases as, “in most cases,” “in many cases,” “in some cases,” “in a few cases,” etc. An example of “a few cases” would be covered over paranoid trends in a profile that is not usually marked by paranoid thinking, but there are signs that this may be an exception. Thus the clinician is alerted to take note if clinically there are such signs.


Distinguishing the Actuarial Function from the Clinical Function


The actuarial task is to offer relative baselines for various behaviors. The item responses are entered into a complex computer program that is unaffected and unbiased by any information about the issues being considered or by any information gained by the examiner. The output then becomes an array of hypotheses to which the examiner may want to attend. If a particular report statement happens not to fit and one needs to explain the probabilistic nature of “actuarial” to the court, then one might use the following as an example. The actuarial function is like a professional actuary tabulating the driving records of adolescents with versus without driver training. Everyone can recognize that there are exceptions (i.e., wrong predictions) – some with driver training are still poor drivers, and some without are nevertheless good drivers. But the point is that, on the average, one group has a different record from the other, and the size of this predictable difference becomes an element in setting their insurance rates. That an individual prediction in an actuarial MMPI-2 report does not fit does not take anything away from the fit of the other predictions, given that a good preponderance do fit. So far, Meehl’s prediction has proven right: the whole body of research on statistical vs. clinical prediction remains an amazing 100% in favor of statistical/actuarial predictions as equal to or exceeding clinical judgment.

The clinical function is to accumulate all the available information that one can obtain that is relevant to the determination to be made by the trier of fact. An important part of this can be the testing of the hypotheses based on what has been observed with similar MMPI-2 results. The probabilities are hardly 1.0, so to become practically meaningful they must be verified via interviews, observations, other records, etc.
This process is, of course, vulnerable to accusations of bias and selectivity. But as noted above, the actuarial predictions are generated solely from the individual’s item responses and such demographics as age, gender, marital status, or years of education: the computer-generated actuarial characteristics cannot be biased by any clinical information about the person. Thus, whenever the objective predictions are clinically documented to be accurate, they clearly were not originated by observer bias; this strongly supports the objectivity of the examiner. One of my refrains is that to focus on the convergence of the clinical and the actuarial data can enable the most clearly objective and least challengeably biased presentation of one’s opinions and recommendations. A friend recently had an opposing psychologist witness assert that he had no need for nor use of computer-generated reports because “they do not take the person’s circumstances into account.” This is as sensible as saying that a car was poorly designed because it cannot fly, i.e., that it cannot do something for which it was in no way designed. This deceiving dodge was straightforwardly explained to the court.


How are computer-generated reports to be used in the preparation of a custody examination report?


How the information from the computer-generated or actuarial report (CBTI, or Computer Based Test Interpretation) is to be integrated into the final report is ambiguous. Specific rules or even recommendations have never been formally specified, and consequently individual practices vary considerably. I am not presently aware of any courts having taken any precedent-setting actions on this issue. I only occasionally see the final clinical reports that have made significant use of my (or other) CBTI reports, so my comments are in part based on feedback from practicing examiners. In my awareness, this use has ranged from paraphrasing, to copying a few words, to entering whole intact paragraphs, or to appending the entire CBTI. I do have a concern that copying extended passages, especially whole paragraphs or more with no recognition of the source, might be found misleading by members of the court. I am aware that some professionals may not want to indicate the CBTI source out of concern that they will be forced to produce the entire CBTI report. There may be statements in it which they do not want to be forced to explain or defend, e.g., a serious diagnosis listed in my “Diagnostic Impressions” section or a diagnostically serious discussion elsewhere in my report (see discussion below). Ideally, this should largely be a false fear as I will discuss, but aggressive cross-examining attorneys can find ways to make what should be straightforward become tortuous if not torturous. I believe my responsibility is to provide the most accurate, complete, and useful test analyses that I can. The refinement of the material in my interpretive system has proven a lifelong task.
For many obvious reasons it is not realistic or even possible for me to “police” how my reports get used beyond doing my utmost to be sure that the clients to whom we send reports are appropriately licensed professionals. The policing of abusive uses must be done by state professional ethics agencies, the A.P.A., or the courts in which they appear.


How can one deal with strong clinical and diagnostic statements?


The issue of strong diagnostic statements and possibly formal diagnostic entries with serious implications in CBTIs merits specific comment. In part, their presence in our narrative reports reflects the predominance of clinical cases (often psychiatric inpatients) in the evolution of MMPI and MMPI-2 interpretation as well as being the MMPI’s strongest historical area of application. Much of the original interpretive data came from such settings, although the test has been used with hundreds of thousands of individuals in non-clinical settings. In the custody examination context, for example, such diagnostic CBTI content can best be understood as a possible vulnerability. That is, it is reasonably interpreted as reflecting trends or outside potentials in the person’s makeup. With relatively unelevated profiles, this is essentially the assertion that if more adversities were to befall the person and the person’s life were to go seriously downhill, the categories mentioned would be the most likely “summary labels” as to the direction(s) in which the person’s deteriorating emotional state would evolve and be seen. Assigning a formal diagnosis is not an actuarial function; a formal diagnosis is a clinical opinion based on a hopefully wide range of information. In treatment settings, the actuarial function is to contribute to differential diagnosis by alerting the clinician as to what labels are most commonly associated with psychotherapeutic clients and psychiatric patients who obtain similar patterns on the test (see Caldwell, 1996). It can sincerely be the custody examiner’s straightforward opinion that the multiple requirements for making such a diagnosis are not met in this immediate instance – it could be possible if everything got a lot worse and fell apart for him/her, but such a more extreme state is not now the case.
On this basis it would be quite legitimate to dismiss the possible identified diagnoses as largely or even entirely irrelevant for this person at this point in time. Note that whenever the pattern is within the normal range, my reports explicitly state that fact. The diagnostic statement then almost always starts with such phrasing as, “Among psychotherapy patients . . .” and is followed by a statement that the normal range profile may reflect no more than an essentially normal personality or else a situational adjustment reaction (overall a large majority of subjects who obtain normal range profiles are indeed functioning individuals). In some cases with atypical or highly defensive profiles, an additional normality-qualifying statement may comment that the profile is within the normal range but more ambiguous than most because of the degree of defensiveness. This latter in part recognizes the fact that a significant minority of psychotherapeutic client profiles (including psychiatric inpatient profiles) are nevertheless within the normal range (denial and defensiveness, milder problems that benefit from working through, lack of self-awareness, etc.).


How does the CBTI connect to the concluding opinion?


Damaging one’s credibility through attributions of bias is, of course, not an infrequent effort in adversarial custody examination proceedings. My belief here is that the use of CBTIs as non-case-biased sources of information can be very helpful in anchoring one’s objectivity and credibility. By emphasizing the hypothesis-generating or “alerting” function of the CBTI as to what are likely to be problematic issues for each of the litigants as parents and in relation to each other, the examiner can start from an uninfluenced and objective basis from which to develop recommendations. The available MMPI-2 interpretive data are not readily organized for searching in depth on a codetype-by-codetype basis (beyond textbook summaries). It would take many hours for the clinician to make a thorough search of the data sources for each profile considered, and the clinician’s own search itself might be made to look selective or biased. By pointing out that MMPI-2 interpretation is a very complex undertaking, it then becomes quite reasonable to the trier of fact for the examiner to consult an expert who has spent his career working on the task. Using the Caldwell Report Custody Report (the interpersonal implications of the MMPI-2 test results) and the Caldwell Report Narrative Report (the intrapsychic processes of each individual) clearly conforms to the nature of an expert consultation. In summary, I believe the direct discussion of the convergence of the clinical data with the actuarially-generated hypotheses can add a strong element of objectivity and logical flow to the process of exploring the particular person’s characteristics as a parent as well as maritally if not more generally. My impression is that the courts typically find this objective anchor to lead to substantial increments in the credibility of the opinions and recommendations provided.




Caldwell, A. B. (1996). Forensic questions and answers on the MMPI/MMPI-2. Los Angeles: Caldwell Report.

Validating data for clinical and forensic use of CBTIs

COMMENTARY ON: Relative user ratings of MMPI-2 computer-based test interpretations
John E. Williams and Nathan C. Weed (2004). Assessment, 11(4), 316-329. Caldwell Report will provide you with a copy if requested.


This study set out to do a meaningful competitive comparison of the then eight publicly available Computer Based Test Interpretation services (CBTIs) for the MMPI-2. In a thorough study they answered many prior criticisms of preceding studies, none of which had provided a comprehensive appraisal of all of the available services.


The reports that were rated included protocols from inpatient, outpatient, college counseling, and prison samples. The participants submitted an answer sheet from one of their own clients to the authors, and they received either a CBTI analysis of that profile or else an analysis of a profile that was modal and gender-matched for whichever of those four groups it corresponded to (257 valid protocols were used). The participants then rated the report which they received on 10 variables (they knew it could be their case or a modal profile, but they only found out which after having sent their ratings back to the authors). These ratings were as follows:


1. Conciseness
2. Confirmation of therapist’s impressions of the client
3. Usefulness for diagnosis and/or treatment
4. Accuracy
5. Provision of new and important information
6. Presence of contradictory information
7. Organization and clarity
8. Presence of useless information
9. Omission of important information
10. Appropriateness of diagnostic considerations


The ability to compare the ratings of the actual reports against those of the modal reports enabled the authors to demonstrate that the CBTI reports were adding a large amount of information above and beyond what could be attributed to stereotype accuracy or Meehl’s “Barnum” effect, the potential for descriptive statements to be rated as highly accurate despite failing to discriminate among individuals (e.g., “Some days are better than other days.”). For clinical purposes this increment is happily reassuring as well as work-facilitating, and for forensic purposes it stands as strong support of the expectation that the CBTI reports we use really are saying specific things that can make substantial differences in the determinations to be made by the trier of fact.


The averaged ratings were then extensively analyzed by rank ordering the levels of favorability across the eight CBTI programs. The reports by Automated Assessment Associates (Strassberg & Cooper) received the highest ratings on accuracy, clinical usefulness, confirmation of opinion, and diagnostic suggestions. Strassberg and Cooper described their system as conservative and to be used only in conjunction with other information; conservatism in this predictive context may enhance accuracy. The reports offered by Western Psychological Services obtained the highest ratings on being concise and free of useless information. The NCS-Pearson reports rated highest on organization and absence of contradictory statements. Williams and Weed distinguished their report-accuracy variables from their report style and organization variables; the WPS and NCS-Pearson reports thus led on the style variables though not on content (although the authors grouped conciseness with content).


The reports from Caldwell Report had top ratings on the inclusion of new and important information and on not omitting important information (the latter by a wide margin over all of the other reports). I find this gratifyingly consistent with my long-standing intent to provide thorough reports that take as full an advantage as possible of the wealth of information embedded in the profiles and the other scores. Williams and Weed mentioned a prior study (Adams and Shore, 1976) in which there was a modest relationship between the length of a report and its accuracy rating, with longer reports rated as less accurate overall. This makes obvious sense to me in that the more different things one says and the more specific one’s statements, the more opportunities one has to “go wrong.” I would confess some gratification at being third ranked – and close to second – in overall accuracy despite going out on so many “limbs” where my specificity could easily be rated as not “on target.”


Considering the use of a CBTI as a consultation, I believe that providing new and important information and not leaving out important issues is a crucial contribution of the “consultant.” That one’s clinical impression is confirmed is reassuring and strengthens one’s clinical interventions and forensic presentation. But calling attention to what the client may have consciously avoided confronting, or unconsciously led attention away from, can be a significant gain for the clinician. A psychiatrist with whom I enjoyed working many years ago (Ulrich Jacobsen, M.D.) spoke of using the MMPI either “to confirm or to alert.” Not omitting important information corresponds, of course, to the alert function. I believe that having confidence that the MMPI has been thoroughly searched for overlooked or avoided issues should be strongly reassuring to the therapist or examiner.