
Ethical obligations and standards of practice

Ethical Standard 2.02(a) 
Testing Standard 1.1 
Testing Standard 2.8 
Testing Standard 5.1 
Ethical Standard 2.04(a) and Testing Standard 6.1 

[Page 113 continued]

There are no professional standards available that have been specifically developed for the assessment of violence risk (Borum, 1996). Nevertheless, there are general ethical principles and practice standards applicable to these assessments. Psychologists undertaking risk assessments are obviously obligated to comply with the "Ethical Principles of Psychologists and Code of Conduct" (hereafter referred to as the "Ethical Standards") (American Psychological Association, 1992). The Standards for Educational and Psychological Testing (hereafter referred to as the "Testing Standards") are also applicable to psychologists engaged in risk assessments (American Psychological Association, 1985). The Introduction to the Testing Standards states: "... all professional test developers, sponsors, publishers, and users should make reasonable efforts to observe the Standards and encourage others to do so" (p. 2).

[Page 114]

Ethical Standard 2.02(a) 

Ethical standard 2.02(a) states:

Psychologists who develop, administer, score, interpret, or use psychological assessment techniques, interviews, tests, or instruments do so in a manner and for purposes that are appropriate in light of the research on or evidence of the usefulness and proper application of the techniques (American Psychological Association, 1992, p. 1603).

Standard 2.02(a) therefore obligates psychologists to carefully consider the consequences of dealing with predictive variables in a dichotomous context. In particular, risk assessments made for sexual predator hearings will lead to one of the following four outcomes:

(i) The offender is correctly classified as an individual who would commit future sexually violent offenses if released into the community.

(ii) The offender is correctly classified as an individual who would not commit future sexually violent offenses if released into the community.

(iii) The offender is incorrectly classified as an individual who would commit future sexually violent offenses if released into the community, but in fact would not commit such offenses.

(iv) The offender is incorrectly classified as an individual who would not commit future sexually violent offenses if released into the community, but in fact would commit such offenses.

 

Standard 2.02(a) consequently necessitates that psychologists undertaking risk assessments address four questions corresponding to the outcomes identified above:

(i) What is the sensitivity of the assessment procedure used for risk assessment, i.e., what percentage of previously convicted offenders who will recidivate is identified by this assessment procedure?

(ii) What is the specificity of the assessment procedure used for risk assessment, i.e., what percentage of previously convicted offenders who will not recidivate is identified by this assessment procedure?

(iii) What is the frequency of false positive classifications associated with this assessment procedure?

(iv) What is the frequency of false negative classifications associated with this procedure?
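
To make these four questions concrete, the brief sketch below (in Python) computes the four quantities from a single hypothetical 2x2 classification table. The counts are purely illustrative and are not drawn from any actual recidivism study, and the frequencies in questions (iii) and (iv) are expressed here as rates rather than raw counts.

# Illustrative sketch only: the counts below are hypothetical, not data
# from any actual follow-up study of sexual offender recidivism.

def classification_rates(tp, fp, tn, fn):
    """Return sensitivity, specificity, and the false positive and false
    negative rates implied by a 2x2 classification table."""
    sensitivity = tp / (tp + fn)          # share of actual recidivists identified
    specificity = tn / (tn + fp)          # share of actual non-recidivists identified
    false_positive_rate = fp / (fp + tn)  # non-recidivists classified as likely to reoffend
    false_negative_rate = fn / (tp + fn)  # recidivists classified as unlikely to reoffend
    return sensitivity, specificity, false_positive_rate, false_negative_rate

# Hypothetical example: 100 previously convicted offenders, 20 of whom
# actually recidivated after release.
sens, spec, fpr, fnr = classification_rates(tp=14, fp=24, tn=56, fn=6)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, "
      f"false positive rate={fpr:.2f}, false negative rate={fnr:.2f}")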

 

These four questions are also consistent with Hanson's (1998) recommendations for evaluating the quality of an assessment procedure for sexual offender evaluations:

1. What is the measure's predictive accuracy? For instance, given a specific base rate and cutoff score, what is the hit rate? False positive rate? Probability of recidivism? 

2. Do independent raters obtain the same score for the same individual (inter-rater reliability)?

[Page 115]
 

3. If the scale were developed on a different population, to what extent would it be expected to apply to the current case?

4. Are there important risk factors that have been neglected? (p. 65).
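
Hanson's first question ties predictive accuracy to the base rate. The sketch below shows, using assumed and purely hypothetical values of sensitivity and specificity, how the probability of recidivism given a "high risk" classification (the positive predictive value) shifts with the base rate.

# Illustrative sketch: how the base rate alters the probability that an
# offender classified as "high risk" will actually recidivate. The
# sensitivity, specificity, and base rates below are hypothetical.

def positive_predictive_value(base_rate, sensitivity, specificity):
    """P(recidivates | classified high risk), computed via Bayes' rule."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

for base_rate in (0.10, 0.25, 0.50):
    ppv = positive_predictive_value(base_rate, sensitivity=0.70, specificity=0.70)
    print(f"base rate {base_rate:.0%}: "
          f"P(recidivism | high-risk classification) = {ppv:.2f}")

On these assumed figures, the lower the base rate, the larger the share of "high risk" classifications that are false positives, which is why Hanson's first question cannot be answered without reference to the base rate.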

 

Testing Standard 1.1 

In assessing instruments used for sexual predator evaluations, it is also necessary to consider Testing Standard 1.1. This standard states: 

"Evidence of validity should be presented for the major types of inferences for which the use of a test is recommended" (American Psychological Association, 1985, p. 13). 

Because of the potential for arbitrarily undermining the civil rights of those evaluated, Testing Standard 1.1 obligates psychologists undertaking sexual predator evaluations to use valid assessment procedures. In particular, the following study would be necessary to adequately establish the validity of any assessment procedure used for a sexual predator evaluation.

(i) Obtain a sample of 1000 offenders who would have undergone a sexual predator evaluation, but who were released into the community prior to the passage of the relevant statute. Hanson (1998) also recommends a sample of this size for such a study. 

(ii) For this sample, determine whether each offender committed a sexually violent offense within 15 years after their release.

(iii) Evaluate 500 of these offenders using the assessment procedure proposed for risk assessment of previously convicted sexual offenders. Moreover, determine whether the assessment procedure is used in a reliable manner. Do two or more evaluators, independently evaluating the same offender, reach the same or similar conclusions? 

(iv) Using the proposed assessment procedure, define different cutoff scores -- or related decision-making criteria -- for concluding that an offender does or does not warrant commitment under the relevant statute. 

(v) Determine the levels of sensitivity and specificity, and the frequencies of false positives and false negatives, associated with the different cutoff scores or decision-making criteria. 

(vi) Cross-validate the various cutoff scores -- or decision-making criteria -- against the other 500 offenders from the original sample of 1000.
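
The sketch below renders this six-step design schematically in Python, using simulated rather than real data. The base rate, score distributions, and cutoff values are assumptions chosen only to show how the development and cross-validation halves of the sample would be compared; none of it reflects an actual instrument.

# Schematic sketch of the split-sample design described above, using
# simulated data. The "score" and "recidivated" fields are hypothetical,
# not the output of any actual risk assessment instrument.
import random

random.seed(0)

def simulate_offender():
    recidivated = random.random() < 0.25                # assumed 25% base rate
    score = random.gauss(5 if recidivated else 3, 2)    # assumed score distribution
    return {"score": score, "recidivated": recidivated}

# Steps (i)-(ii): a released sample of 1000 with known 15-year outcomes.
sample = [simulate_offender() for _ in range(1000)]
# Steps (iii) and (vi): split into development and cross-validation halves.
development, validation = sample[:500], sample[500:]

def rates_at_cutoff(cases, cutoff):
    """Sensitivity and specificity for a given cutoff score (step (v))."""
    tp = sum(c["score"] >= cutoff and c["recidivated"] for c in cases)
    fn = sum(c["score"] < cutoff and c["recidivated"] for c in cases)
    tn = sum(c["score"] < cutoff and not c["recidivated"] for c in cases)
    fp = sum(c["score"] >= cutoff and not c["recidivated"] for c in cases)
    return tp / (tp + fn), tn / (tn + fp)

for cutoff in (3.0, 4.0, 5.0):                          # step (iv): candidate cutoffs
    dev_sens, dev_spec = rates_at_cutoff(development, cutoff)
    val_sens, val_spec = rates_at_cutoff(validation, cutoff)
    print(f"cutoff {cutoff}: development sens/spec = {dev_sens:.2f}/{dev_spec:.2f}, "
          f"cross-validation sens/spec = {val_sens:.2f}/{val_spec:.2f}")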

 

The study outlined above is consistent with generally recognized and accepted opinion emphasizing the necessity of expressing violence risk assessments in terms of probabilities rather than "yes-no" dichotomies (Monahan, 1981a; Shah, 1978; Webster, 1984).

In other words, properly conducted sexual predator evaluations report the likelihood of future offending, or not offending, in probabilistic terms. It can be legitimately argued that failing to express risk assessments as probabilities, and neglecting to acknowledge the possibility of error, amount to poor practice and are potentially unethical (Cunningham & Reidy, 1998; Hart, Webster, & Menzies, 1993).

Conducting the validation study outlined above is especially critical because of the "sensitivity-specificity tradeoff" dilemma (Quinsey, Harris, Rice, & Cormier, 1998). Cutoff scores, or decision-making criteria, for any assessment procedure can

[Page 116] 

be adjusted to either decrease the frequency of false positive classifications, or decrease false negative classifications. Adjustments that decrease the frequency of false positive classifications, however, inevitably increase the frequency of false negative classifications. Conversely, adjustments that decrease the frequency of false negative classifications inevitably increase the frequency of false positive classifications.
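
A minimal numerical illustration of this tradeoff appears below; the ten score-outcome pairs are hypothetical and serve only to show that moving the cutoff in either direction trades one type of error for the other.

# Minimal illustration of the sensitivity-specificity tradeoff. The
# scores and outcomes below are hypothetical.

# (score, actually_recidivated) pairs for ten hypothetical offenders
cases = [(1, False), (2, False), (3, False), (3, True), (4, False),
         (5, True), (5, False), (6, True), (7, True), (8, True)]

for cutoff in (3, 5, 7):
    false_positives = sum(score >= cutoff and not recid for score, recid in cases)
    false_negatives = sum(score < cutoff and recid for score, recid in cases)
    print(f"cutoff {cutoff}: false positives={false_positives}, "
          f"false negatives={false_negatives}")

Which cutoff is preferable is precisely the policy question that, as discussed below, belongs to the trier of fact rather than to the evaluator.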

A trier of fact needs to understand the consequences of this dilemma so it can reduce the type of error it considers most unacceptable:

(i) mistakenly concluding that a previously convicted offender warrants civil commitment when he does not, or

(ii) mistakenly concluding that a previously convicted offender does not warrant civil commitment when he does.

 

Decreasing the frequency of false positives, versus decreasing the frequency of false negatives, is not a decision for a psychologist to make as an expert witness (Melton et al., 1997). Any psychologist who does so presumptuously advocates a particular public policy, intrudes upon the responsibilities of the trier of fact, and inappropriately expresses a legal conclusion.

Testing Standard 2.8 

Testing Standard 2.8 states:

Where judgmental processes enter into the scoring of a test, evidence on the degree of agreement between independent scorings should be provided. If such evidence has not yet been provided, attention should be drawn to scoring variations as a possible significant source of errors of measurements (American Psychological Association, 1985, p. 22).

This standard necessitates demonstrating sufficient levels of inter-rater reliability for the assessment procedure or decision-making criteria used. Without sufficient levels of inter-rater reliability, assessment procedures or decision-making criteria cannot be standardized. In turn, unstandardized procedures invite a wide range of opinion from those who use them. Assume, for example, that two or more psychologists report very different findings using the same assessment procedure or decision-making criteria. The poor reliability of those procedures inevitably compromises their validity.
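
One common index of the "degree of agreement between independent scorings" contemplated by Standard 2.8 is Cohen's kappa. The sketch below computes it for two hypothetical evaluators making the same dichotomous high/low risk judgment; the ratings are invented for illustration, and kappa is offered here only as one familiar option, not as the index the Standard itself prescribes.

# Minimal sketch of one common index of inter-rater agreement (Cohen's
# kappa) for two evaluators rating the same offenders. The ratings are
# hypothetical.

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two lists of categorical ratings."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

rater_a = ["high", "high", "low", "low", "high", "low", "low", "high"]
rater_b = ["high", "low", "low", "low", "high", "low", "high", "high"]
print(f"Cohen's kappa = {cohens_kappa(rater_a, rater_b):.2f}")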

Testing Standard 5.1 

Testing Standard 5.1 states:

A technical manual should be made available to prospective test users at the time a test is published or released for operational use (American Psychological Association, 1985, p. 35). 

In applying this standard, the key consideration is whether a technical manual is generally available. A generally available manual is best defined as a publication with an ISBN number. Unlike unpublished materials, a published manual with an ISBN number is readily accessible to the practitioners who typically undertake predator evaluations.

[Page 117]

Available manuals detail the appropriate use of an instrument, thereby increasing inter-rater reliability by reducing idiosyncratic judgments between evaluators. Manuals should also identify potential misuses of an instrument and advise caution to avoid them. 

Testing Standards 5.2-5.4 further call for manuals to thoroughly describe the rationale for a test, identify its recommended uses, and summarize the support for those uses. Manuals should moreover cite an objective review of studies related to specific uses of the test. Finally, manuals should identify any special qualifications necessary to properly administer and interpret a test.

Ethical Standard 2.04(a) and Testing Standard 6.1 

Ethical Standard 2.04(a) states:

Psychologists who perform interventions, or administer, score, interpret, or use assessment techniques are familiar with the reliability, validation, and related standardization, or outcome studies of, and proper applications and uses of, the techniques they use (American Psychological Association, 1992, p. 1603).

Relatedly, Testing Standard 6.1 states:

Test users should evaluate the available written documentation on the validity and reliability of tests for the specific use intended (American Psychological Association, 1985, p. 41).

These standards obligate psychologists who undertake predator evaluations to familiarize themselves with the research related to the instruments they use. If asked -- "What data published in peer-reviewed journals are available to support the reliability and validity of the assessment instrument you used?" -- the evaluating psychologist should be able to respond. If no such data are available, then using the instrument in question is obviously ill advised.

Assume that subsequent to undertaking a predator evaluation, a psychologist expresses opinions such as the following: 

(i) In view of the offender's excellent response to treatment, his recidivism risk is substantially reduced. 

(ii) Because the offender was a victim of childhood sexual abuse himself, his recidivism risk is substantially increased. 

(iii) Stable employment is available to the offender if released, and this consideration reduces his recidivism risk. 

(iv) Given that this offender is currently single, in addition to not having been previously married, his recidivism risk is substantially increased.

 

Opinions such as these invite cross-examination asking: 

(i) Is there a generally available manual for this instrument that supports your opinion in this regard? 

(ii) Without a generally available manual for this instrument, can you cite any data published in a peer-reviewed journal supporting your opinion? 

 

Again, then, if no such data are available, using the instrument in question is obviously ill advised. Parenthetically, the empirical support for the four hypothetical opinions cited above ranges from inconsistent to nonexistent.

 
