Public Safety, Individual Liberty, And Suspect Science

Future Dangerousness Assessments And Sex Offender Laws

Hamilton, Melissa *

* Assistant Professor of Law, University of Toledo College of Law. J.D., The University of Texas at Austin School of Law; Ph.D., The University of Texas at Austin.

Temple Law Review, Apr 05 2010

Introduction

I. Special Treatment for Sex Offenders
   A. SVP Laws
   B. (Un)Intended Consequences of Sexual Predator Laws
      1. Sexual Predators and Stranger Danger: Is the Concern Valid?
      2. Do Sexual Offender Laws Work?
      3. Rethinking the Problem

II. Expert Evidence and Sex Offenders
   A. Expert Evidence Law
   B. Experts and Future Dangerousness Assessment

III. Actuarial Testing of Future Dangerousness
   A. The Sexual Recidivism Actuarial Tests
      - RRASOR
      - STATIC-99
   B. Empirical Evaluation of Actuarial Evidence of Future Dangerousness
      1. Testing and the Scientific Method
      2. Validity
         [a] The correlation coefficient
         [b] The Receiver Operating Characteristic (ROC)
      3. Reliability and Objectivity
      4. Training
      5. General Acceptance

IV. Judicial Perspectives on Future Dangerousness Evidence
   A. Daubert/Frye Challenges
      1. Frye Challenges
      2. Daubert and Alternatives
   B. The Standard of Likelihood to Sexually Re-offend
   C. Battle of the Experts
   D. (Mis)Interpretation of Actuarial Prediction

V. Conclusions

- - -

Introduction

Arguably one of the most prolonged and contentious debates in legal history centers on the balance between

	protecting basic civil liberties as guaranteed by the Constitution and
	protecting the public from harm.

Certainly the shadow of terrorist threats after the 9/11 attacks reignited this debate and has fueled public interest in changing the balance between security and liberty in favor of introducing a more preventive state. [*1]

While tipping the scale in favor of using a preventive law model might initially serve to calm public fear, be it about terrorism or crime generally, the long-term consequences of such action could be to limit the freedoms of many citizens. This is particularly true in the highly emotional area of predatory sex crimes. [*2]

In recent decades, federal, state, and local governments have become increasingly restrictive on the freedom and privacy of those labeled sexually violent predators (“SVPs”) [*3] in hopes of preventing further sexual violence. The most commonly used tools to manage SVPs are

	involuntary commitments for mental treatment,
	sex offender registration, and
	residency restrictions (hereinafter “SVP laws”).

In an effort to streamline the identification of sex offenders who pose a future danger and thus might be subject to SVP laws, officials place substantive legal emphasis on psychosexual evaluations by individuals accepted as experts. [*4]

These experts are generally mental health practitioners who offer opinion evidence about an individual’s potential for future dangerousness, often using actuarial (statistical calculation of risk) assessments. [*5]

This article critically analyzes whether future dangerousness assessments using actuarial tools are responsive to legal standards contained in SVP laws and whether courts, when confronted with such assessments, are adequately engaging in the gatekeeper role to accept only good science considering the evidentiary benchmarks of Daubert and Frye. [*6]

In Section I, the dominant SVP laws are outlined, including a discussion of how and why politicians and the public are enamored with the assumed need for these specialized laws. I also provide statistical evidence that challenges whether fear about SVPs is realistic and whether SVP laws serve their preventive goals.

Section II provides a brief review of the current law on the admissibility criteria for expert evidence, with the concomitant summary of Daubert- and Frye-led standards. Additionally, the role of mental health practitioners in providing expert evidence in court when future dangerousness is at issue is discussed.

Section III summarizes two of the most common actuarial risk assessment tools used to identify and label sex offenders who will be subject to SVP laws. I then provide an explanatory analysis of the empirical validity of the actuarial tools in predicting sexual offense recidivism. Assessing whether actuarial-based risk predictions are good science is of extreme importance today. Not only is the implementation of SVP laws growing costlier and more intrusive in scope, but new evidence has also emerged that significantly undermines the validity of applying these actuarial tests to U.S. offenders.

Section IV offers a review of case law involving the role of the two actuarial assessment tools in SVP status cases, including an assessment of how courts have responded to Daubert- and Frye-based challenges to the instruments. Considering that the vast majority of courts reject such challenges, this section also explores how the actuarial assessments instrumentally impact decisions on future dangerousness. In addition, this section contains a critical analysis of evidence that adversarial bias among experts and empirically incorrect representations of actuarial predictions are all too common.

Finally, the conclusion offers cautionary notes and suggestions for reevaluating SVP laws and actuarial assessments of future dangerousness.

Specifically, this article concludes that because of uncritical reliance upon actuarial assessments of future dangerousness, legal professionals have largely failed to grasp the significant empirical limitations of these tests.

Judges and lawyers participating in SVP cases must appreciate the serious challenges to the reliability and validity of actuarial assessments being presented as expert evidence. The potential that criminal justice practitioners use empirically questionable assessments to inform decisions on the SVP status of individual defendants [*7] threatens significant harm to the public and to the defendants. SVP laws are very expensive and resource intensive for governmental institutions to implement, maintain, and enforce. [*8]

Defendants also suffer a considerable infringement of their constitutionally protected interests in liberty and privacy through the restrictions which can render them

	labeled as an SVP,
	incarcerated in mental health institutions,
	registered publicly as sex offenders, and
	restricted from places to live and work.

Hence, if pseudo-science greatly impacts these decisions, the significant risk of

	false positives (giving a sexual predator label to a defendant not likely to re-offend) and
	false negatives (not giving a sexual predator status to a defendant highly likely to re-offend)

undermines the preventive goals of SVP laws while imposing significant costs on the public and on individual defendants.

I. Special Treatment for Sex Offenders

If sex sells, and if violence sells, their combination is exponentially alluring, [*9] and provides sensational headlines for media outlets that suggest the U.S. is in the midst of a sex crimes wave. [*10]

Largely due to the media fomenting moral panic about dangerous sex offenders, [*11] the management of sex offenders is a top priority for legislative action today. [*12]

The connection between the media and legislative efforts to invoke preventive measures on SVPs is clear, as the pressure to increase control and punishment of SVPs often comes from citizens after heinous sex acts become high profile cases in the media. [*13] However, the news outlets’ method of coverage has also magnified exceptional cases of sexual violence and incorrectly implied that sex crimes are mostly accomplished by fearsome strangers. [*14]

Through repeated publication of their names and photographs, telegenic victims of sex crimes, particularly those young and cute, literally become the “poster children” for the moral panic and calls demanding officials do something to protect potential future victims. [*15]

Likewise, the media also hyped stories involving repeat sex offenders thereby leading to the iconic image of the recidivist sexual predator, i.e., the SVP. Researchers have studied the cyclical nature of the moral panic to help explain its endurance.

In one study, surveyed legislators admitted that their opinions on the need for restrictive measures for SVPs were informed greatly by media reports and constituent concerns. A terse comment by a legislator is telling:

“You hear about these guys raping and killing kids all the time now. We have to do something. It’s got to stop.” [*16]

In turn, with new SVP legislation, the media responds with further coverage reporting the government’s admission about the particular dangers posed by the presence of SVPs in communities, thereby continuing the cycle and reinforcing the image of the menacing sexual predator. [*17]

This rhetoric underlying the moral panic continues, despite contradictory evidence. Official statistics show that in the U.S., the rate of rapes/sexual assaults was down by about one-third between 1998 and 2007. [*18] Further, sex offenders are not more likely than other types of offenders to either recidivate or to specialize in committing sex offenses. [*19]

So, in light of legislative convergence upon new types of preventive measures that apply only to sex offenders, one might reasonably ask: why the special treatment?

Many suggest that citizens view sexual deviance as qualitatively unique. While most offense-based laws regulate risky behavior, SVP laws focus on regulating risky persons, as if sexual violence were a defining characteristic of the offender himself. [*20] Citizens commonly believe that sexual offenders are highly likely to re-offend, recidivate at higher rates than other types of criminals, and are most likely to be strangers. [*21]

Experts indicate that the public tends to believe that sexualized violence has actually become more pervasive and that it is causing greater harm to society. [*22] This is particularly true in more recent years with the proliferation of technology as media portrayals fuel largely mythical fears regarding SVPs using the internet to lure victims. [*23]

As a result, legislators have articulated a number of responses to deter sexual violence through criminal laws, such as

	creating new criminal offenses and
	imposing longer sentences for sexual offenses, as well as
	other measures

aimed at managing the sexual offender population. [*24]

In large part, complicity between public fear, media hype, and political pandering works efficiently because of the imbalance in the debate. Families of murdered victims of sex crimes act as informal lobbyists for increasing control of sex offenders in what is a uniquely bipartisan political environment on the issue. There is virtually no counter movement to represent the interests of those alleged to be SVPs, who are largely reviled by all. [*25]

Often, the legislative drive to impose new preventive measures on SVPs is espoused in political rhetoric that makes assertions about the prevalence and danger of sex offenders, mostly without any empirical support and without being publicly challenged. [*26]

Because of the widespread fear of SVPs, this moral panic has been accompanied by concerns about the likelihood of sexual recidivism after convicted sex offenders are released from imprisonment.

The management of those convicted of sex-based crimes post-release has been largely through the enforcement of civil (as opposed to criminal) laws. The use of civil regulations as management tools has the distinct benefit of permitting criminal justice officials to restrict the freedom and privacy of sex offenders without abiding by the stricter procedural requirements that would be constitutionally required in the criminal law arena.

The civil law-based tools also permit state officials to monitor and supervise sex offenders beyond traditional parole and probation structures. The most common types of civil laws used today to manage the sex offender population are SVP

	civil commitment laws,
	registration requirements, and
	residency restrictions. [*27]

A. SVP Laws

Civil Commitment Laws

SVP civil commitment statutes are a distinct species of traditional civil commitment laws, which permit a state to commit to a mental institution a person who is mentally ill and who poses a danger to others.

The Kansas SVP law is typical. It defines an SVP as a person who meets three main criteria:

	1) the person has been convicted of or charged with a sexually violent offense, and
	2) the person “suffers from a mental abnormality or mental disorder” which
	3) “makes the person likely to engage in predatory acts of sexual violence if not confined in a secure facility.” [*28]

A person adjudged to be an SVP under this law is then committed for an indefinite period to a secure institution, despite not having committed a new offense.

In a ground-breaking decision, the U.S. Supreme Court in the case of Kansas v. Hendricks upheld the Kansas SVP law in the face of constitutional challenges. [*29] The Court dismissed Hendricks’ claims that the law violated the double jeopardy and ex post facto clauses. The starting point for the analysis was that these clauses apply only to laws that are intended as punishment. The majority ruled that civil commitment is civil in nature and since it is not intended for deterrent or retributive purposes, it does not constitute punishment. [*30]

Post-Hendricks, most SVP civil commitment statutes are modeled after the Kansas statute. Because the proceedings are civil in nature, defendants do not enjoy many of the protections of the Fifth and Sixth Amendments, such as the privilege against self-incrimination and the rights to jury trial and confrontation, but are generally provided some right to counsel. [*31]

In practice, most states use SVP civil commitment laws when a sex offender is about to be released from prison. The idea is to allow the offenders to serve their sentences under normal custodial arrangements but then to transfer them to other secure accommodations just before their scheduled release dates so that the offenders are actually never released. Currently, 20 states and the federal government have implemented SVP civil commitment laws, and a recent estimate is that over 4,300 SVPs are held in civil commitment facilities under these laws. [*32] Once committed, it is very difficult for defendants to ever be released. [*33]

Registration Requirements

Sex offender registration systems are public information devices. All 50 states and the federal government in the U.S. have mandated sex offender registries. [*34] Much of the information is widely available to the public on freely accessible internet sites. The type of information that sex offenders are required to provide has expanded dramatically over the years.

As an example, Georgia requires the registrant’s

	name,
	date of birth,
	height,
	weight,
	fingerprints,
	photo,
	residence,
	employment information, and
	vehicle details. [*35]

After initial registration, sex offenders are generally required to update the information from time to time. Once a person is labeled an SVP for registration purposes, it may be quite difficult to challenge the designation, even on grounds that the label was erroneously imposed. [*36]

Residency Restrictions

States and local jurisdictions are also active in passing laws banning sex offenders from residing in ever larger swaths of territory. Unlike the other two SVP schemes that tend to be federal or state-wide in scope, residency restrictions are more likely to be enacted at the local level by town or city councils. [*37]

The resulting multiplicity of regulations has produced a wide variety of restrictions that apply to different types of offenders.

	Some laws apply to all sex offenders [*38] while
	others target just the subgroups of violent sex offenders or child victim sex offenders. [*39]
	In some states, residency restrictions apply even to offenders convicted of non-contact sex-based offenses, such as possession of pornography.

A common criterion is to delineate a certain distance, such as 1,000 or 2,000 feet, around specified locations like

	schools,
	parks, and
	bus stops

where the sex offender is not permitted to reside. [*40]

These policies have resulted in effectively banning sex offenders from residing in whole towns, cities, and counties. [*41] Further, lifetime residency restrictions are now common. [*42]

A few items of note apply generally to both registration and residency laws.

While many states require registration per se after a conviction for one of many enumerated sex offenses, about half of the state laws utilize a future dangerousness assessment model to tailor the exact nature of the registration requirements to the individual. [*43]

Similarly, while many residency restrictions apply per se to those convicted of specified sex offenses, many of the laws correspond to the registration system by invoking residency restrictions on registered sex offenders. A violation of these laws most often constitutes a new criminal offense and carries a new prison sentence as a consequence. [*44]

Registration requirements and residency restrictions have often been applied retroactively, meaning that they apply to offenders convicted prior to the laws’ enactment. [*45] Facing claims that the laws violate the ex post facto clause, the U.S. Supreme Court has ruled that registration is civil in nature and that the clause is, thus, inapplicable, [*46] and while the question of the constitutionality of residency restrictions has not risen to the highest level, lower courts have generally denied such claims on the same basis. [*47]

Critics, though, continue to decry that SVP laws are punitive in nature since they serve the legislatures’ and the public’s thirst for retribution against sex offenders [*48] that exceeds the parameters of normal sentencing systems. SVP laws signify that the

“prediction of future dangerousness has begun to colonize our theories of punishment.” [*49]

Moreover, future dangerousness assessment reflects a broader criminal justice policy move toward incapacitation as the key driver, which is

“a reflection of the fact that we have given up on trying to reduce crime by investing in job opportunities, education, assistance to immigrants, drug rehabilitation programs, reentry programs, and the like. . . . In order to become more efficient, we develop actuarial methods to determine who should be exiled to prison and for how long.” [*50]

B. (Un)Intended Consequences of Sexual Predator Laws

In conceptualizing how well SVP laws fit the preventive law model, two key issues emerge.

	One is whether the hype about the dangerous sexual predator is accurate.
	The second is whether these restrictions do more harm than good to those to whom they are applied and to the public welfare in general.

While it is beyond the scope of this paper to more fully explore these issues, it seems relevant to at least mention them here in order to highlight why the potential misuse of actuarial risk tools in SVP litigation is problematic to the notions of law and justice.

1. Sexual Predators and Stranger Danger: Is the Concern Valid?

Much empirical evidence indicates that the iconic image of the SVP is more myth than reality. A natural starting point is the relatively simple statistic of the base-rate. The base-rate for recidivism essentially means the overall rate of recidivism actually observed in a group of offenders. [*51]

Before reviewing results from official studies, it is important to consider that base-rates can vary by study.

Three important methodological choices between studies can help explain the variation:

	the defining characteristic of recidivism (operationalization of the outcome variable, in social science terms);
	the time period for observation; and
	the demographic characteristics of the group observed.

For example, if the outcome variable is arrest, then the base-rate will naturally be higher than if the outcome requires a conviction. A longer follow-up period will yield higher numbers since there is more opportunity to re-offend.

Another consideration is that a group that is at higher risk of recidivism, such as younger offenders who re-offend more often than older offenders, will yield a higher base-rate because of the attribute that is correlated with higher risk.
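The effect of the first two methodological choices can be illustrated with a minimal sketch. The cohort below is entirely hypothetical (ten invented offenders, not data from any study cited here), and the names `Offender` and `base_rate` are introduced only for illustration. The point is simply that the same group yields different base-rates depending on the outcome definition and the follow-up window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offender:
    # Years after release until the first such event; None means the
    # event was never observed. All values below are invented.
    years_to_rearrest: Optional[float]
    years_to_reconviction: Optional[float]


def base_rate(cohort, outcome, follow_up_years):
    """Share of the cohort showing the chosen outcome within the window."""
    hits = 0
    for person in cohort:
        years = getattr(person, outcome)
        if years is not None and years <= follow_up_years:
            hits += 1
    return hits / len(cohort)


# Hypothetical cohort: four offenders with at least one event, six with none.
cohort = [
    Offender(1.0, 2.5),
    Offender(2.0, None),
    Offender(4.0, 6.0),
    Offender(8.0, None),
] + [Offender(None, None) for _ in range(6)]

rearrest_3y = base_rate(cohort, "years_to_rearrest", 3)
reconviction_3y = base_rate(cohort, "years_to_reconviction", 3)
rearrest_10y = base_rate(cohort, "years_to_rearrest", 10)

print(rearrest_3y, reconviction_3y, rearrest_10y)
```

In this invented cohort, requiring a conviction halves the three-year base-rate relative to arrest, and extending the window to ten years doubles it, which is why base-rates from different studies cannot be compared without checking both choices.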

In a prominent base-rate study sponsored by the U.S. Department of Justice, researchers conducted a 3-year follow-up of over 9,500 male sex offenders released in 1994. [*52] They found that

	5.3% of the released offenders had been re-arrested for a sex crime. [*53] This rate was much higher than the
	1.3% re-arrest statistic on sexual offenses for the non-sex offenders who were released the same year. [*54]

However, comparing the groups for non-sex offenses,

	non-sex offenders had a 68% re-arrest rate, compared to the
	43% re-arrest rate for sexual offenders. [*55]
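The arithmetic behind this comparison can be laid out as a short sketch. The four percentages are those reported above. The rate ratios and the variable names are my own framing for illustration, not figures taken from the study.

```python
# Three-year re-arrest rates as reported in the study discussed above.
sex_crime_rearrest = {"sex offenders": 0.053, "non-sex offenders": 0.013}
non_sex_rearrest = {"sex offenders": 0.43, "non-sex offenders": 0.68}


def rate_ratio(rates):
    """Rate for sex offenders divided by the rate for non-sex offenders."""
    return rates["sex offenders"] / rates["non-sex offenders"]


# Relative comparison: how many times higher (or lower) one rate is.
sex_crime_ratio = rate_ratio(sex_crime_rearrest)
non_sex_ratio = rate_ratio(non_sex_rearrest)

# Absolute view: share of released sex offenders NOT re-arrested
# for a sex crime during the follow-up period.
share_not_rearrested = 1 - sex_crime_rearrest["sex offenders"]

print(f"sex-crime re-arrest ratio: {sex_crime_ratio:.1f}")
print(f"non-sex re-arrest ratio: {non_sex_ratio:.2f}")
print(f"not re-arrested for a sex crime: {share_not_rearrested:.1%}")
```

Read this way, the relative gap for sex crimes is about fourfold, yet in absolute terms roughly 95 of every 100 released sex offenders were not re-arrested for a sex crime, and their re-arrest rate for other offenses was lower than that of the comparison group.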

Another study of male and female offenders released from prisons in Massachusetts in 1992 also compared recidivism rates (there defined as any re-incarceration or parole violation) between offenders who had been incarcerated for sex offenses and offenders who had been incarcerated for other types of offenses. [*56]

The recidivism base-rate

	for the sex offenders for any offense was 22%, compared to base-rates of
	32% to 57% for other types of offenders (property, person, drug, and other). [*57]

The point here is that if sex offenders are not substantially higher in recidivism risk than non-sex offenders, then specialized policies do not appear to be justified. [*58]

Other studies of sex offenders also show relatively small percentages of sex crime recidivism after release.

Reported rates were

	3% in an Iowa study at a follow-up of about four years for a demographically mixed group (meaning children and adults and men and women), [*59]
	3% in Washington at about five years, [*60] and
	8% in Ohio at a ten-year follow-up. [*61]

Nonetheless, the base-rates reported in the foregoing studies are far below what the public seems to believe: one study showed the tendency for the public to dramatically overestimate the recidivism rate of sex offenders at 75%. [*62]

While base-rates of sexual recidivism are relatively low in official studies, there are other reasons base-rates vary. Sex offenders are a heterogeneous group, and their differences can have a substantial effect on the commission of sex offenses.

For example, there is evidence of differing recidivism rates in sub-groups based on such factors as

	age,
	gender,
	receiving treatment,
	prior incarceration, and
	type of sex offense. [*63]

Still, the foregoing studies involving mixed populations of predominantly male offenders together are strong evidence that the vast majority, up to 95% of released sex offenders in the national study, do not sexually re-offend. [*64]

Thus, empirical studies tend to show that the image of the re-offending sexual predator is, if not mythical, not as accurate as the media and politicians assert.

2. Do Sexual Offender Laws Work?

Even if it were true that sexual offenders pose the extreme risk suggested by the sexual predator symbolism, can and do these SVP laws work in preventing future sexual violence? Presumably, civil commitment is effective, though at enormous cost to state coffers and to liberty and privacy interests.

An assumption of registration and residency laws is that dangerous predators are strangers and that the public can protect itself by being aware of who these people are and controlling where they live. Statistics belie this assumption.

Of the rapes and sexual assaults reported in the National Crime Victimization Survey for 2007,

	64% of female victims and
	58% of male victims

were attacked by non-strangers. [*65]

Another national report using police reports during a 5-year period in the 1990s indicated that assaultive sex offenses committed against child victims (less than 18 years) were also largely (76%) committed by family members and acquaintances. [*66]

For children under 6 years, the statistic is more depressing: almost half of the offenders were family members. [*67]

In other words, according to these studies, the majority of sexual assaults are committed by persons already known to the victims. [*68]

The obvious supposition here, then, is that SVP laws are, even theoretically, not capable of protecting many victims of sexual violence, because most victims are persons with whom offenders are already familiar and to whom they may have continuing access. [*69]

Another basic theoretical problem regarding the registration system is that it naively presumes that the information is current, and that citizens access the information and act thereon to protect themselves. [*70]

A recent study found that while a large proportion of survey respondents were aware of the existence of their state’s public sex offender data file, most had never accessed it and, of those who did, few took any precautionary measures as a result. [*71]

Considering a more direct perspective on whether sex offender laws work, studies continue to show that

	neither sex offender registry implementation
	nor residency restrictions

actually reduce recidivism. [*72]

A likely explanation is that SVP laws conflict with everything that has been learned in the past few decades about successful reentry efforts. These laws interfere with those important factors that have been shown to mitigate re-offending, such as

	support of family, friends, and community,
	maintaining a job, and
	a healthy place to live.

For example, when the bans involve large areas, sex offenders have congregated in discrete areas, often socio-economically and socially challenged neighborhoods, or become homeless. [*73]

The label of SVP and its public consequences under the sexual predator laws also bring

	social scorn,
	loss of social support,
	reduction in employment possibilities,
	housing difficulties, and
	personal harassment. [*74]

Experts believe that the multiple pressures and lack of legitimate opportunities increase the risk of recidivism. [*75]

There are ramifications to victims of sex crimes, too. Stigmatization of offenders and families may lead to reduced rates of reporting of sexual victimization, particularly when the perpetrators are nonstrangers. [*76]

Regarding intra-familial sex offending, the new laws may further discourage family victims from reporting or cooperating in criminal prosecutions because of the laws’ impact on family members’ freedom and the potential for public humiliation. [*77]

Though many states protect the secrecy of the victims’ information in sex crimes, the publication of their offenders’ information may permit others to extrapolate as to the victims’ identity, particularly when the victims are family members. [*78]

The sheer cost of these policies, in terms of cash outlays and governmental resources, is daunting. An informal survey recently indicated that the average cost of civil commitment of SVPs across the states was $100,000 per person per year. [*79] State officials also are complaining about the soaring costs of monitoring sex offenders for compliance with registration and residency laws. [*80]

The estimated cost of implementing the national registry is $1.5 billion. [*81]

3. Rethinking the Problem

Notably, this paper does not argue that sex offender laws are themselves inherently bad policy. The protection of the public from the horrendous damage that sexual violence causes is certainly a laudable goal of the government. And the strong and official condemnation of sexual violence voiced by the policies is commendable. [*82]

But, if the policies are not working to prevent sexual offenses, using bad science to assess future dangerousness exacerbates the problem.

The Center for Sex Offender Management, a project of the U.S. Department of Justice, warns that effective management strategies for SVPs should not be politically based on reactionary public fears of the sex offender population. [*83]

To be fair, risk assessment is inherently difficult, and human behavior is largely unpredictable. Yet, because of the importance of invoking restrictive laws based on judgments of risk, criminal justice officials and the public should be realistic about the situation.

Proponents need to acknowledge that false positives and false negatives will commonly occur, and then determine if continuing with future dangerousness assessments is justified legally, socially, and monetarily. [*84]

If the answer is yes, then at a minimum, we should demand the most empirically sound risk assessment procedures, learn to better interpret the results offered, and fully understand their advantages as well as their limitations.
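Why false positives are a particular concern when base-rates are low can be shown with a brief worked sketch. It uses the 5.3% three-year sexual re-arrest rate from the national study discussed in Section I as a stand-in base-rate, which is itself an assumption because re-arrest is only a proxy for re-offending. The sensitivity and specificity values are purely hypothetical and do not describe any actual instrument, and the function name `classification_counts` is introduced only for illustration.

```python
def classification_counts(base_rate, sensitivity, specificity, n=1000):
    """Expected outcomes when a risk test is applied to n released offenders.

    sensitivity: share of actual re-offenders the test flags as high risk.
    specificity: share of non-re-offenders the test correctly clears.
    """
    reoffenders = base_rate * n
    non_reoffenders = n - reoffenders
    true_pos = sensitivity * reoffenders
    false_neg = reoffenders - true_pos
    true_neg = specificity * non_reoffenders
    false_pos = non_reoffenders - true_neg
    # Positive predictive value: of those flagged, the share who re-offend.
    ppv = true_pos / (true_pos + false_pos)
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "ppv": ppv,
    }


# Base-rate from the national study; accuracy figures are invented.
result = classification_counts(base_rate=0.053, sensitivity=0.75, specificity=0.75)

print(f"flagged and would re-offend: {result['true_pos']:.0f}")
print(f"flagged but would not re-offend: {result['false_pos']:.0f}")
print(f"share of flagged who would re-offend: {result['ppv']:.0%}")
```

Under these assumed figures, a test that is right three times out of four in each group would still flag roughly six persons who would not re-offend for every one who would, while missing about a quarter of actual re-offenders. The exact numbers change with the assumptions, but the direction of the effect follows from the low base-rate alone.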

II. Expert Evidence and Sex Offenders

Specialized SVP laws remain an important focus in the justice system’s response to sex offenders post-release. Because the preventive laws are beyond the traditional application of criminal laws, and, indeed, are directed toward potential future crimes, the application to individual offenders has been substantially based upon assessments of future dangerousness.

Courts commonly require mental health professionals to provide expert testimony to support these risk assessments. In turn, experts rely heavily on actuarial tools developed in an effort to standardize the assessment based on empirical principles.

To make their assessments, experts may, but by no means always, have access to a variety of data, from

	mental health records,
	criminal records (including, e.g., police reports, arrest, and probation/parole results),
	treatment reports,
	reports from other evaluators,
	actuarial assessments,
	diagnostic evaluations, [*85] and
	interviews with the defendants.

This Section begins with a brief overview of evidentiary law as applied to expert witnesses.

A. Expert Evidence Law

The introduction of expert witness testimony in criminal proceedings is not a new idea in the U.S., but it is growing in its influence with the advent of forensics such as fingerprints and DNA.

Studies show that a judge’s admission of expert evidence is quite influential upon jurors’ acceptance of the expert testimony as scientifically valid. [*86]

A recent trend is the growing use of experts to provide evidence based on knowledge gained through use of the “softer” social sciences. [*87] Some of the common areas in which expert evidence draws upon social science in the criminal context involve

	insanity,
	battered women,
	rape trauma,
	eyewitness identification, and
	future dangerousness. [*88]

While much of the social science expert evidence assists the jury in assessing the facts as they previously happened, the future dangerousness issue uniquely entails the expert providing what some have referred to as the social context for determining future facts. [*89]

A special issue with respect to this new expert evidence field is that while the model for reliability in the traditional hard sciences was based on experimentation, replication, and validation, the model that social scientists use generally must replace the experimentation component with observation. [*90] This is true largely because maintaining a truly randomized, controlled experiment in the atmosphere of a laboratory-like setting when it involves people’s social lives is often not practical and may be unethical. [*91]

Before focusing on the expert rules as applied to future dangerousness testimony, a summary review of expert evidence law is appropriate. As a general matter, state courts follow either the Frye test for expert testimony or the more recent Daubert-led standards, though a few states maintain variations of these. [*92]

The Frye test, adopted by a federal appellate court in 1923, mandated that expert testimony involving new scientific evidence was admissible upon proof it was reliable, and reliability is shown if it has been generally accepted in the relevant scientific community. [*93] This is known as an exercise in “counting heads.” [*94]

In 1975, Congress enacted the influential Federal Rules of Evidence, including FRE 702, which provides that expert testimony involving scientific, technical, or other specialized knowledge is admissible in federal court if it

“will assist the trier of fact to understand the evidence.” [*95]

The question arose: what did this rule do to the Frye standard?

The U.S. Supreme Court weighed in on this issue in a case where experts battled about whether a particular prescription drug was the cause of a birth defect.

In Daubert v. Merrell Dow Pharmaceuticals, Inc., [*96] the Court ruled that, with Congress’ “liberal thrust” for permitting opinion testimony, the Rules displaced Frye and its strict focus on the general acceptance test. [*97]

Instead, the Court invoked the broader perspective that scientific evidence is admissible if it is valid and reliable, with general acceptance not the entire substitute for those criteria. [*98]

	The Court defined validity as: “does the principle support what it purports to show?”
	Reliability asks: “does the application of the principle produce consistent results?” [*99]

Invoking the metaphor of the gatekeeper, the Court concluded that judges need not rely solely upon external groups, but that judges themselves should evaluate the scientific reliability and validity of the proposed evidence. [*100]

The Court provided some “general observations” to guide judges in determining the validity and reliability of the offered expert evidence to the case at hand:

	testability;
	peer review and publication;
	methodological standards, including the error rate; and
	general acceptance. [*101]

Hence, general acceptance remained, but only as one of several potentially applicable criteria. In a strong dissent, Chief Justice Rehnquist lamented that the majority’s approach would cause judges to become “amateur scientists.” [*102]

Despite Daubert and its progeny, the Supreme Court left unanswered many other questions. For example, it is unclear whether the presence of any one of the general guideline criteria, even general acceptance, is sufficient if none of the other criteria are met. [*103]

Since the Court made clear that its “general observations” were not a dispositive list of factors to consider, [*104] judges also struggle with what other factors may be relevant. There remains, too, the question of what degree of reliability is sufficient. [*105]

In the case of SVP laws we might also add a query about whether the standard for reliability should differ depending on the type of deprivation involved, such as the greater infringement on liberty inherent in civil commitment statutes or the arguably less onerous burden imposed by registration or residency requirements.

As between Daubert and Frye, legal practitioners considered the new conceptualization from Daubert as potentially being more generous than the Frye standard, thus substantively minimizing the role of the general acceptability standard in federal court. [*106]

On the other hand, Daubert can also be more limiting. For instance, there may be general acceptance in the field of astrology about the methods and tools with which to predict future events based on planetary movement, but a Daubert-led court likely would exclude the evidence as specious (i.e., unreliable), and thus inadmissible. [*107]

Not all state courts follow Daubert; some continue to follow Frye or some equivalent. [*108]

Further, many of the Frye-based courts still distinguish between

	scientific evidence (to which Frye applies) and
	non-scientific evidence (to which Frye does not).

In addition, for those states strictly following Frye, the reliability (via general acceptance) question is entertained only if the expert is offered to testify about a new scientific methodology. [*109]

In states retaining Frye, then, expert evidence that is either not new or not scientific is not subject to the Frye admissibility standard, though other general evidentiary rules apply regarding relevance and prejudice. Nonetheless, a few courts, while purportedly holding onto the Frye test, require some additional evidence on reliability for admission of novel scientific evidence. [*110]

For example, an Illinois court ruled that the harbinger of junk science is sufficiently troublesome that judges should not inquire only about general acceptability but should also consider if the evidence is otherwise reliable. [*111]

Despite these variations, the concept of general acceptance continues to play a strong role in expert evidence regimes.

As a result of this expert evidence body of law, I propose two observations, perhaps from a cynical perspective, of the goal-oriented actions that experts and judges appear to use to balance objectivity with efficiency.

First, a consequence of the intersection between general acceptance and the efficiency of relying upon legal precedent is that, once a novel science is admitted by judges, it maintains its veil of “good science” without meaningful review, despite advances in research and/or legitimate criticism.

The second thought, as others have nicely put it, applies to potentially self-interested experts:

[W]orkers in a novel area sharing a common goal may develop a technique that furthers their professional aims and they may ‘generally accept’ it regardless of its scientific validity, sometimes despite strong scientific denial of its underlying premises. [*112]

This paper suggests that these problems may help explain the state of admissibility decisions on expert evidence involving dangerousness assessment methodologies using actuarial tools in SVP cases.

The result is tautological:

	if generally accepted, then it is reliable;
	actuarial risk assessments are generally accepted;
	therefore, actuarial risk assessments are reliable (and truthful).

B. Experts and Future Dangerousness Assessment

The legal and constitutional bases for permitting experts to testify as to an offender’s likelihood of recidivism derive from the U.S. Supreme Court in the capital punishment sentencing domain.

In Barefoot v. Estelle, the Supreme Court confronted a Texas death penalty statute that permitted a jury to sentence a capital defendant to death only after answering two questions affirmatively. [*113]

As pertinent here, one question to be answered, with a dichotomous response (yes/no), was whether

“there is a probability that the defendant would commit criminal acts of violence that would constitute a continuing threat to society?” [*114]

The particular issue the Court addressed in the case was the constitutionality (under the Eighth and Fourteenth Amendments) of permitting mental health expert witnesses to testify before the jury on the question of the defendant’s future dangerousness. The defendant in Barefoot argued that doing so would violate the cruel and unusual punishment clause applying to capital cases by contributing to arbitrary and capricious decisions in death sentencing. [*115]

In support of the defendant’s position that such testimony could not be reliable, the American Psychiatric Association (APA) submitted an amicus brief in which the professional organization maintained that psychiatric predictions on recidivism were unreliable and that, in its estimate, two out of three predictions by psychiatrists of long-term future dangerousness were erroneous. [*116] Despite acknowledging the APA’s position, the six-justice majority nevertheless ruled against the defendant. [*117]

The majority explained that it was not convinced that expert testimony predicting future violence was “entirely unreliable,” and in any event, any “shortcomings” could be effectively minimized during the adversarial process. [*118]

According to the majority, even the APA did not assert that psychiatrists were always wrong and, even if many psychiatrists disagreed with the reliability of such predictions, others were willing to testify and to give their professional opinions about the defendant’s future risk of violence. [*119]

The three-justice dissent provided a strong rejoinder to this:

One can only wonder how juries are to separate valid from invalid expert opinions when the “experts” themselves are so obviously unable to do so...
[U]ltimately, when the Court knows full well that psychiatrists’ predictions of dangerousness are specious, there can be no excuse for imposing on the defendant, on pain of his life, the heavy burden of convincing a jury of laymen of the fraud. [*120]

Further, the majority opinion was silent on, and therefore did not expressly consider, how the experts at trial derived their opinions that the defendant, Barefoot, was likely to pose a future danger. The Court merely noted that one of the state’s experts testified that he believed he could competently make a risk prediction about an individual defendant “if given enough information.” [*121]

In the entirety of its opinion, the Barefoot majority does not substantively engage with the question of evidentiary reliability standards as applied to violence risk assessment. Instead, the Court appeared more concerned with two collateral issues.

For one, the majority worried that if it categorically barred expert evidence on future dangerousness, such a ruling may present a slippery slope by also undermining the use of experts in other areas of law requiring risk assessments of danger, specifically citing decisions on

	bail,
	sentencing, and
	parole, in addition to
	civil commitment proceedings. [*122]

Secondly, the Court noted that since the law required juries to make this type of factual assessment, jurors should at least get some external help. [*123]

In sum, the Court refused to constitutionally exclude “an entire category of expert testimony” about future dangerousness. [*124]

Hence, with the Supreme Court’s approval of expert predictions on future violence in death penalty cases, [*125] and with the majority’s reference to expert assessments of the risk of violence in civil commitments, albeit dicta, it seems reasonable to extrapolate Barefoot’s general conclusion to future dangerousness assessments of sex offenders.

Some legal academics interpret the Barefoot decision as relaxing the expert evidentiary standard later adopted in Daubert, specifically for future dangerousness expert testimony. [*126] Still, the Court in Barefoot did not expressly do so as it did not confront the issue.

This paper maintains that courts should re-engage in critically examining the reliability and validity of expert testimony in future dangerousness contexts. This should apply not only to SVP civil commitment proceedings but also to the arguably less restrictive regimes where individual assessments of future dangerousness are relevant to the application of

	sex offender registry requirements and
	residency restrictions.

The legal conclusion from Barefoot that the Constitution does not require expert predictions on future dangerousness to be categorically excluded does not end the analysis.

The Court certainly cannot have meant that just anyone could qualify as an expert witness or that any opinion the expert wishes to give should be admitted. In support of this contention, since Barefoot, the Court has become a bit more wary about mental health expert testimony.

In Ford v. Wainwright, a case involving the insanity determination of death row inmates, the Court, citing Barefoot, warned against simply relying upon an expert’s testimony. [*127]

The Ford majority advocated that focused questioning of the expert would serve the truth-finding function by bringing to light

	the bases for each expert's beliefs,
	the precise factors underlying those beliefs,
	any history of error or caprice of the examiner,
	any personal bias with respect to the issue of capital punishment,
	the expert's degree of certainty about his or her own conclusions, and
	the precise meaning of ambiguous words used in the report.

Without some questioning of the experts concerning their technical conclusions, a fact finder simply cannot be expected to evaluate the various opinions, particularly when they are themselves inconsistent. [*128]

In dicta, the Ford Court also questioned the reliability of the experts’ opinions in the case considering the

“cursory nature of the underlying [joint] psychiatric examination” that appeared “dubious” at best. [*129]

Since Barefoot, the Court has expressed more concern about the reliability and truth-assisting nature of expert testimony on future dangerousness in capital cases.

Though not overruling Barefoot on this point, the Court clarified the remedy to help ameliorate the deficiencies in expert opinion in predictions of future crime.

For example, if the prosecution offers a psychiatrist to testify in the sentencing phase about future dangerousness, due process requires that the state provide the defendant with his own psychiatrist to rebut the prosecution’s expert. [*130]

The significant consequence of an erroneous decision on future dangerousness based on just the prosecution’s expert, the Court found, was unacceptable. [*131]

While the Barefoot case did not expressly address actuarial-based assessment of future dangerousness for purposes of death penalty sentencing, its relevance here is the indication that the use of mental health experts in risk assessment will continue.

This paper, though, unlike Barefoot, is more interested in bringing Frye, Daubert, and the broader concepts of reliability and validity back into at least one aspect of expert testimony on future dangerousness in SVP determinations: actuarial assessment tools.

III. Actuarial Testing of Future Dangerousness

At the time of Barefoot, experts substantially based their predictions of future dangerousness on their clinical judgments. [*132] Yet critics complained that clinical assessments were inherently unreliable and subject to bias. [*133] Practitioners thereby sought more empirically-based tools that could offer more reliable guides. [*134] The development of actuarial risk tools ensued.

Fundamentally, actuarial risk tools are about deriving statistics from groups.

For example, automobile insurance companies assign policy rates to individuals based on predictive statistics derived from historical, group-based claims data. For car insurance, the common relevant factors include

	age,
	education,
	vehicle model, and
	driving history. [*135]

The general idea for actuarial ratings for any risk at issue is to identify those factors that are correlative to the potential occurrence of the future event at issue, and to effectively assign appropriate weights to each factor based on the observation that some factors have greater correlative abilities than others relating to the particular result.

The theory is that a better model of prediction should be based not on any single risk factor, but an accumulation of relevant risk factors. [*136]

The developers of actuarial instruments, therefore, use existing data in an empirical way to create rules to combine the most relevant factors, provide the applicable weights, and create a final mechanistic score. [*137]

The assessor then compares the score against the experience tables which yield probabilities of the result observed from the reference group data. To support the empirical validity of the instruments, the scales are cross-validated by retesting with other samples.

Understanding the group-based nature of the creation of actuarial assessment tools is crucial.

When determining the relative risk for another individual, those characteristics of the individual common to the similar factors in the actuarial model are compared and ranked based on the group results as to the outcome of interest. [*138]

To return to the automobile insurance example, the insurer’s agent would input a prospective customer’s data into the actuarial model to compute an overall score and draw a prediction rate (and corresponding price) based on the experiential claims data from those in the historical sample with similar scores.

A. The Sexual Recidivism Actuarial Tests

Two of the most commonly used actuarial prediction tools in SVP determinations include

	the Rapid Risk Assessment for Sex Offence Recidivism (RRASOR) and
	STATIC-99. [*139]

The RRASOR is the more streamlined of the two, assigning points on just four static factors:

	number of prior sex offenses (from 0 points = no convictions or charges to 3 points = 4 or more convictions or 6 or more charges),
	age at assessment, meant to be at release (0 = more than 25 years; 1 = less than 25 years),
	victim gender (0 = only females; 1 = any male), and
	relationship to victim (0 = only related; 1 = any non-related). [*140]

The items are scored and the sum of the scores is associated with a certain recidivism rate over a 5-year and 10-year period based on group statistics observed in developmental samples. [*141]

For example, a higher score will result for a subject who has

	a greater number of prior convictions or charges of sex offences,
	age less than 25 years,
	at least one male victim, and
	at least one extra-familial victim.
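
The scoring just described can be sketched in a few lines of code. This is an illustration only, not the developer's official coding rules. The text gives only the endpoints of the prior-offense item (0 and 3 points), so the intermediate bands below are an assumption borrowed from the comparable STATIC-99 item, and the treatment of an offender who is exactly 25 is likewise an assumption. The function and parameter names are hypothetical.

```python
def rrasor_score(prior_convictions, prior_charges, age_at_release,
                 any_male_victim, any_unrelated_victim):
    """Sum the four RRASOR items as summarized in the text (illustrative only)."""
    # Prior sex offenses: endpoints (0 and 3 points) come from the text.
    # ASSUMPTION: the 1- and 2-point bands mirror the STATIC-99 prior-offense item.
    if prior_convictions >= 4 or prior_charges >= 6:
        priors = 3
    elif prior_convictions >= 2 or prior_charges >= 3:
        priors = 2
    elif prior_convictions >= 1 or prior_charges >= 1:
        priors = 1
    else:
        priors = 0

    # Age: 1 point if under 25. ASSUMPTION: exactly 25 is scored as 0.
    age = 1 if age_at_release < 25 else 0

    # Victim gender and relationship are each scored 0 or 1.
    return priors + age + int(any_male_victim) + int(any_unrelated_victim)


# A 40-year-old with no priors, only female and only related victims.
print(rrasor_score(0, 0, 40, False, False))
# A 22-year-old with four prior convictions, a male victim, an unrelated victim.
print(rrasor_score(4, 0, 22, True, True))
```

The sketch makes plain how mechanical the exercise is: nothing about the individual beyond these four inputs affects the result.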

The developer of RRASOR created the instrument based on a meta-analysis of other studies of sex offender re-offending and identified which factors tended to be correlated with sexual recidivism. [*142]

His focus on reducing the factors to four was to create a

“brief, efficient actuarial tool that could be used to assess the risk for sexual offense recidivism” and easily scored. [*143]

Because of the goal of efficiency, the developer acknowledged that RRASOR was not a comprehensive assessment tool. [*144]

To develop the experience table, the developer used the sexual recidivism rates observed in seven follow-up studies of samples of released sex offenders in the U.S., Canada, and England. [*145]

Sexual recidivism was variously defined in the studies as

	charges,
	convictions, and
	readmissions to inpatient psychiatric facilities. [*146]

The developer then created the final experience table that associates specific RRASOR scores (0-5) [*147] with risk estimates for five- and ten-year periods.

Instead of using the exact observed rates of sexual recidivism from the samples, the final experience table risk percentages were extrapolated by formula from the observed rates because the samples had varied follow-up periods (2-23 years).

In the end, the experience table’s predictive rates range

	from a low of 4% sexual recidivism for 0 points at 5 years
	to a high of 73% for 5 points at 10 years. [*148]

STATIC-99 was developed by the RRASOR author with another researcher using four samples of male sex offenders, totaling just over 1,000, released from Canadian and English institutions.

It remains the most commonly used actuarial tool in the U.S. for SVP civil commitment hearings because of its resource efficiency. [*149]

The resulting tool is a combination of variables from two other instruments, including all four RRASOR factors. [*150] The STATIC-99 [*151] instrument includes 10 static factors: [*152]

	Age at assessment: 0 = 25 years or older; 1 = between 18 and 25 years
	Number of prior sentencing dates: 0 = 3 or less; 1 = 4 or more
	Having lived with an age-appropriate intimate partner for 2 years: 0 = yes; 1 = no
	Any convictions for a non-contact sexual offense: 0 = no; 1 = yes
	Any convictions for Index non-sexual violence: 0 = no; 1 = yes
	Any non-familial victims: 0 = no; 1 = yes
	Any convictions for non-sexual violence before the Index (most recent sexual) offense: 0 = no; 1 = yes
	Any stranger victims: 0 = no; 1 = yes
	Number of prior sex offenses: 0 = none; 1 = 1-2 charges or 1 conviction; 2 = 3-5 charges or 2-3 convictions; 3 = 6 or more charges or 4 or more convictions
	Any male victims: 0 = no; 1 = yes
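
To make the additive structure of these ten items concrete, the following is a minimal sketch of how a total might be computed. It assumes the evaluator has already coded each item by hand; the class, field, and function names are hypothetical and do not come from the STATIC-99 coding manual.

```python
from dataclasses import dataclass


@dataclass
class Static99Items:
    """Hand-coded item values; field names are hypothetical labels."""
    aged_18_to_25: bool
    four_or_more_sentencing_dates: bool
    never_lived_with_partner_2_years: bool
    noncontact_sex_conviction: bool
    index_nonsexual_violence: bool
    prior_nonsexual_violence: bool
    nonfamilial_victim: bool
    stranger_victim: bool
    male_victim: bool
    prior_sex_offense_points: int  # the only item scored 0 through 3


def static99_total(items: Static99Items) -> int:
    """Add nine yes/no items (0 or 1 each) to the 0-3 prior-offense item."""
    if not 0 <= items.prior_sex_offense_points <= 3:
        raise ValueError("prior sex offense item must be between 0 and 3")
    binary_items = [
        items.aged_18_to_25,
        items.four_or_more_sentencing_dates,
        items.never_lived_with_partner_2_years,
        items.noncontact_sex_conviction,
        items.index_nonsexual_violence,
        items.prior_nonsexual_violence,
        items.nonfamilial_victim,
        items.stranger_victim,
        items.male_victim,
    ]
    return sum(int(flag) for flag in binary_items) + items.prior_sex_offense_points


# Every risk item present: nine binary points plus three prior-offense points.
highest = Static99Items(True, True, True, True, True, True, True, True, True, 3)
print(static99_total(highest))
```

Nine binary items plus one item worth up to three points gives a ceiling of twelve, which is the arithmetic behind the instrument's stated maximum total.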

For STATIC-99, total scores range from 0-12, arranged within seven risk categories organized into four ordinal risk groups (from

	0 = low risk to
	6 = high risk). [*153]

The experience table provides 5-, 10-, and 15-year sexual recidivism rates for each total score from 0 through 6, with 6-12 points sharing the same experience rates. [*154]

Sexual recidivism was operationalized as

	reconviction in three of the samples and
	either charges or re-admission in one sample.

The final experience rates simply aggregate them.

A few examples from the experience table may be helpful.

A total score of 3 provides estimates of sexual recidivism of

	12% at 5 years,
	14% at 10 years, and
	19% at 15 years. [*155]

The highest scores, which were grouped into a 6+ range (6-12 points), yield the highest estimates in the experience table of

	39% (5 years),
	45% (10 years), and
	52% (15 years).

Even a score of 0 yields positive estimates of 5%, 11%, and 13% at 5-, 10-, and 15-year intervals, respectively. [*156]
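
The use of the experience table amounts to a simple lookup, which can be sketched as follows. Only the three rows quoted above are included; the remaining rows are deliberately left out rather than guessed. The dictionary layout and function name are hypothetical.

```python
# Rows quoted in the text: score -> (5-year, 10-year, 15-year) recidivism rates.
# Scores 1, 2, 4, and 5 are omitted because the text does not report them.
EXPERIENCE_ROWS = {
    0: (0.05, 0.11, 0.13),
    3: (0.12, 0.14, 0.19),
    6: (0.39, 0.45, 0.52),  # shared by all totals from 6 through 12
}


def lookup_rates(total_score):
    """Return the group rates for a total score, or None if the row is not quoted."""
    if not 0 <= total_score <= 12:
        raise ValueError("STATIC-99 totals range from 0 to 12")
    capped = min(total_score, 6)  # totals of 6 and above share one row
    return EXPERIENCE_ROWS.get(capped)


print(lookup_rates(3))   # the quoted row for a total of 3
print(lookup_rates(11))  # same row as a total of 6
print(lookup_rates(4))   # None: this row is not reported in the text
```

Note that the returned figures are rates observed in the developmental groups; the lookup itself adds no information about the particular person being assessed.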

The STATIC-99 coding rules indicate that data can be derived from some combination of

	self-report,
	formal, and
	informal records. [*157]

The test was designed to be used only on adult males who have been charged with or convicted of a sex crime involving a child or a non-consenting adult. [*158]

The authors specifically warn that it is not recommended for

	females,
	juveniles,
	individuals with no prior sex crime,
	possessors of pornography, and
	individuals who have engaged in certain consensual sex activities that are otherwise considered a crime, such as prostitution, sex in public, and statutory rape. [*159]

The application of the actuarial predictions to an individual is basically an exercise in inductive logic: those in the development samples who were like him re-offended this percentage of time, so that the risk that he will re-offend is similar to his like group. [*160]

At its highest risk group (score of 6 or more) and longest follow-up period (15 years), STATIC-99 indicates a 52% chance of sexual re-offending.

So, even at this extreme, while about half of those in the highest risk group did re-offend in the base study, many did not. [*161]

Further, other studies have failed to replicate the over 50% re-offense rate, even for STATIC-99’s high risk group. [*162]

Neither instrument measures

	the specific type of sexual re-offense (such as rape or child molestation), or
	the severity,
	imminence,
	duration, or
	frequency of future sexual misconduct. [*163]

Nor do they limit their recidivism statistics to

	predatory sexual violence (which civil commitment laws require without clearly defining the term “predatory”),
	serious sexual violence, or
	contact sexual violence (which registration and residency laws would seem to target).

Both tests consider static factors, that is, those that are pre-existing characteristics at the time of assessment. Dynamic factors, those enduring factors that can nonetheless change and thereby alter an individual’s risk for re-offense, would be highly relevant to changing risk assessments, but are generally ignored. [*164]

Examples of dynamic risk factors that are relevant to relapse include

	treatment,
	impulsiveness,
	anger,
	substance use, and
	interpersonal relationships. [*165]

The lack of dynamic factors is a common lament about the usefulness of actuarial models since, considering the variability of human behavior in general, risk states ebb and flow over time. [*166]

After Barefoot and Hendricks, the use of actuarial tools in risk assessment caught fire. [*167]

Many mental health professionals who work with risk assessments of future violence claim that actuarial risk assessments are better and more objective tools than purely clinical assessments. Some even argue that actuarial tools represent best practices in the field. [*168]

Indeed, there is evidence from sex offender researchers that a vast majority of mental health evaluators testifying in sexual predator civil commitment hearings use one or more actuarial instruments. [*169]

Several state laws outlining the sexual predator classifications presume reliance on specified actuarial tests. [*170]

Still, even best practice should not be admissible as evidence in court if it does not meet the legal standards for expert evidence of validity and reliability.

B. Empirical Evaluation of Actuarial Evidence of Future Dangerousness

The ability of lawyers and judges to challenge and evaluate expert testimony on future dangerousness is critically important in the legal legitimacy of SVP litigation. Still, several practical barriers to effective evaluation exist. Humans simply are hard to predict, making assessments of future behavior impractical. [*171]

As a result, the politically charged atmosphere surrounding the management of sex offenders may lead participants in the process to err on the side of confirming SVP status rather than risk the consequences of not applying SVP restrictions to those who eventually reoffend. [*172]

Expert witnesses admit feeling pressure in the adversarial process to provide positive assessments of risk without adequately explaining contrary research, and even distorting the limitations of the actuarial tools. [*173]

Further, the situation is somewhat unique for the treatment field. The mental health professional who has previously worked directly with the individual to be assessed, i.e., his therapist or counselor, and who thereby has greater insight into the individual’s likelihood of relapse outside of the limited factors covered by the tests, is exactly the one who cannot give predictions in criminal proceedings.

Ethics guidelines generally prohibit a professional from being both treating professional and testifying expert because the dual roles are often conflicting considering the legal consequences of SVP laws borne by the client. [*174]

Next, the lack of clarity in the complicated field of actuarial risk assessments for sex offenders in general undermines efforts to logically assess it as a science. With these challenges in mind, the goal of this Section is to provide an empirical assessment of the actuarial tools in a way that is accessible to legal professionals working in SVP litigation.

1. Testing and the Scientific Method [*175]

As applicable to sex offenders in the U.S., the creators of the actuarial tools took some liberties with pure scientific method. Their methods stand in contrast to the scientific principle that developmental samples underlying actuarial tools intended to be normative should be representative of the larger population for which the tools were intended. [*176]

This could include weighted random sampling to match the population at issue on relevant variables, such as

	age,
	gender,
	geographic location,
	treatment,
	type of sexual offense,
	etc.

The RRASOR and STATIC-99 developmental samples derived from a limited number of small, nonrandom samples from mostly Canadian and English institutions, with one U.S. sample included in RRASOR. [*177]

The samples included sex offenders released from maximum security prisons and mental health institutions, which may have signified higher risk groups than typical sex offenders.

Although the developers intended the tools to be applicable on an international scale, there is no sign they made any attempt to conduct truly representative sampling to satisfy scientific principles for a more global application.

Another issue is that the developmental samples included widely variable definitions of the outcome variable of recidivism, including

	charges,
	re-admissions, and/or
	reconvictions,

and they used widely varying time frames for follow-up.

The instruments also obscure common scientific standards for determining the reliability of the scoring system by failing to provide error rates, a factor the Daubert Court mentioned. [*178]

For the purpose of risk prediction research, the error rate is normally reported as a 95% confidence interval. [*179]

Other researchers recently attempted to fill this gap by extrapolating confidence intervals from the STATIC-99 data. [*180]

They found that at the highest risk of the original STATIC-99 experience table, 52% for a 15-year period for the 6+ score,

	the group confidence interval was 43-60%, while
	the individual confidence interval was 6-95%. [*181]

Confidence interval information, a scientifically useful statistic in social science studies, could be important where a fact finder may lean toward the lower or upper bound of the confidence interval in determining whether the defendant meets the relevant legal threshold of future dangerousness.
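
For readers unfamiliar with the statistic, the following sketch shows one textbook way a 95% confidence interval around a group recidivism rate can be computed, the normal-approximation interval for a proportion. It is an assumption that this resembles the cited researchers' approach, since their method is not described here. The group size of 130 is a hypothetical value chosen for illustration, not a figure reported in the text, and the sketch does not attempt to reproduce the much wider individual-level interval.

```python
import math


def wald_interval(rate, group_size, z=1.96):
    """Normal-approximation confidence interval for an observed proportion.

    z = 1.96 corresponds to a 95% interval.
    """
    margin = z * math.sqrt(rate * (1 - rate) / group_size)
    return max(0.0, rate - margin), min(1.0, rate + margin)


# 52% observed rate; the group size is a HYPOTHETICAL illustration value.
low, high = wald_interval(0.52, 130)
print(f"group-level 95% interval: {low:.1%} to {high:.1%}")

# The same rate observed in a smaller (again hypothetical) group is less certain.
low_small, high_small = wald_interval(0.52, 30)
print(f"smaller group: {low_small:.1%} to {high_small:.1%}")
```

The comparison illustrates why the size of the developmental sample matters: the fewer offenders observed at a given score, the less precisely the reported rate is known.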

2. Validity

Measurement of the predictive validity of actuarial tools is useful to appreciating their abilities.

“Evidence demonstrating the predictive validity of any instrument or assessment procedure is of paramount importance when the goal of the clinician is to draw inferences or conclusions about an individual’s likely conduct in the future.” [*182]

Two common statistical measures of the validity of actuarial tests include

	[1] the correlation coefficient and the
	[2] Receiver Operating Characteristic (ROC).

This part will review research results of these statistics and consider the results from practical and critical perspectives.

[a] The correlation coefficient is a statistic ranging from -1.0 to 1.0 that indicates the direction (positive or negative) and strength of the linear relationship between two variables.

For example, height is strongly and positively correlated with weight, such that the taller a person is the more they are likely to weigh. A correlation coefficient of

	0 means no correlation while
	-1.0 or 1.0 indicates perfect correlation.

For our purposes here, we are concerned with how strongly the actuarial assessment is positively associated with sex offense recidivism.

In meta-analyses of international samples, researchers have observed correlation coefficients of

	.25 [*183] and .28 [*184] for RRASOR and
	.33 for STATIC-99 [*185].

In the social sciences, the strength of a correlation coefficient

	less than .30 is considered low, while the
	.33 is considered to be moderately predictive. [*186]

Yet, since positive correlation coefficients range from 0 (no correlation) to 1.0, the results are practically not very strong considering the imposition on liberties and privacy of SVP laws that result.

Another statistic supports this claim. The correlation coefficient leads to a percentage of the variance statistic (r²) that permits a better understanding of what the instrument can account for in terms of recidivism. [*187]

Taking the higher correlation listed for STATIC-99, .33, the variance explained is its square (.33²), which equals about .11, or 11%. Thus, about 11% of the variance in sexual recidivism can be explained by the STATIC-99 factors.

Alternatively, this means that the remaining 89% of what influences sexual recidivism is based on other factors.

This further suggests the tool has little practical significance even if there is a statistically significant correlation with sexual re-offense.

Commentators describe this result as meaning that, even at STATIC-99’s highest risk category of 6+, the “high risk” label is a misnomer and STATIC-99’s performance is not much “better than a coin flip.” [*188]
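
The variance arithmetic discussed above is easy to verify. The sketch below simply squares each of the correlation coefficients reported in the text; the function name is hypothetical.

```python
def variance_explained(r):
    """Share of variance in the outcome accounted for by a correlation r."""
    return r * r


# Correlation coefficients reported in the text for each instrument.
reported = [("RRASOR", 0.25), ("RRASOR", 0.28), ("STATIC-99", 0.33)]

for instrument, r in reported:
    explained = variance_explained(r)
    print(f"{instrument}: r = {r:.2f}, explained = {explained:.1%}, "
          f"unexplained = {1 - explained:.1%}")
```

For each reported coefficient, the unexplained share remains close to nine-tenths, which is the point the variance statistic is meant to convey.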

[b] A more recently adopted statistical measure of predictive accuracy is the Receiver Operating Characteristic (ROC), which is derived from a plotting of

	true positives and
	false positives. [*189]

Proponents of this statistic indicate that, unlike a correlation coefficient, the ROC is not reliant upon the base-rate of the sample and is therefore useful in order to compare the accuracy of different instruments on samples with differing base-rates. [*190]

ROC statistics range from 0 to 1.0.

	A ROC of 0 means the instrument is completely inaccurate in its predictive ability, whereas
	a ROC of 1.0 means the instrument is completely accurate;
	a ROC value of .50 means that the predictive ability of the instrument is no better than chance, much like the proverbial coin flip. [*191]

Of course, various studies may report different ROC values where sample characteristics vary

	(such as region,
	type of sex offender, or
	treatment success)

or the study methods differ.

	STATIC-99 has been tested by one of its developers using international samples with ROC scores of between .63 [*192] and .70 [*193].
	The developer of RRASOR has also observed ROC scores in various meta-analyses from .59 [*194] to .68 [*195].

In another small study by others, researchers compared directly the ROC rates on the same group of offenders, yielding comparative ROC scores of

	.68 for STATIC-99 and
	.73 for RRASOR. [*196]

Interestingly, STATIC-99 did not fare much better in its accuracy than RRASOR, despite the addition of six factors to the four from RRASOR.

Correctly interpreting the ROC scores is important. The ROC score of .63 for STATIC-99 means that the test yields a 63% chance that a recidivist will receive a higher risk ranking than a non-recidivist.

	This does not mean that 63% of the group with those characteristics will re-offend (since the statistic is not dichotomous).
	It also does not mean there is a 63% chance this individual will recidivate (as the model is based on group statistics). [*197]

Rather, the ROC statistic is about the accuracy of the relative rankings of the test. [*198] The value represents the

“probability that a randomly selected recidivist would have a more deviant score than a randomly selected non-recidivist.”[*199]
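The ranking interpretation quoted above can be illustrated with a minimal sketch. The scores below are invented for illustration and do not come from any study cited here; the function simply counts, over every recidivist/non-recidivist pair, how often the recidivist has the higher score (ties counted as half), which is one standard way of computing the area under an ROC curve.

```python
from itertools import product


def pairwise_auc(recidivist_scores, non_recidivist_scores):
    """Probability that a randomly chosen recidivist outscores a randomly
    chosen non-recidivist; ties count as half a win."""
    wins = 0.0
    for r, n in product(recidivist_scores, non_recidivist_scores):
        if r > n:
            wins += 1.0
        elif r == n:
            wins += 0.5
    return wins / (len(recidivist_scores) * len(non_recidivist_scores))


# Hypothetical instrument scores, invented for illustration only.
recidivists = [2, 4, 5, 6]
non_recidivists = [1, 2, 3, 4, 5]

auc = pairwise_auc(recidivists, non_recidivists)
print(f"ROC area = {auc:.3f}")
```

Nothing in this calculation depends on how many people fall in each group, which is why the statistic conveys the accuracy of relative rankings rather than the probability that any group or individual will re-offend.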

Critics of this focus on ROC statistics assert that since it obscures base-rate differences between groups, it leads to over-estimation of risk predictions when base-rates change over time and as applied to other groups with lower base-rates than the developmental sample. [*200]

The authors of STATIC-99 now seem to agree. Recently, the STATIC-99 developers admitted that their original recidivism experience tables over-estimated re-offense risk in light of reductions in base-rates of sexual recidivism. Indeed, they no longer recommend that experts use the original estimates. [*201]

In a meta-analysis of international samples released in 2009, the developers found that the averaged base-rate of recidivism was 11.2% from over 100 studies with follow-up periods ranging from 6 to 276 months. [*202]

Compared to the original STATIC-99 data, the 11.2% base-rate in the more recent samples was two-thirds of what was found in the original samples. The developers issued new recidivism estimates to replace the originals, along with new criteria for the use of the replacement experience table. However, the new table and criteria have not yet been subjected to cross-validation and it will likely take some time for the field to determine how to consider the new evidence.

To maintain the tool’s validity, then, developers must re-estimate its risk scores as base-rates fluctuate over time, even for the reference group to which it should apply.

It is also important to recognize that one of the most important limitations of actuarial assessments as a rule is the problem of over-generalization or, more empirically, external validity.

One over-generalizes results of research by presuming the results, tested on a sample of one population, are reliable as applied to another population.

If the second population differs in any risk-relevant way from that of the first population (the reference group), then the predictive result is invalid. [*203]

As actuarial testing of future dangerousness for sex offenders is a relatively recent phenomenon and almost exclusively accomplished on adult male offenders released from prison, the reference group is notably limited in several risk-relevant ways.

One engages in over-generalization by applying the same actuarial estimates on sexual recidivism to groups with observed risk-relevant attributes including

	women,
	juveniles,
	incest offenders,
	older offenders,
	first time offenders, and
	those who were not incarcerated. [*204]

A number of studies support this, finding that the instruments’ predictive accuracy varied, sometimes dramatically, when used to predict sexually violent recidivism in subgroups. [*205]

It is also prudent to be cognizant of the potential differences in re-offending by geographic and cultural region.

For instance, FBI statistics indicate that criminal offending can vary, sometimes significantly, by state and geographic region of the U.S., including for sexual offenses. [*206]

An author of STATIC-99 conducted a 2009 meta-analysis of international samples that underscored geographic disparity in predictive ability: the average STATIC-99 ROC was

	.90 for the United Kingdom, but only
	.60 for the U.S. and
	.58 in Canada. [*207]

Hence, using recidivism statistics based largely on incarcerated populations in England and Canada

“should be a great cause of concern for making recidivism predictions in the U.S.” [*208]

as a whole, much less to any particular region of the U.S. where recidivism risk may vary.

Another conceptualization of the practical significance of the ROC scores considering the relevancy of base-rates concerns the actuarial tool’s positive predictive accuracy, or the accuracy of predicting re-offending. [*209]

If we borrow the Department of Justice sexual recidivism rate of 5.3%, rounding up to 6% for easy interpretation, and then apply a ROC score of .70, the positive predictive accuracy measures indicate the actuarial tool will be wrong 9 times out of 10. [*210]

This is a common problem as the prediction of relatively rare events is inherently unreliable. [*211]
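The interaction between a low base-rate and a moderate ROC score can be sketched as follows. This is not the calculation used in the cited source; it is an illustration under stated assumptions. The 6% base-rate and the .70 ROC score come from the text. The equal-variance binormal model (used to turn an ROC area into a sensitivity figure) and the 90% specificity cutoff are assumptions introduced here only to make the arithmetic concrete.

```python
from statistics import NormalDist


def positive_predictive_value(base_rate, sensitivity, specificity):
    """Share of people flagged as likely recidivists who actually re-offend."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


def sensitivity_from_auc(auc, specificity):
    """Sensitivity implied by an ROC area under an ASSUMED equal-variance
    binormal model (an illustrative assumption, not from the source)."""
    std_normal = NormalDist()
    # Distance between the two group means implied by the ROC area.
    separation = 2 ** 0.5 * std_normal.inv_cdf(auc)
    # Cutoff on the non-recidivist score scale for the chosen specificity.
    cutoff = std_normal.inv_cdf(specificity)
    return 1.0 - std_normal.cdf(cutoff - separation)


base_rate = 0.06     # rounded rate used in the text
auc = 0.70           # ROC score used in the text
specificity = 0.90   # hypothetical cutoff choice

sensitivity = sensitivity_from_auc(auc, specificity)
ppv = positive_predictive_value(base_rate, sensitivity, specificity)
print(f"sensitivity = {sensitivity:.2f}, PPV = {ppv:.2f}, wrong = {1 - ppv:.0%}")
```

Under these assumptions, more than four out of five people flagged as likely recidivists would not re-offend. A different cutoff shifts the exact figure, but with a base-rate this low, false positives outnumber true positives across a wide range of plausible settings.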

Others tend to agree that because of the high incidence of false positives with these actuarial tests, the uncritical use of them produces systematic overestimation of risk. [*212]

3. Reliability and Objectivity

Actuarial assessments of risk carry an aura of science and objectivity. [*213] Perhaps this is because the use of numerical percentages and rankings of bounded tiers imbues the predictions with a connotation of mathematical precision. [*214] In addition, the consistent use of the same factors and scoring methodology across cases reduces the appearance of bias. [*215]

A relevant measure of reliability involves inter-rater reliability, which measures how similar observers are in rating the same variable with the same value. In small studies, study observers rate quite high at assigning consistent results with the same actuarial tools; the correlation coefficients (1.0 meaning perfect) for inter-rater reliability were, for example,

	.95 for RRASOR and
	.87 for STATIC-99. [*216]
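To make the inter-rater figures concrete, the following sketch computes a Pearson correlation between two raters scoring the same offenders. The ratings are invented for illustration, and Pearson's r is assumed here as one common choice of coefficient; the cited studies may have used a different statistic, such as an intraclass correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical total scores assigned by two raters to the same six
# offenders; invented for illustration only.
rater_a = [1, 3, 4, 6, 2, 5]
rater_b = [1, 3, 5, 6, 2, 4]

agreement = pearson_r(rater_a, rater_b)
print(f"inter-rater r = {agreement:.2f}")
```

A coefficient near 1.0 indicates only that the raters order and space their scores similarly. It does not show that either rater's scores are accurate predictors of re-offense.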

Still, these scores provide estimates of raters in supervised studies, presumably with guidelines to assist consistency scores. Outside the structured study environment, there may be less consistency in scoring the instruments, particularly when adversarial allegiance occurs.

A study on rater agreement using STATIC-99 and another actuarial tool in adversarial civil commitment hearings found that while inter-rater reliability was high, there was greater variation in ratings by evaluators on opposing sides in SVP hearings than between experts on the same side. [*217] The same study also found that state experts on average reported higher risk score computations than experts retained by the defendants. [*218]

Two other issues pertaining to objectivity are concerning.

Another type of allegiance effect occurs when developers of several of the actuarial tools have professional incentives to conduct further studies that report results supporting the validity of their own tools. [*219]

One study found evidence of allegiance in peer-reviewed validation studies in which the instrument’s author(s) participated. [*220]

This study’s researchers compared the validity coefficients reported in studies of three actuarial instruments for sexual recidivism, including STATIC-99, conducted by the instruments’ authors with those reported in studies conducted by others, and found that on average

	the instrument authors reported significantly larger correlation coefficients (average r=.37)
	compared to non-authors (average r=.28). [*221]

A recent meta-analysis found that the average ROC for STATIC-99 varied dramatically depending on

	whether the studies were published (ROC .80; n = 21)
	or not published (ROC .60; n = 42). [*222]

This suggests that researchers are more likely to reveal studies with statistically stronger results.

It is also noted that while the RRASOR and STATIC-99 developers intended their instruments to be easily and objectively scored, there is some room for error based on the availability and veracity of the data.

With developers encouraging raters to access a broad spectrum of data, including self-reporting, the potential for error is real. Accurate scoring relies upon good data. Similarly, the probability of missing data may also skew results.

When the factors involving arrest are considered, another source of error arises: arrests may be over-inclusive because an arrest is not legally sufficient proof of guilt.

Accurate scoring also relies upon adequate training.

4. Training

There are no criteria for the scope, time, or regimen for training or otherwise certifying potential assessors on the actuarial instruments. There are no formal or published coding rules or training manuals.

Mostly, information is vicariously available on the internet and through occasional training classes. The authors of the STATIC-99 simply write in their coding rules that they

“strongly recommend training in the use of STATIC-99 before attempting risk assessments that may affect human lives.” [*223]

5. General Acceptance

The use of actuarial tools for sex offenders in regular treatment decisions is one matter, and there is no reason here to challenge their general acceptance and use in the treatment setting.

But, the use of them in a court, where the stakes are qualitatively higher and the professional standards different, to justify legal restrictions is another.

Many mental health experts who work in sex offender treatment believe the actuarial instruments are currently best practice and are willing to use them and testify as to their conclusions.

There are reasons to believe that the tide of approval of actuarial tools, if there indeed was ever general acceptance, has been turning in recent years.

Mental health professionals are starting to realize that

	dropping base-rates of sexual recidivism in the U.S. and
	the variability of base-rates among different sex offender populations

undermine the continued viability of ROC scores and the use of the experience tables. [*224]

In an amicus brief in a death penalty case, the American Psychological Association recently contended that any prediction of dangerousness is unreliable in court if it does not consider the base-rate of the specific population for a set period of time. [*225]

The American Psychiatric Association’s recent stance has not been to address the actuarial tools directly, but in a position statement, it asserts,

“Although psychiatrists cannot predict dangerousness with definitive accuracy, they can identify risk factors associated with an increased likelihood of offending.” [*226]

Thus, many legal and mental health practitioners and researchers who work in the sex offender area, and who feel strongly enough about the issue to publish their professional opinions in peer-reviewed journals, warn against the use of actuarial tests in legal settings because their significant limitations make their use questionable in light of the deprivation of liberty that may result. [*227]

Putting the use of actuarial evidence in context, a mental health practitioner extrapolated from the Department of Justice sex offense recidivism study to conclude that using STATIC-99

	would have averted only 3% of sexual offenses committed by released offenders (sex offenders and non-sex offenders)
	while hundreds of non-recidivists would have been unnecessarily detained. [*228]

Indeed, some suggest it may be professionally unethical for mental health practitioners to testify in court about predictions on the likelihood of the individual reoffending, [*229] at least without being absolutely clear about all the substantive limitations. [*230]

Others argue that the group-based model of tools means that even if using the actuarial assessments may be appropriate at initial assessments to consider pursuing civil commitment, mental health professionals should decline to use them in actual court hearings about individual risk predictions. [*231]

In sum, even if there had been general acceptance of RRASOR and STATIC-99 near the time of their inception, such general acceptance likely extended only to their utility for treatment purposes.

It is highly questionable whether there was ever, and in any event likely is no longer, general acceptance in the mental health field about their validity for use in individual assessments of risk in SVP determinations in law.

IV. Judicial Perspectives on Future Dangerousness Evidence

Since the Supreme Court approved mental health experts’ testifying about future dangerousness and found sexual predator civil commitment and registration laws to be constitutional, the introduction of actuarial risk assessments through expert testimony is common practice in SVP determinations.

Empirical observations about actuarial predictions of future dangerousness, outlined in the prior section, did not go entirely unnoticed. In a general, retrospective critique of the journey of sexual predator laws, Janus and Prentky summarized the resulting conflicts between science and law that ensued.

Suddenly, courts were confronted with a number of potentially embarrassing facts:

	the group-based nature of risk assessment,
	the tension of applying probabilistic estimates from life tables to defendants who departed significantly from the membership of the reference groups used to derive the estimates,
	the difficulty of evaluating and incorporating dynamic risk factors, and
	the problem of translating statutory language into scientifically meaningful terms all became quite clear. [*232]

Interested observers believe that the common judicial reaction to these issues was to permit the expert testimony without any meaningful inquiry into the scientific validity and reliability of actuarial assessments. [*233]

This Section examines the evidence that supports this observation by reviewing the main themes that emerge from a comprehensive review of case law in which either of the actuarial tools addressed in this paper was mentioned.

A. Daubert/Frye Challenges

Overall, courts were not inclined to find challenges to the reliability of the actuarial tests to be dispositive of the admissibility of actuarial evidence.

Indeed, in relatively few of the cases referring to RRASOR or STATIC-99 in SVP cases was it evident that the courts conducted any type of reliability analysis, whether referring to

	Daubert,
	Frye, or
	the relevant alternatives.

Most of the discussion in the cases was on other issues, such as

	sufficiency of the evidence,
	due process, or
	ineffective assistance of counsel.

The courts employed common strategies for averting the reliability issue.

1. Frye Challenges

Of those cases referring to Frye, there was a split as to whether Frye applied at all to the actuarial tests.

First, most of the Frye-based courts declined to hold Frye hearings to determine the admissibility of actuarial-based evidence because the courts determined the actuarial tool was not a scientific test, making the Frye general acceptance test inapplicable. [*234]

Most of these courts ended the matter there, without further explication. Still, a few opinions referred to precedent holding that medical testimony did not constitute scientific evidence for purposes of Frye.

The reasoning was that since mental health professionals’ assessment of future dangerousness was medical testimony, it therefore cannot be scientific evidence. [*235]

A few judges also explained that because actuarial tools on future dangerousness had a predictive value “far less than 100%,” they cannot “have an aura of scientific infallibility” with which Frye was concerned. [*236] This was true despite experts referring to actuarial tools in scientific terms. [*237]

The consequence of the 'no-science, no-Frye' ruling was made quite clear by a California appellate court. In overruling the trial court’s decision that STATIC-99 was unreliable, the appellate court stated that

“while the accuracy rate of 71 percent may not meet certainty requirements applicable to new scientific evidence, such requirements have no application to expert psychological opinion testimony based in part on actuarial instruments.” [*238]

A reasonable implication, then, was that if Frye applied, the appellate panel thought that the actuarial tool would be inadmissible. But, as shall be seen, this implication has not been adopted by other courts.

The alternative Frye path, far less prevalent in the cases than the first, was evident with courts that assumed, with little or no discussion, that an actuarial test of future dangerousness constituted scientific evidence for which a Frye hearing was appropriate. Still, almost all of these courts found the tests to be reliable, asserting they had been generally accepted, thereby requiring no further validation. [*239]

One case stood for the very minority rule that an actuarial test may be invalid as applied. While incorporating by reference a companion case in which actuarial tests were found to be admissible under Frye, the court accepted the argument that the tools were reliable only for adults, and thus inadmissible as applied to the juvenile defendant. [*240]

In both divergent paths of applying Frye, the impact of case precedent was of utmost importance in the courts’ decisions. Most of the decisions did not seek to justify

	either the Frye inapplicable ruling or
	the general acceptance determination

with much of an independent analysis.

Instead, almost all of the cases relied expressly upon case precedent. [*241]

As an example, the Illinois Supreme Court found it important that

“at least 19 other states rely upon actuarial risk assessment in forming their opinions on sex offenders’ risks of recidivism”
and
“eight of these states have directly addressed the Frye question.” [*242]

The court recognized that reliance upon case precedent may be a “hollow ritual,” but justified it, arguing that the issue of general acceptance had been “thoroughly litigated” already in several other states. [*243]

The Illinois high court specifically pointed to a Florida case [*244] which had held a Frye hearing and found general acceptance based on the affirming testimony of the experts in the underlying case and a list of academic papers. [*245]

Notably, a review of the academic papers cited in the Florida case to support general acceptance reveals that almost all of the academic articles list at least one author who is also an original developer of the most commonly used actuarial tools, including RRASOR and STATIC-99. [*246]

2. Daubert and Alternatives

Daubert is virtually absent. The few courts to have engaged the Daubert standards for the SVP future dangerousness issue found the actuarial assessments to be admissible.

In U.S. v. Shields, the district judge summarily admitted actuarial-based predictions, concisely concluding the standards of general acceptance and peer review had been met, without further discussion. [*247]

Another court, finding that STATIC-99 was scientific evidence, also summarily ruled that the tool met Daubert’s four “general observations,” relying upon the state expert’s assertions. [*248]

A court in a state rejecting the Frye general acceptability standard for expert evidence still concluded that actuarial evidence was sufficiently reliable to be admissible, based in large part on finding no other courts had excluded it. [*249]

A few cases eschewed the reliability question by referring to the influence of Barefoot on future dangerousness testimony, although in summary terms. [*250]

One court acknowledged the seeming inconsistency between Daubert and Barefoot when it came to future dangerousness testimony, but dismissed it. [*251]

A few others pointed to Barefoot as holding that the question about reliability in actuarial assessment testimony goes to the weight of the evidence, not its admissibility. [*252]

Another frequent strategy courts employed to avoid directly addressing the reliability issue was to contextualize the actuarial assessment as not being the sole basis for the expert’s opinion. [*253]

One court, for instance, ruled that even if the experts used STATIC-99 heavily in their testimony, the fact they testified that the results were merely a part of their overall assessments meant that the defendant’s

“quibbles with their methodology in employing [STATIC-99] are irrelevant.” [*254]

Another court expressly declined to “second-guess” the experts in their use of the actuarial tests, contending it was up to the defense to have an expert challenge the tests rather than look to the courts to do so. [*255]

Overall, there was little evidence of any substantive critique of the empirical quality of the actuarial-based assessments. Judges who paid such heed suffered mostly in lone dissents or concurrences. [*256]

A rare note occurred when a concurring judge complained that RRASOR used “only” four simple factors in scoring. [*257]

Another judge put it in simple terms:

Intuitively, I find it hard to believe that the knowledge that an 18-year-old man has one conviction for lewd behavior involving an unrelated boy is sufficient information to conclude that there is a 48.6% probability that the man will commit a violent sexual crime during the next decade.
It also troubles me that [the defendant] can successfully complete a full course of rehabilitation and the RRASOR will not have changed its assessment of him.
With or without successful treatment, he has a 48.6% chance of doing a bad act in the future according to this test. [*258]

A judge in a dissenting opinion in another case also criticized the use of STATIC-99 because it failed to incorporate dynamic factors that could reduce the individual’s risk and warned:

“Does anyone remember the Soviets’ misuse of their mental health system for incarcerating enemies of the state? Does this seem at all similar?” [*259]

Notably, the critical opinions were much more likely than the majority opinions to go beyond legal precedent and cite extensively to empirical publications challenging actuarial assessments. [*260]

The eminent Judge Posner of the U.S. Court of Appeals for the Seventh Circuit made the most thorough attempt to understand the limits of actuarial assessment, albeit in a sentencing guidelines case.

In U.S. v. McIlrath, [*261] the defendant called a forensic psychologist to testify at the defendant’s sentencing hearing on a charge involving internet predation of a minor. The doctor testified he used STATIC-99 and derived an estimate of a 9 to 13 percent sexual recidivism risk. [*262]

Upon McIlrath’s appeal on the claim that his sentence was too severe, Judge Posner announced the decision to affirm the sentence, though he pointedly questioned the expert’s use of the actuarial tool.

Judge Posner chastised counsel on both sides for failing to address

	whether STATIC-99 had been validated by generally accepted methods
	or whether the test would pass the Daubert admissibility standard. [*263]

Judge Posner then, seemingly sua sponte, raised the empirical issues that would support the devaluation of the expert’s risk assessment, though without making any conclusions. The court indicated that even the advocates of STATIC-99 admitted its moderate predictive accuracy and that even though

“[i]t may be more accurate than clinical assessments, … that might not be saying much.” [*264]

Evidence abounds that glossing over of the reliability issue may be pragmatic.

This can be implied with courts ruling that the relevant Daubert or Frye standard does not exclude the actuarial evidence, but then adding seemingly unnecessary dicta.

In support of its approval of actuarial evidence, a state court pointed to the fact that

“in several jurisdictions actuarial risk assessment is mandated by either statute or regulation.” [*265]

In a more dramatic ceding of this question, one opinion involving a challenge to STATIC-99 evidence stated that courts must

“respect [the] policy of the legislature with respect to the trustworthiness of psychiatric opinion evidence involving sexually dangerous persons.” [*266]

Other courts, after ruling that neither Daubert nor Frye applied, repeated that,

“where the trier of fact is required by statute to determine whether a person is dangerous or likely to be dangerous, expert prediction may be the only evidence available.” [*267]

An unusual case involving a conflicted expert is useful to this point about pragmatism.

State v. Nichols [*268] was a sex offender registration case. The defendant was classified based, in part, on being assigned a score of 5 on STATIC-99. [*269] In appealing the risk designation to the court, the defendant called as an expert a mental health professional who was at that time a member of the Sex Offender Review Board (SORB), the state agency responsible for initial determinations on sex offender level. [*270]

When challenged about the STATIC-99 score, the expert responded

“I’ve never been a STATIC-99 fan.” [*271]

The judges were clearly appalled:

“This response the Court finds to be somewhat shocking in that the SORB consistently uses this risk assessment tool in deciding the risk to re-offend. One would think that as the professional, a "sex offender treatment specialist" appointed to the board because of his expertise would have challenged the use of this tool in assessing risk.” [*272]

As a result, the court found that the defense expert’s testimony was not credible and it upheld the state’s classification level. [*273]

B. The Standard of Likelihood to Sexually Re-offend

The intersection between legal language and scientific knowledge used in the mental health fields has proven challenging here.

The terminology used in the SVP laws varies somewhat by state and it is unclear whether there is any meaningful difference between them.

For example, in civil commitment laws, the future dangerousness concept for sexual violence is variously described in statutes as

	“likely,” [*274]
	“more likely than not,” [*275]
	“substantial probability,” [*276] and
	“irresponsible for personal conduct with respect to sexual matters.” [*277]

Two statutes with the “likely” term define it further as

	“more likely than not” [*278] while other laws define it to be a
	“propensity . . . of such a degree as to pose a menace to the health and safety of others” [*279].

These definitions are hardly clear. Does the “more likely than not” mean 51% chance? Some mental health experts working with sexual predator assessments seem to think so. [*280]

The California Supreme Court rejected the idea that “likely” was synonymous with “more likely than not,” ruling instead that it referred to a “serious and well-founded risk.” [*281]

This court implied this standard was less burdensome than “more likely than not” but provided no substantive guidance.

Another court defined the statutory “likely” language to mean “probable rather than merely possible.” [*282]

With the allure of the seeming certainty from the definitive numbers derived from the actuarial tools, one could reasonably desire that legislatures provide more specific guidelines on the minimal limits that equate to likelihood of re-offending considering the dramatic consequences of the laws.

As one commentator has expressed the issue:

	is civil commitment to protect the public acceptable if there is a 1% chance of recidivism
	or, as others have suggested, is it only justified if there is at least a 70% chance of recidivism? [*283]

The lack of any meaningful articulation of the legal standards in SVP laws is an important reason, combined with the empirically doubtful nature of the future dangerousness actuarial tools, why the question remains:

	is this good science applied with clear legal standards
	or does it lead to arbitrary and capricious decisions unduly restricting the liberty of those labeled sex offenders at the expense of public coffers?

The criteria for sex offender registry tiers are often no more helpful.

Several state registration acts provide three levels of classification that relate to the specific registration requirements and time periods, differentiating them with simple categorizations of risk of sexual recidivism as

	“low,”
	“moderate,” and
	“high,”

without further statutory definition. [*284]

The statutes also leave open the question about the relevant time period:

	1 day;
	1 year;
	5 years;
	25 years;
	life?

It is also unclear whether there is any difference as to whether the case is

	about civil commitment
	or registration.

Reviewing the case law, few courts recognized the issue of the relevant time period for which future dangerousness assessments were applicable.

This was true despite RRASOR and STATIC-99 experience tables being typically based on 5-, 10-, and 15-year follow-up periods.

Of those few opinions addressing the time issue, one court, noting the legislature had not specified any particular time period of risk for its SVP civil commitment statute, expressly declined to specify one. [*285]

Instead, the court ruled that it would not adopt the one-year risk period the defendant suggested, referring to the legislature’s recognition that sexual predators needed long-term care. [*286]

On the other hand, other courts in reviewing civil commitments also acknowledged a lack of specificity on time in the laws but noted that the present tense language should be construed to mean dangerous at the time of the proposed commitment. [*287]

Still, these courts summarily accepted the long-term actuarial evidence as relevant to determining the defendants’ immediate risk. [*288]

A case with an interesting fact pattern reflects the temporal risk conundrum. The court considered the “high risk” assessment from STATIC-99 as relevant to the defendant’s current risk of re-offending, despite acknowledging that STATIC-99 measured long-term risk potential. [*289]

However, the risk assessment was provided at a 2007 hearing for a continued civil commitment, while the high risk score was based on offenses occurring in 1975, 1982, and 1989, and the defendant had been continuously incarcerated since his 1989 conviction. [*290]

Hence, the courts’ practice of almost completely ignoring the temporal issue, yet still accepting actuarial assessments based on 5- to 15-year risk periods, again suggests pragmatism, since there are no known empirical tools that adequately assess imminent risk. Yet it also implies arbitrariness and capriciousness in that even assessments of immediate risk are predicated upon long-term projections.

With respect to the likelihood standard, there was a strong tendency to highlight actuarial predictions that were around or exceeded the 50% mark, even if they used 15-year experience scores to reach it. [*291]

Opinions have clearly stated that the over 50% actuarial scores legally met the “more likely than not” standard, [*292] and a STATIC-99 score of six was enough to conclude the defendant “comes out over 50 percent,” thereby meeting the “likely” to sexually re-offend threshold. [*293]

Other courts were also clear about being strongly influenced by high actuarial scores in confirming SVP status. [*294]

In a unique case, another court upheld the expert’s statement that an actuarial score of 52% meant the defendant was “still above the threshold that is represented by the term, ‘likely,’” despite the defendant’s voluntary castration. [*295]

Despite the accepted relevance of 50% actuarial scores to supporting sufficiency of the evidence queries, many of the opinions upheld sex offender classifications in the face of actuarial scores of, at times, far less than 50%.

Several opinions referred to the state expert as using the actuarial-based percentage and then adding more percentage points on top, reportedly based on additional risk factors beyond the testing instruments to derive a higher end predictive risk. [*296]

Some experts described the actuarial numbers as providing a baseline for which the experts can adjust based on other factors. [*297]

In contrast, another expert testified it was not ethical or empirical to add factors to the STATIC-99 score to get a different percentage. [*298]

Indeed, there is no empirical evidence that modifying actuarial scores improves the accuracy of predictions. [*299]

Simply, the problem is that the actuarial instrument scores appeared to end up being “little more than empirical window dressing for clinical judgment,” which the actuarial tools were designed to improve upon. [*300]

Many other courts affirmed sex offender status despite actuarial predictions that fell below 50%. They are addressed throughout the next sub-sections that discuss

	cases in which party allegiance appears to have impacted predictions and
	cases which contain erroneous representations of the actuarial evidence.

But, for now, an example of how a prosecutor rhetorically justified a sexual predator label in the face of low actuarial scores is illustrative.

A prosecutor in Pedroza v. State argued that even low numbers from the actuarial tools were sufficient to meet the legal standard of “likely” to re-offend:

Even taking [the expert’s] tests, the RRASOR, about 11% failure rate after ten years.
The SORAG was, I think-did he say 17% after seven years. Another one, . . . about 12% he’s likely to re-offend. The base rate for all offenders was 22%.
All that sounds like ‘likely to re-offend,’ you know. The only—it’s only about 12% chance here that you have cancer or are going to die. Whoa, whoa, whoa! That’s pretty scary when we’re talking about human lives and behavior. That’s likely. [*301]

The jury agreed and Pedroza was civilly committed. The appellate court affirmed, with a strong dissent that the risk percentages were too low to legally constitute “likely” in the face of the defendant’s liberty interest. [*302]
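The gap between the percentages quoted in the prosecutor's argument and a numerical reading of “likely” can be made concrete with a short sketch. It assumes, for illustration only, the 50% “more likely than not” reading that some opinions discussed above applied; the numerical meaning of “likely” in the Pedroza statute is not settled by that assumption. The labels and the function name are hypothetical.

```python
# Percentages as quoted in the prosecutor's argument above.
quoted_rates_percent = {
    "RRASOR (ten years)": 11,
    "SORAG (seven years)": 17,
    "third instrument": 12,
    "base rate for all offenders": 22,
}

# Assumption: "likely" is read as "more likely than not", i.e. above 50%.
MORE_LIKELY_THAN_NOT_PERCENT = 50


def exceeds_threshold(rate_percent, threshold_percent=MORE_LIKELY_THAN_NOT_PERCENT):
    """Return True when the quoted rate is above the assumed threshold."""
    return rate_percent > threshold_percent


for label, rate in quoted_rates_percent.items():
    verdict = "above" if exceeds_threshold(rate) else "below"
    print(f"{label}: {rate}% is {verdict} the assumed 50% threshold")
```

On this reading, every quoted figure falls below the threshold, which is the point the dissent pressed.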

C. Battle of the Experts

Case law provided considerable support for the critical suggestions that scoring and reporting on the capabilities of actuarial assessment remain subject to

	the influences of adversarial allegiance and
	subjective bias.

One area of disagreement was on whether the specific actuarial tools were generally accepted or not. Where there was contention,

	the state experts tended to argue they were generally accepted, while
	the defense experts asserted the opposite. [*303]

A defense expert in a case, when asked about professional acceptance of actuarial instruments (including RRASOR), warned:

I think it’s a real concern here that these instruments promise something they don’t deliver. And they have an incredible aura of scientific certainty and preciseness that’s just not there if you peel away the second layer of the onion. Therefore, I think psychologists do a disservice to the profession and psychiatrists, too, for that matter, when they use them and act as if there’s this precision and with a scientific basis that’s not really there. [*304]

Another defense expert more dramatically charged that the state’s experts were unethical in using the actuarial tools. [*305]

On an applied level, there was some disagreement between experts on the two sides in computing the resulting scores using the same actuarial instrument, in the directions consistent with party affiliation. Almost always, the state experts gave higher predictions than defense experts and with more certainty. [*306]

One state expert averred that a California study showed that people like defendant with a STATIC-99 score of 8

“had a 100 percent rate of re-offending.” [*307]

Another state expert opined that the high STATIC-99 score meant simply that the question was not if the defendant would sexually re-offend, only when. [*308]

Many courts permitted the experts themselves to opine as to whether the particular defendants were likely to re-offend. This is not surprising since many states’ expert evidence rules permit experts to testify on the ultimate issue, [*309] here being the risk of recidivism of the individual defendant.

State experts opined that the defendant had

	a “high” likelihood, [*310]
	was “more likely than not,” [*311]
	posed an “unacceptable public risk” [*312] of sexually re-offending, or
	it was “substantially probable” that he would sexually reoffend. [*313]

On occasion, it was clear in the cases that the experts’ opinion on the likelihood of re-offending was based solely on the actuarial evidence. [*314]

As previously noted, many cases found sufficient evidence of the likelihood of the defendants’ recidivism for SVP status when the actuarial results were around or above 50%. Still, many other cases involved actuarial results below that, yet the courts almost unanimously upheld the states’ cases for sex offender status.

In one case affirming the SVP classification (with a “likely” to re-offend criterion) for registration purposes, the court was unconcerned with a STATIC-99 score that was associated with a 19% chance of re-offending in 15 years:

“we believe that his one in five chance of committing a sexually oriented offense in the next 15 years could be viewed as unacceptably high.” [*315]

Affirming SVP classifications occurred, interestingly enough, even when the state’s own expert(s) gave low actuarial scores. Notably, in these cases, the state experts consistently supported the continued use and practical abilities of the actuarial instruments.

Presumably this widespread advocacy by state experts is due to the states’ interest in maintaining the role of actuarial tools in sex offender proceedings. In order to accomplish this seemingly contradictory approach, the experts employed, and the courts adopted, two general tactics, often in combination:

	downplaying the absolute need for high rates from the actuarial tools in every case, while
	endorsing clinical judgments to support increasing the actuarial-based rates of risk.

Thus, while opining that the defendant was likely to re-offend, state experts did not claim the actuarial tools were wholly flawed. Instead, their argument was one directional: the experience tables under-estimated risk, either as a whole or as applied to the individual.

In other words, the state experts in these cases tended to convey that the instruments themselves were acceptable but that their experience tables under-estimated (never over-estimated) risk. [*316]

Some experts explained the consistent under-estimates were possible because the tests assumed convictions rather than charges. [*317]

There was virtually no counter challenge that, in fact, the developers of RRASOR and STATIC-99 are clear that the recidivism experience tables are based on samples with a combination recidivism definition that variously included

	charges,
	convictions, and
	re-admissions. [*318]

Nonetheless, most experts did not try to estimate the size of the purported under-estimation, but exceptions included

	one state expert contending that actual re-offense rates were three times that reported by actuarial tables, [*319] and
	another expert claiming the actual rates were five times those reported with the STATIC-99 instrument [*320].
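As a rough arithmetic check on the multiplier claims above, the following sketch applies the three-fold and five-fold multipliers to experience table percentages that appear elsewhere in this section (19%, 39%, and 52%). Applying the multipliers uniformly across the table is an assumption made purely for illustration; the reported testimony does not say how the experts would apply them. The function name is hypothetical.

```python
# Experience table percentages mentioned elsewhere in this section.
table_rates_percent = [19, 39, 52]

# Multipliers claimed by the two state experts.
claimed_multipliers = [3, 5]


def scaled_rate_percent(table_rate_percent, multiplier):
    """Apply a claimed under-estimation multiplier to a table rate (in percent)."""
    return table_rate_percent * multiplier


for rate in table_rates_percent:
    for multiplier in claimed_multipliers:
        scaled = scaled_rate_percent(rate, multiplier)
        note = "exceeds 100%" if scaled > 100 else "within 0-100%"
        print(f"{rate}% x {multiplier} = {scaled}% ({note})")
```

Under this assumption a 19% table rate becomes 57% or 95%, while a 39% or 52% table rate yields figures above 100%, which cannot be probabilities. The multipliers therefore could not hold uniformly across the experience tables.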

As an example of why the actuarial tool as applied could not fully assess the higher risk of the individual defendant, the state expert in one case scored the defendant as a 3 on the STATIC-99

(for which the experience table provided a 19% risk of sexual re-offense in 15 years),

but then explained that the defendant

“was one of the thirty percent of persons for whom the STATIC-99 test did not diagnose properly.” [*321]

The expert in another case argued that the STATIC-99 rate was inaccurately low as to this defendant since it is

“‘slanted towards people who are not very good at hiding their offenses’ and a clever offender tends to get a lower score.” [*322]

In many cases confronting lower actuarial scores, the states’ experts cited the importance of other risk factors, observed in their clinical assessment, which were not considered by the instruments. [*323]

In accepting the clinically-derived risk factors as evidence to support a higher risk of future dangerousness, the courts concluded as a legal matter that the actuarial evidence, while relevant, was not dispositive. [*324]

This was evident in cases where the courts upheld sex offender status decisions even when the actuarial scores assigned by the state’s own experts were 0 points.

In two cases in which the state experts scored the defendant with 0’s, the courts upheld the sexual predator classifications based on the experts’ opinions about the presence of other risk factors [*325] and the fact that even with 0 scores the tools indicated at least a 4% risk level of re-offending. [*326]

There were also partisan differences in how experts clarified the abilities and limitations of the actuarial tools.

Defense experts were much more critical about the accuracy of the actuarial instruments as a whole and much more likely to be cited as specifying weaknesses in the tests. [*327]

As just discussed, state experts often claimed that the experience tables underestimated the risk of re-offense, but defense experts were more likely to argue the opposite. [*328]

Various defense experts argued that the experience tables over-estimated risk

	because the test over-included the types of sex offenses that qualified as defined in the SVP law (such as involving violence), [*329]
	or because they were based largely on non-diverse samples of high risk offenders from Canada and England [*330] where base-rates of offending exceeded those of the U.S. [*331].

Defense experts were also more likely to have declined to score actuarial tools for the defendants,

	either because they believed the group-based models could not be used for individualized assessments, [*332]
	were generally inaccurate, [*333]
	and/or they were not validated on juveniles [*334] or other types of subpopulations of which the defendant was a member [*335].

In contrast to the state experts’ wide support of actuarial tools, a defense expert called STATIC-99

“confusing, distracting, and intellectually dishonest.” [*336]

In sum, the cases indicate that scoring the actuarial factors may not be as objective and simple as desired since differences of opinion in the coding rules emerged, despite the few factors involved.

In addition, adversarial allegiance is evident in the divergent paths that state and defense experts took in their approach to actuarial tools and their abilities.

Yet, courts consistently sided with the state experts’ approach. The overall trend suggests arbitrary decisions concerning the influence of actuarial evidence.

When the actuarial tool shows a high risk (>50%), little additional support was needed to support an SVP determination. But even if the actuarial score was less than that, even at 0, state experts and courts accepted other evidence to support a higher risk assessment.

This trend was consistent despite many defense experts providing specific information undermining the reliability and validity of actuarial assessments.

D. (Mis)Interpretation of Actuarial Prediction

By critically analyzing the messages in the appellate opinions, it is clear that many of the experts and courts erroneously interpreted the actuarial tools and their purposes.

This sub-section outlines four types of interpretive problems.

The first, which was quite common amongst the cases, was the improper interpretation of the group-based [*337] scores as providing risk assessment numbers that were individualized to specific defendants. [*338]

In describing the general usefulness of actuarial results to individual predictions, for example, one opinion stated that the STATIC-99 actuarial instrument

“calculates defendant’s risk of re-offense,” [*339]

while another court referred to the expert’s description of actuarial instruments as

“commonly used to assess an individual’s risk of recidivism.” [*340]

In further complicating the error on the ultimate issue, an opinion described the trial court holding a Frye hearing where experts presented results from multiple actuarial

“instruments that experts used to determine whether a defendant qualifies as a sexually violent predator.” [*341]

In addition, in many cases, the expert directly imputed the actuarial risk statistic from the experience table to the specific defendant. [*342]

Examples included an expert testifying the defendant’s score of 7 on STATIC-99

“means that the likelihood of [him] being convicted of a new sex offense is 39% within 5 years” [*343]

and another expert stating the results of RRASOR and STATIC-99

“indicated that [he] was likely to re-offend” [*344].

An expert in a recent case contended that STATIC-99 could “diagnose properly” an individual’s risk, a judgment the court adopted. [*345]

The individualization of actuarial scores is in direct conflict with the tool developers’ instructions that the tools cannot be used in that way. [*346]

The choice between conveying the experience table result as a group-based statistic or as an individualized one was decidedly inclined in favor of the state. Thus, courts were more likely to accept the individualized statistic when the percentage or risk level was higher, while highlighting the group-based nature of actuarial assessment when the result was lower. [*347]

The second interpretive problem occurred with opinions that presented erroneous representations of the actuarial tools.

Many appellate opinions referred to experts’ testifying about actuarial test results that appeared inconsistent with, even contradicting, the actual tests that the experts purported to use.

One expert used the meta-analysis that helped form the resulting RRASOR and STATIC-99 tests as if the meta-analysis were itself an actuarial tool, contrary to the developer’s specific intention. [*348]

In another example of incorrectly applying a test against the developer’s criteria, a state expert used STATIC-99 to support his assessment that the young defendant was likely to re-offend, despite the other state expert’s critique that the tool was not appropriate for juveniles. [*349]

Regardless, the court upheld the evidentiary use of the actuarial score to civilly commit the defendant, even though the defendant had been 14-years-old at the time of his sexual offense and had been incarcerated ever since, without further charges.

In another case, the opinion erroneously indicated that STATIC-99 was a tool to diagnose psychiatric disorder, explaining STATIC-99

“measures both paraphilia and antisocial personality disorder.” [*350]

Some experts created new hierarchies, such as asserting that STATIC-99 places the defendant in a category of

	“very high risk” [*351] or
	“extremely high” risk, [*352]

despite the top STATIC-99 ordinal grouping of merely “high risk.” [*353]

Another expert stated that according to STATIC-99, the defendant was

“in a class almost by himself.” [*354]

Other misrepresentations involved the specifics of

	how the actuarial tests were developed, [*355]
	scoring methodology, [*356]
	the tools’ definition of recidivism, [*357] and
	what type of recidivism a specific actuarial tool was designed to predict. [*358]

While none of these, on its own, may be hugely important to the resulting SVP status decisions, they give one pause in considering the quality of the training, experience, and care experts took with respect to these actuarial assessments.

These examples are also helpful in that the opinions lacked any indication that such errors were revealed in the cases themselves by Daubert’s and Barefoot’s adversarial check of cross-examination or in the appellate review process.

The third issue involved scoring discrepancies.

Despite the purported ease of RRASOR and STATIC-99 scoring, several cases pointed to state experts changing their own scores on the defendants from time to time. [*359]

In addition, there were cases with discrepancies with scoring of the defendants,

	both as between experts on the two competing sides [*360]
	and as between the two or more experts on the same side. [*361]

The courts in those cases generally treated these discrepancies as issues of fact rather than as problems inherent in the instruments and their scoring rules.

The fourth interpretive problem area involved how the experts conveyed the significance of using multiple actuarial tools.

Here, there was a common empirical flaw in characterizing the use of multiple actuarial tools as somehow strengthening the reliability of scores from the individual tools, at least when they were relatively consistent in the direction of risk predictions.

For example, in People v. Calderon, the opinion represented that the expert used STATIC-99 and “confirmed the accuracy of” the result with that obtained from RRASOR, which the court described as “rely[ing] on a different basis of prediction.” [*362]

Other experts also based their predictions of the defendants’ future dangerousness on the reinforcing RRASOR and STATIC-99 scores, without the opinions’ mentioning any issues with their overlapping nature. [*363]

Issues with inter-correlation between variables make this problematic, as RRASOR and STATIC-99 share several factors, [*364] which makes sense as their factors were derived from the same offender recidivism literature by the same developers. [*365]

An empirical study of the use of these multiple actuarial tools showed that this practice of using multiple tools does not result in a statistically significant better prediction than what would be provided by using the single best actuarial risk scale. [*366]
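The inter-correlation concern can be illustrated with a small simulation. It is a sketch under stated assumptions rather than a model of the actual instruments: two hypothetical tools are built from independent coin-flip risk factors, with the shorter tool's four items all reused by a longer ten-item tool. The item counts, function names, and random data are assumptions for illustration only, and no re-offense outcome is modeled at all.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_tool_agreement(n_people=5000, n_shared=4, n_extra=6, seed=0):
    """Correlate two hypothetical tools whose only link is shared items."""
    rng = random.Random(seed)
    short_scores, long_scores = [], []
    for _ in range(n_people):
        # Each factor is an independent fair coin flip (assumption).
        shared = [rng.randint(0, 1) for _ in range(n_shared)]
        extra = [rng.randint(0, 1) for _ in range(n_extra)]
        short_scores.append(sum(shared))               # short tool: shared items only
        long_scores.append(sum(shared) + sum(extra))   # long tool: shared plus extra items
    return pearson(short_scores, long_scores)


r = simulate_tool_agreement()
print(f"correlation between the two hypothetical tools: {r:.2f}")
```

With four of ten items shared, the expected correlation is the square root of 4/10, or about 0.63, even though neither score carries any information about an outcome. Agreement between overlapping tools is thus expected by construction and is not, by itself, independent confirmation of either result.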

On the other hand, a study found that fewer than 5% of its sample received consistent rankings of high risk or low risk using five actuarial tools, including RRASOR and STATIC-99. [*367]

Despite the multiple criticisms of actuarial risk assessment as conveyed in this section, defendants contested, but ultimately lost, on ineffective assistance of counsel claims that their lawyers failed to properly challenge the

	admissibility,
	validity, and
	interpretation

of the actuarial tools. [*368]

Together with the issues in the cases regarding the variability in the likelihood standard and the adversarial allegiance as previously discussed, the frequent misinterpretations of the actuarial tests merit attention.

The disconnect between legal language and the scientific judgments by mental health professionals does a disservice to the interests of justice and underscores the inconsistency of the legal standards between cases.

V. Conclusions

In order to balance the goals of protecting the public while adhering to the constitutional principles of liberty and privacy, legal actors in the criminal justice process have an ethical duty to critically evaluate scientific testimony that is offered as expert evidence.

A number of practitioners and academics with joint law and doctoral credentials have argued for a ban on actuarial risk assessment as unreliable science in SVP proceedings. [*369]

With the state of the current assessments and the lack of standards for the legal criteria from the SVP laws, this author concurs.

There are a number of reasons why legal personnel have been loath to challenge mental health experts’ evidence on future dangerousness.

It could be the natural “desire for authoritative methods for generating knowledge.” [*370] The aura of objectivity and science serves this positivist aim. [*371]

Judges may also be relying upon the adversarial process via the litigating lawyers, via Daubert and Barefoot, to weed out bad science. But, by doing so, judges are shifting their gate-keeping obligations onto the attorneys, who may also be ill equipped in the scientific literacy department. [*372]

The discussion of the cases in Section IV emphasizes the failure of both the adversarial process and judicial gate-keeping in challenging this suspect science.

SVP laws are also an area of law about which people are passionate and which invokes moral judgments. [*373]

The argument goes like this: the legislatures created sexual predator laws with the future dangerousness concept and we (mental health experts, judges, and lawyers) therefore need to use the best available evidence to make those decisions; even if the legal standards remain vague; even though current models of actuarial risk assessment suffer large gaps in validity and reliability. [*374] However, this result exalts the political over the legal.

For purposes of my point, assume there is empirical evidence that an astrology-based model of risk assessment is shown to be better than chance. Then assume the astrology model is also empirically shown to be a better predictor than a psychic model of risk assessment.

	Would these facts justify the acceptance of the astrology risk tool as scientifically valid?
	And, would it be enough as evidence in judicial proceedings in which an individual’s liberty and privacy are involved? [*375]

Advocating a new sex offender policy is beyond the scope of this article. Still, considering the myths about the recidivism risk of sex offenders that led to the rise of SVP laws, there is a strong basis for simply getting back to the basics of sentencing policy in terms of being transparent if the intent is to rely directly on punitive incarceration, as well as to engage with the rehabilitative possibilities offered by much work done in recent years in sexual offender treatment. [*376]

In the end, practicality leans toward a resignation that officials will likely continue to believe that political and public interests are served by the current SVP regime. [*377]

The naturalistic fallacy, i.e., confusing what is with what one thinks ought to be, here in terms of having an objective ability to predict risk, is not an uncommon tendency.

Historical juridical authority, though, should not be entirely ceded in protecting what constitutes expert evidence in law.

The justice system must use the Daubert and Frye standards to critically analyze predictions of future dangerousness in the SVP context. Actuarial assessments of risk are couched in terms of science and objectivity and thus should be evaluated regularly for their reliability and validity.

Education is key for those involved in SVP status determinations, not only as to the ability of actuarial science to accurately predict future dangerousness, but also as to how best to challenge actuarial assessments in court to ensure that only those tools that are sufficiently reliable and valid are admissible. [*378]

The ability of legal professionals to better understand and critically analyze science is a matter of additional learning, and guidance is available. [*379]

If this article is not convincing on the suspect nature of the science of future dangerousness evidence, at the very least it provides some assistance for digging deeper into the empirical nature of actuarial tests. Constitutional principles and evidentiary rules require adherence to rationality despite the mythical specter of a growing population of dangerous sexual predators.
