Method

Sample of Studies

To obtain studies using national probability samples in which researchers examined self-reported effects or psychological correlates of CSA, several strategies were employed. Studies using national probability samples already known to the authors were included. Computer searches of Dissertation Abstracts International, Sociofile, PsycLIT, and ERIC were made to locate additional studies. The keywords and phrases entered were

	(adjustment or effect or effects) and (national or nationally) and
	(sex abuse) or (sexual abuse) or (child and adult and sexual).

Finally, obtained articles were examined for mentions of additional studies.

These strategies yielded 17 studies, of which 7 were included for analysis in the current article. The remaining 10 studies were eliminated

	either because they were unobtainable (3 studies),
	or because they did not satisfy one or more criteria for inclusion in the current article (7 studies).

These criteria were

	(a) the study had to include both male and female respondents,
	(b) the study had to present either summaries of self-reported effects or measures of psychological or sexual adjustment,
	(c) these summaries had to be reported separately for male and female respondents,
	(d) researchers who reported results of measures of adjustment had to have included a control group, and
	(e) the self-reported effects or the adjustment data had to be presented quantitatively.

The inclusion of both male and female respondents and the separate reporting of their results were required so as to be able to address the question of whether the CSA experiences of males and females are equivalent. Most obtained studies were not retained because they contained no outcome data.

Self-reported effects refers to asking respondents whether they thought their experience of CSA had an effect (i.e., impact on psychological or sexual adjustment) upon their lives that could be classified into the two mutually exclusive and exhaustive categories of negative and nonnegative. Nonnegative effects could include perceptions of neutral or positive impact.
Self-reported effects did not include reports of emotional reactions at the time (e.g., fear or pleasure), because such reactions are not necessarily mutually exclusive and do not necessarily indicate that effects occurred.

Finally, quantitative presentation of the self-reported effects or adjustment data was required so as to be able to compare and combine the results across studies via meta-analysis.

Four of the seven included studies came from the United States, and one each came from Great Britain, Canada, and Spain.

Table 1 lists the seven studies along with the populations of inference, methods of gathering data, definitions of CSA, sample sizes reported for males and females, prevalence rates of CSA, and response rates.
In the case of studies that included measures of adjustment

(Bigler, 1992; Boney-McCoy & Finkelhor, 1995; Laumann, Gagnon, Michael, & Michaels, 1994; López, Carpintero, Hernandez, Martin, & Fuertes, 1995; Los Angeles Times Poll, as reported by Finkelhor, Hotaling, Lewis, & Smith, 1989),

sample size refers to the number of respondents who were included in data analysis.
In the case of studies that included only self-reported effects (Badgley et al., 1984; Baker & Duncan, 1985), sample size refers to the number of respondents who agreed to participate.

The sampling frames (i.e., method for selecting a representative sample) in these studies generally consisted of area probability sampling from nearly all households throughout the countries in which the studies were conducted.
Populations of inference generally consisted of adults aged 18 and over in the countries studied, although a number of researchers used restricted age ranges (e.g., 18 to 59 and 30 to 55); one study was based on respondents from 10 to 16 years of age.
Data gathering consisted of

	face-to-face interviews in three studies,
	a mail survey in one study,
	telephone interviews in two others, and
	a self-administered questionnaire in another study.

Two studies with face-to-face interviews also included a self-administered questionnaire.

Definitions of CSA varied from study to study. In all studies but one, child included both children and adolescents -- i.e., young persons aged younger than 19, 18, 17, or 16 years, depending on the study. In one study (Laumann et al., 1994), child referred to prepubescent persons.
In all studies but one, events classified as CSA included both contact and noncontact sexual experiences.
Laumann et al. (1994) included only sexual experiences involving physical contact. CSA was restricted to age-discrepant sexual experiences in only two studies.
The other five included abusive peer interactions of a sexual nature as well, although the focus was generally on age-discrepant sexual experiences.

Total sample sizes of respondents who actually participated in these studies ranged from 314 to 3,432.
Excluding the low return rate of 33% in the mail survey study, response rates ranged from 72% to 94% (M = 82%, SD = 8%).
Prevalence rates of CSA varied considerably from study to study. For male respondents, these rates ranged from 6% to 36% (M = 17.7%, SD = 11.3%), whereas for female respondents, prevalence rates ranged from 14% to 53% (M = 28.6%, SD = 16.7%).

This variability in prevalence rates is attributable to differing definitions of CSA.

Two studies employed widely inclusive definitions of CSA. In addition to defining older-younger sexual experiences to be CSA, Bigler (1992) defined CSA to include all sexual experiences persons under 18 years had with family members, regardless of their age.

Badgley et al. (1984) included persons over 18 years of age if this was the earliest age at which they had their first unwanted sexual experience.

As will be discussed later, most unwanted sexual experiences first occurred when respondents in this study were under 18, thus qualifying as CSA.
Excluding these two studies, prevalence rates ranged from 6% to 15% for males (M = 11.4%, SD = 3.9%) and from 14% to 28% for females (M = 19.2%, SD = 5.8%).

Measures

Self-reported effects

In three studies, researchers collected and reported data on self-reported effects (Badgley et al., 1984; Baker & Duncan, 1985; Laumann et al., 1994). Badgley et al. (1984) asked their Canadian respondents whether they had experienced sex with someone when they "didn't want this." If they answered affirmatively, they were then asked to indicate at what age or ages this occurred and whether they had been physically injured or emotionally or psychologically harmed by the first such incident of this type at the time it occurred.

Baker and Duncan (1985) asked their British respondents with a history of CSA to indicate the effect on them of this experience by choosing one of four options:

	(a) unpleasant and harmful at the time but had no lasting effects,
	(b) permanently damaging with long-term effects,
	(c) had no effect at all, or
	(d) had improved the quality of their life.

Laumann et al. (1994) asked their U.S. respondents with a history of CSA whether this experience had affected their lives since it happened.

Psychological or sexual adjustment

In five studies researchers used various measures to assess psychological or sexual adjustment among their control and CSA respondents

(Bigler, 1992; Boney-McCoy & Finkelhor, 1995; Finkelhor et al., 1989; Laumann et al., 1994; López et al., 1995).

Bigler (1992) measured sexual functioning in his sample of U.S. respondents with two instruments:

	(a) the Sexual Esteem subscale of the Sexuality Scale (Snell & Papini, 1989) and
	(b) the Golombok-Rust Inventory of Sexual Satisfaction (Rust & Golombok, 1986), which measured sexual dysfunction.

Bigler also used the Impact of Event Scale (Horowitz, Wilner, & Alvarez, 1979) to measure the level of trauma associated with CSA.

Boney-McCoy and Finkelhor (1995) created an instrument to measure trauma related to posttraumatic stress disorder (PTSD), which contained items asking their sample of U.S. children and adolescents how often they experienced in the past week each of 10 symptoms. These symptoms were all associated with PTSD and were modified from the SCL-90-R.
Boney-McCoy and Finkelhor also assessed lifetime depression, occurrence of sadness in the past month, and occurrence of trouble with a teacher in the last year.

In López et al.'s (1995) study conducted on a national Spanish sample, the Self Reporting Questionnaire was used to assess current psychological adjustment.

Finkelhor et al. (1989) reviewed the results of the Los Angeles Times Poll, which was conducted by telephone on a national sample of U.S. residents, asking them to respond to a series of items about CSA.
Finkelhor et al. argued that three items were relevant to respondents' current level of functioning and adjustment:

	extent of marital disruption,
	satisfaction with intimate relationships with the opposite [sic!] sex, and
	being a religious non-practitioner.

Finkelhor et al. argued that this last item was a valid indicator of long-term harm because "Russell ... found that victimized women were more disillusioned with religion than were non-victimized peers" (p. 393).
Despite this argument, we did not retain this last item in our analysis because, unlike marital disruption and sexual satisfaction, religious non-practice is not a face-valid measure of adjustment.
Finkelhor et al. also analyzed respondents' attitudinal responses to questions concerning CSA. They argued that attitudinal differences between sexually abused (SA) and control respondents were also indicative of the long-term negative impact of CSA.

We analyzed the 89 attitude items presented by Finkelhor et al. (1989) to determine which items were face-valid indicators of long-term negative impact. Four judges (the two authors and two other sex researchers, all of whom are familiar with the CSA literature and are currently involved in CSA research) judged the validity of each item by answering two questions.

	The first read, "Assuming that sexual abuse has a long-term negative impact, would you expect to find a difference in the proportion of SA vs. control respondents' responses to this item (and in what direction)?" If a judge did expect a difference, then the judge answered
	the second question, which read, "If there is a difference in the expected direction, was it most likely caused by a long-term negative impact from the SA experience, or could some other factor reasonably account for this difference?"

If the judge decided that the difference was most likely caused by the SA experience, then the judge considered the attitude item to be a valid measure of long-term harm. Each judge rated the items independently. The mean pair-wise inter-judge agreement across all items was 74%.
Next, the judges convened to discuss discrepancies on items where their judgments were not unanimous.
For 43 items, unanimous agreements of validity were reached; these items were retained as the final set of valid indicators of harm.
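
The agreement figure reported above can be illustrated with a short sketch. The source does not state the exact formula, so this assumes that agreement for a pair of judges is the proportion of items on which both gave the same valid/not-valid judgment, and that the reported value is the mean of this proportion over all pairs of judges. The function name and the example ratings are hypothetical.

```python
from itertools import combinations


def mean_pairwise_agreement(ratings):
    """Mean proportion of items on which each pair of judges agrees.

    ratings: one list per judge, each holding one True/False validity
    judgment per item (assumed layout, not taken from the source).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two judges")
    n_items = len(ratings[0])
    if n_items == 0 or any(len(r) != n_items for r in ratings):
        raise ValueError("every judge must rate the same non-empty item set")

    pair_scores = []
    for judge_a, judge_b in combinations(ratings, 2):
        matches = sum(x == y for x, y in zip(judge_a, judge_b))
        pair_scores.append(matches / n_items)
    return sum(pair_scores) / len(pair_scores)


# Hypothetical ratings from three judges on four items.
example_ratings = [
    [True, True, False, False],
    [True, True, False, True],
    [True, False, False, False],
]
print(round(mean_pairwise_agreement(example_ratings), 3))
```

With four judges, as in the procedure described above, the mean would be taken over six pairs.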

In Laumann et al.'s (1994) study using a U.S. sample, respondents were asked a series of questions relevant to their current psychological and sexual adjustment, as well as to their level of sexual activity.
Both authors independently examined these questions to determine which ones were valid measures of adjustment and which ones measured only sexual activity without implication of adjustment.
We reached 100% agreement on our first evaluation of the items.
We judged 11 items for males and 10 items for females to be valid measures of adjustment -- most of these items assessed sexual difficulties. We judged the remaining seven items to be measures of sexual activity without implication of adjustment problems.

Procedure

For the self-reported effects data, the percentages of males and females with a history of CSA who reported negative effects resulting from their CSA experiences were tabulated.
For each study reporting self-reported effects data, males and females were compared by

	(a) contrasting the proportion of each gender that reported negative effects and
	(b) computing the effect size of this contrast.

The effect size used for these comparisons was Pearson's r. Formulas for calculating r were taken from Rosenthal (1984, 1995). Positive rs indicated that males reported fewer negative effects, or more neutral or positive effects, than females.
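
As an illustration of this contrast, the sketch below computes r for a 2 x 2 table of gender by self-reported effect. It assumes that the effect size for two proportions is the phi coefficient, which equals the square root of chi-square(1)/N with a sign attached. The function name and the counts are hypothetical and are not data from any included study.

```python
import math


def phi_from_counts(male_neg, male_nonneg, female_neg, female_nonneg):
    """Phi coefficient (Pearson r for a 2 x 2 table).

    The sign is arranged so that a positive value means males reported
    negative effects less often than females.
    """
    a, b = female_neg, female_nonneg   # row 1: females
    c, d = male_neg, male_nonneg       # row 2: males
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return (a * d - b * c) / denom


# Hypothetical counts: 10 of 40 males and 20 of 40 females report negative effects.
print(round(phi_from_counts(10, 30, 20, 20), 3))
```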

The effect sizes comparing the genders were then meta-analyzed using formulas from Rosenthal (1984) and Shadish and Haddock (1994).

The meta-analysis involved several steps.

First, the effect sizes were combined by

	(a) transforming each r to a Fisher z,
	(b) multiplying each Fisher z by the degrees of freedom associated with its sample (N-3),
	(c) summing these weighted Fisher zs,
	(d) summing the degrees of freedom across all samples,
	(e) dividing the sum of the weighted Fisher zs by the sum of the degrees of freedom, and
	(f) converting the resulting mean Fisher z to a Pearson r.

The resulting r represents the mean weighted effect size and is referred to as the unbiased effect size estimate (r_u). The unbiased effect size estimate is used to estimate the effect size in the population and is considered to be unbiased because it weights larger samples more heavily; their effect sizes are generally considered to be more precise population estimates.
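
Steps (a) through (f) can be sketched as follows. This is a minimal illustration of the weighted combination described above, not the authors' own computation; the function name and the example values are hypothetical.

```python
import math


def combine_effect_sizes(rs, ns):
    """Weighted mean effect size r_u from per-sample rs and sample sizes."""
    if not rs or len(rs) != len(ns):
        raise ValueError("rs and ns must be non-empty and of equal length")
    if any(n <= 3 for n in ns):
        raise ValueError("each sample needs N > 3")

    zs = [math.atanh(r) for r in rs]                       # (a) r -> Fisher z
    weights = [n - 3 for n in ns]                          # degrees of freedom, N - 3
    weighted_sum = sum(w * z for w, z in zip(weights, zs)) # (b), (c)
    total_weight = sum(weights)                            # (d)
    mean_z = weighted_sum / total_weight                   # (e)
    return math.tanh(mean_z)                               # (f) mean z -> r


# Hypothetical effect sizes and sample sizes for three samples.
r_u = combine_effect_sizes([0.05, 0.10, 0.20], [300, 150, 80])
print(round(r_u, 3))
```

Because the weights are N-3, a large sample pulls the estimate toward its own effect size far more than a small one does.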

In addition to being combined, the effect sizes were compared.
Comparing effect sizes determines whether they are all of the same general magnitude and in the same direction (i.e., positive or negative). If the set of effect sizes has these commonalities, then the effect sizes are said to be homogeneous, and their combined value can be taken as a valid population estimate.

Comparing a set of effect sizes is achieved by summing the products of each sample's degrees of freedom (N-3) times the square of the difference between a sample's Fisher's z and the mean Fisher z across all samples.
The resulting statistic is distributed as chi-square with k-1 degrees of freedom, where k represents the number of samples.
A non-significant result indicates that the effect sizes are homogeneous.
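
The homogeneity statistic described above can be sketched as follows. The function name, the example values, and the small table of .05 critical values are illustrative assumptions; a statistics library would normally supply the exact p value.

```python
import math

# Standard chi-square critical values at alpha = .05 for df = 1 to 4.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def homogeneity_q(rs, ns):
    """Return (Q, df) for a set of effect sizes.

    Q = sum over samples of (N - 3) * (z - mean z)^2, where mean z is the
    (N - 3)-weighted mean Fisher z; df = k - 1.
    """
    if len(rs) < 2 or len(rs) != len(ns):
        raise ValueError("need at least two samples with matching sizes")
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - mean_z) ** 2 for w, z in zip(weights, zs))
    return q, len(rs) - 1


# Hypothetical example: two samples with clearly different effect sizes.
q, df = homogeneity_q([0.1, 0.5], [103, 103])
print(round(q, 2), df, "heterogeneous" if q > CHI2_CRIT_05[df] else "homogeneous")
```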

Next, a 95% confidence interval around the unbiased effect size estimate was computed using the formula presented by Shadish and Haddock (1994).
This interval provides the range of effect sizes that has a probability of .95 of containing the population effect size. This interval is also useful for evaluating the statistical significance of the effect size estimate; if the interval does not contain zero, the effect size estimate is significant.
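
A confidence interval of this kind can be sketched as follows. The source cites Shadish and Haddock (1994) without reproducing the formula, so the sketch assumes the usual large-sample interval, in which the standard error of the weighted mean Fisher z is 1/sqrt(sum of (N-3)) and the limits are converted back to r. The function name and example values are hypothetical.

```python
import math


def confidence_interval_95(rs, ns):
    """Return (lower, r_u, upper) for the weighted mean effect size."""
    if not rs or len(rs) != len(ns):
        raise ValueError("rs and ns must be non-empty and of equal length")
    weights = [n - 3 for n in ns]
    total_weight = sum(weights)
    mean_z = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / total_weight
    se = 1.0 / math.sqrt(total_weight)     # assumed standard error of mean z
    half_width = 1.96 * se                 # two-tailed .05 normal criterion
    return (math.tanh(mean_z - half_width),
            math.tanh(mean_z),
            math.tanh(mean_z + half_width))


# Hypothetical example: two samples of 53 respondents, each with r = .20.
lower, r_u, upper = confidence_interval_95([0.2, 0.2], [53, 53])
print(round(lower, 3), round(r_u, 3), round(upper, 3))
print("significant" if lower > 0 or upper < 0 else "not significant")
```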

For the adjustment data, the statistics assessing the difference between the CSA and control groups in terms of psychological or sexual adjustment were converted to effect sizes (Pearson rs). Positive rs indicated that CSA was associated with poorer adjustment, whereas negative rs indicated the reverse relation.
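
The source does not list which statistics were converted. As a sketch, the two conversions below cover a t test and a chi-square test with 1 degree of freedom, using the standard formulas r = sqrt(t^2 / (t^2 + df)) and r = sqrt(chi-square / N). The function names and the `csa_worse` flag are hypothetical; the flag only sets the sign so that positive r means poorer adjustment in the CSA group.

```python
import math


def r_from_t(t, df):
    """Convert a t statistic to r; the sign of t is carried over to r."""
    if df <= 0:
        raise ValueError("df must be positive")
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


def r_from_chi2(chi2, n, csa_worse=True):
    """Convert a 1-df chi-square to r; the direction must be supplied."""
    if chi2 < 0 or n <= 0:
        raise ValueError("chi2 must be non-negative and n positive")
    r = math.sqrt(chi2 / n)
    return r if csa_worse else -r


# Hypothetical test statistics.
print(round(r_from_t(2.0, 96), 3))
print(round(r_from_chi2(4.0, 100), 3))
```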

In most studies reporting adjustment data, more than one measure comparing the CSA and control respondents in terms of their adjustment was used.
For studies with multiple measures of adjustment, an effect size r was computed for each measure, and then these rs were averaged using Fisher z transformations to obtain a single mean effect size, termed the study-level effect size.
This practice has been used in other meta-analyses (e.g., Erel & Burman, 1995; Neumann et al., 1996) and has been recommended by Rosenthal (1984). Individual and study-level effect sizes were computed separately for males and females.
The study-level effect sizes were then meta-analyzed for males and females separately.
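
The study-level averaging described above can be sketched as follows, assuming an unweighted mean of the Fisher zs within a study (the source does not mention weights at this step). The function name and example values are hypothetical.

```python
import math


def study_level_effect_size(rs):
    """Average several rs from one study via Fisher z and convert back to r."""
    if not rs:
        raise ValueError("need at least one effect size")
    mean_z = sum(math.atanh(r) for r in rs) / len(rs)
    return math.tanh(mean_z)


# Hypothetical rs from three adjustment measures within one study.
print(round(study_level_effect_size([0.05, 0.12, 0.20]), 3))
```

Averaging in the z metric rather than averaging the rs directly matters little for small correlations but avoids a downward bias when individual rs are large.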

Meta-analysis consisted of combining and comparing effect sizes, as well as computing 95% confidence intervals.