Author: Vaughn, Sharon
Date published: July 1, 2011
The literature on multitiered, research-based reading interventions provides strong evidence for the critical role of early reading instruction and the benefits of early intervention for children who are struggling to learn to read (Denton, Fletcher, Anthony, & Francis, 2006). Although applying effective prevention programs that address early reading difficulties is fundamental to preventing further reading difficulties in at-risk children, intervention approaches for students who have already exhibited reading failure have less empirical support (Kamil et al., 2008). In particular, few experimental studies document the effects of multitiered intervention approaches in reading for students in the middle grades. To address this need, we designed a series of studies to determine the effects of these interventions on students with reading disabilities in Grades 6 through 8. This article reports findings from a study in which students who were inadequate responders to a previously provided Tier 1 intervention (enhanced instructional practices in vocabulary and comprehension) and Tier 2 intervention (supplemental daily reading instruction for at-risk students; Vaughn, Cirino, et al., 2010; Vaughn, Wanzek, et al., 2010) received one of two conceptually different tertiary interventions (Tier 3): individualized or standardized treatments. At-risk students randomly assigned to the comparison condition in Year 1 remained in that condition.
BACKGROUND ON READING INSTRUCTION FOR OLDER STUDENTS WITH READING DIFFICULTIES
A significant number of students demonstrate reading difficulties that persist into their middle and high school years. In 2007, the National Assessment of Educational Progress reported that 69% of eighth-grade students were unable to successfully derive meaning from grade-level text. With such a high prevalence of reading problems in the middle grades and an increasing focus on improving high school retention and preparing students for postsecondary learning, adolescent reading instruction has become increasingly important (Kamil et al., 2008).
Older students demonstrate a broad and complex range of difficulties related to reading. These include problems in recognizing words, understanding word meanings, and understanding and connecting with text; students often lack background knowledge required for reading comprehension (Biancarosa & Snow, 2004). We examined several syntheses on interventions for secondary students with reading difficulties to identify effective interventions to meet this range of reading difficulties. Edmonds et al. (2009) conducted a meta-analysis examining the effects of adolescent reading interventions (Grades 6 through 12) that included instruction in decoding, fluency, vocabulary, or comprehension on reading comprehension outcomes. Analyses revealed a mean weighted effect size in the moderate range in favor of treatment students over comparison students. Promising approaches were those that provided targeted reading intervention in comprehension, multiple reading components, or word-recognition strategies.
In a related meta-analytic synthesis, Scammacca et al. (2007) examined single- and multicomponent interventions to determine the effect of various intervention components on reading-related outcomes, including and in addition to reading comprehension outcomes. Most of the studies reported outcomes using nonstandardized measures, which inflated the overall effect sizes. Of the 11 studies that used only standardized measures, the mean effect size was 0.42, with lower effects for word-study interventions than for comprehension- or vocabulary-focused studies. Using only standardized outcome measures (not available for vocabulary), the impact of moderator variables shifted, with word study and comprehension strategy instruction demonstrating the highest effect sizes. The researchers found higher effect sizes associated with researcher-implemented interventions and middle school participants rather than high school participants.
There are, however, several significant differences between the studies reported in these syntheses and the current study: (a) none of the studies from the syntheses focused on students with demonstrated low response to previous interventions, whereas the study reported here included the lowest responders from a previous year-long intervention; (b) none of the synthesis studies were large-scale (e.g., multiple schools across sites) as was our study - these large-scale, multiple-site studies are consistently associated with lower effects; and (c) many studies in the syntheses used researcher-developed measures, and we used only standardized measures. Moreover, none of these studies provided a multicomponent, comprehensive approach to remediating reading difficulties for secondary students with significant and persistent reading disabilities.
INDIVIDUALIZED OR STANDARDIZED APPROACHES TO TERTIARY INTERVENTIONS
We were interested in determining the efficacy of two conceptually different but empirically derived treatment approaches to remediating reading disabilities with individuals whose response to a Tier 2 intervention was inadequate. Within a response-to-intervention (RTI) framework, researchers have highlighted two approaches to intervention: a standard protocol and a problem-solving approach (e.g., Fuchs, Mock, Morgan, & Young, 2003). A standard protocol intervention uses research-based instructional programs and is provided in a specified manner to all students with learning difficulties. Typically, a standard protocol includes a well-specified treatment furnished in a step-by-step sequence. Educators consider that standard protocols are easier for school personnel to implement because they typically (a) have teachers' guides and student materials that give instructional support; (b) furnish clear expectations allowing for ease of implementation and fidelity determination; (c) enable schools to document what educators have taught, thereby guiding decision making and placement in special education; and (d) leverage school resources more efficiently by allowing districts to focus interventions on fewer options.
Rather than implement a problem-solving approach that is grounded in school psychology and involves behavioral problem solving (Bergan, 1977) with limited empirical support for reading interventions, we chose an individualized approach with roots in special education. We derived this approach from a clinical teaching perspective that designs instruction to meet students' instructional needs and documents it through daily instructional monitoring and weekly progress monitoring. Minimal data exist about establishing the effectiveness of individualized intervention approaches (Fuchs et al., 2003), especially those related to adolescent populations (Scammacca et al., 2007). Even at the elementary school level, where research on effective reading interventions is prevalent, information about the effectiveness of individualized intervention approaches is scarce (Wanzek & Vaughn, 2007). Historically, special education provides specialized instruction to students with disabilities to accommodate individual differences (DeStefano & Snauwaert, 1989). Within an individualized approach, educators make instructional decisions and adaptations on the basis of student progress and individual variation (Deno, Fuchs, Marston, & Shin, 2001; Fuchs, Fuchs, & Compton, 2004). Although individualized interventions epitomize clinical approaches to intervention with students with reading disabilities, few empirically based intervention studies use such techniques (Fuchs et al., 2003), particularly for older students.
PREVIOUS STUDIES AND CURRENT RESEARCH QUESTIONS
This article reports findings from the second year of a multiyear middle-school study. During the Year 1 study (2006-2007), in which we randomly assigned students to treatment and comparison conditions, we identified middle school students with reading difficulties and provided a year-long, Tier 2 intervention (more than 100 hours; Vaughn, Cirino, et al., 2010; Vaughn, Wanzek, et al., 2010) to treatment students. All students received the benefits of content-area teachers (e.g., social studies, science) who participated in researcher-provided professional development designed to integrate vocabulary and comprehension practices throughout the school day. Treatment students showed small gains on measures of decoding, fluency, and comprehension over the course of the year (median d = +0.16).
After the Year 1 treatment, we identified students who failed to attain benchmarks for RTI. For the subsequent school year (2007-2008), students whom we had assigned in Year 1 to treatment and comparison groups remained in those conditions, and treatment students received either individualized or standardized conditions. We hypothesized that students who participated in the individualized intervention would outperform students who participated in the standardized intervention on reading-related outcomes. We also hypothesized that both treatment conditions would yield outcomes that were statistically significantly higher than those for students in the comparison condition. We expected to see benefits for decoding, fluency, and comprehension outcomes. However, we expected that the overall effect of the intervention would be less than that suggested by the Edmonds et al. (2009) and Scammacca et al. (2007) reviews, given that the students participating in this Tier 3 intervention were significantly more impaired, as indicated by their low response to a previous year-long intervention in reading, than students who participated in most studies reported in previous syntheses.
School Sites. The researchers conducted this study with institutional review board approval in two cities (one large district; one medium-sized district) in the southwestern United States, with approximately half the sample from each site. School populations ranged from 498 to 1,145 students. Seventh- and eighth-grade students from six middle schools participated in the study.
Criteria for Participation. In 2006-2007 (Year 1), we identified struggling readers by using state accountability test results (Texas Assessment of Knowledge and Skills [TAKS]; Texas Education Agency, 2004), selecting students with a scale score below 2,150, which approximates the 30th percentile on other norm-referenced reading comprehension assessments. We also included students exempted from the TAKS because of special education status attributable to very low reading achievement. Students randomly assigned to treatment in Year 1 received an intensive standardized protocol in groups of 10 to 15 students. Students who were adequate responders (i.e., those with TAKS scale scores at or above 2,150) exited the intervention, and we did not include them in this study. We randomly assigned inadequate responders (i.e., students who scored below 2,150) to one of two Tier 3 treatments for a second full year of intervention, either another year of standardized intervention protocol or an individualized intervention. Year 1 comparison students with TAKS scores below benchmark remained in the comparison condition for the current study.
Student Participants. The prospective design of the study required two randomizations: one in Year 1 and another in Year 2. To accommodate the Year 2 randomization, the researchers overrandomized students in Year 1 (i.e., assigned a greater number of qualifying students to the treatment condition) at three treatment students for each one comparison student. The sample for this study included a total of 182 students (86% free and reduced lunch): 42 comparison students (10% White, 31% special education, 29% limited English proficiency [LEP]); 71 individualized students (7% White, 39% special education, 21% LEP); and 69 students in the standardized condition (16% White, 35% special education, 20% LEP).
The researchers randomized students after posttesting in the spring of Year 1 (2007). We included all eligible cases (i.e., N = 182) in the analysis sample. Of the 182 sampled cases, 150 returned in fall 2007: 38 comparison students, 55 in the individualized condition, and 57 in the standardized protocol condition. In spring 2008, there were 36 comparison students, 51 students in the individualized condition, and 46 in the standardized protocol condition. The researchers compared pretest scores for students not continuing through spring 2008 across the three groups. There were no statistically significant differences. More than 90% of the coverage estimates (amount of data present in each cell of the measure by occasion matrix) were at or above .75 across all outcomes and measurement occasions.
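The attrition pattern described above can be tabulated directly. The following is a minimal sketch that uses only the counts reported in this section; the dictionary and function names are illustrative and are not part of the study.

```python
# Counts reported in the text: randomized sample, fall 2007, spring 2008.
counts = {
    "comparison":     {"baseline": 42, "fall_2007": 38, "spring_2008": 36},
    "individualized": {"baseline": 71, "fall_2007": 55, "spring_2008": 51},
    "standardized":   {"baseline": 69, "fall_2007": 57, "spring_2008": 46},
}


def retention(group, occasion):
    """Proportion of a group's randomized sample present at an occasion."""
    return counts[group][occasion] / counts[group]["baseline"]


def total(occasion):
    """Number of students across all three groups at an occasion."""
    return sum(group_counts[occasion] for group_counts in counts.values())


for group in counts:
    print(
        f"{group:>14}: fall 2007 {retention(group, 'fall_2007'):.1%}, "
        f"spring 2008 {retention(group, 'spring_2008'):.1%}"
    )
print(f"overall spring 2008: {total('spring_2008')} of {total('baseline')}")
```

With these counts, 133 of the 182 randomized students (about 73%) were present in spring 2008, and retention was lowest in the standardized condition (46 of 69, about 67%).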
Teacher Participants. The researchers hired six female intervention teachers with a median of 8.5 years of teaching experience. They received 60 hr of professional development before teaching. They also participated in biweekly staff development meetings with ongoing on-site feedback and coaching (once every 1-2 weeks).
DESCRIPTION OF INTERVENTIONS
Students in both treatment conditions received interventions during their elective periods, in classes of four to five students for 50 min daily for approximately 160 lessons. To ensure that effects from the treatment could not be attributed to teachers, the researchers trained all teachers systematically on both treatments and then randomized each teacher's classes to standardized or individualized conditions.
Standardized Intervention Protocol. The standardized treatment protocol reflected three phases of intervention. Phase I (about 20 to 25 lessons over 6 to 7 weeks) began with an emphasis on word study and fluency, with additional instruction on vocabulary and comprehension.
* The intervention supported fluency through daily repeated reading practice in a partner-reading format that paired skilled readers with less skilled readers.
* The teachers tracked student progress through regularly administered assessments of oral reading fluency.
* The researchers addressed word study by using REWARDS (Archer, Gleason, & Vachon, 2003), a program designed to teach advanced strategies for decoding multisyllabic words. Students received daily instruction and practice with individual letter sounds, letter combinations, and affixes and learned a segmentation strategy for decoding and spelling multisyllabic words.
* The protocol addressed vocabulary daily by teaching the meanings of words from text being read. Vocabulary instruction included providing student-friendly definitions along with examples and nonexamples of the proper use of new words.
* The teachers taught text comprehension by asking students to answer questions of varying levels of difficulty (literal and inferential) while reading a passage and after they had finished reading it to check for understanding and model active thinking during reading. Students learned to use text as a resource for answering questions and justifying their responses.
Phase II (about seventeen to eighteen weeks) focused on vocabulary and comprehension, with additional instruction and practice on the word study and fluency skills and strategies from Phase I. After reading new vocabulary words, students engaged in practice activities, including identifying the appropriate word to match various examples or descriptions. In addition, the teachers introduced students to word relatives and parts of speech (e.g., preserve, preservation, preservable). Finally, students reviewed applying word study to spelling words. The researchers selected vocabulary words for instruction from the text used in the fluency and comprehension component.
Three days a week, teachers addressed fluency and comprehension using REWARDS Plus Science text and lesson materials (Archer et al., 2003). Teachers taught vocabulary related to the text reading and dictated spelling words, and then students previewed the passage. Teachers next guided the students in reading the passage, asking questions to check for understanding and to model active thinking during reading. While students read, they completed a graphic organizer that summarized key information. Students engaged in repeated reading activities to increase fluency and answered questions about the passage content, providing the reasons for their choices. Students also engaged in writing content summaries. Two days a week, teachers used novels with researcher-developed lessons that reflected the use of strategies learned in REWARDS Plus.
Phase III occurred over approximately eight to ten weeks. In Phase III, students continued the instructional emphasis on vocabulary and comprehension, with application of skills and strategies in expository texts. Teachers taught comprehension and critical thinking at the sentence, paragraph, and multiparagraph levels.
Individualized Intervention Protocol. Students in the individualized intervention received instruction tailored to meet their individual needs, with students' test scores dictating the reading components that made up each lesson and the amount of time allocated to the different reading components; advancement and pacing of lessons were based on individual student mastery rather than group mastery. Educators used assessment data to develop students' profiles in phonics, word reading, fluency, vocabulary, and comprehension. For example, we used diagnostic data from the Woodcock-Johnson III subtests (WJ-III; Woodcock, McGrew, & Mather, 2001) to identify students with a standard score of 95 or above on the Word Attack subtest and focused their instruction on upper-level multisyllabic word reading, as well as vocabulary and comprehension strategies. Students who scored below a 95 standard score received a more intensive focus on word-study instruction, as well as vocabulary and comprehension strategy instruction. Teachers documented these relative emphases for each student within their group and then used weekly progress monitoring to adjust the emphasis. Overall, teachers followed a similar scope and sequence of research-based comprehension strategy instruction (e.g., strategies for finding main ideas and summarizing text) for all students in the individualized condition but had access to a variety of instructional materials and could modify pacing and materials in response to students' needs. Teachers used a variety of narrative and expository texts to teach, and they scaffolded use of the strategies before, during, and after reading. Word-study instruction was also flexible, but teachers primarily used an explicit, intensive multisensory word-study program (Wilson Reading System, 1996) that targeted both reading and spelling. Teachers progressed through the program in a flexible manner, varying pacing and lesson implementation according to students' needs.
In addition, a motivation component built into the daily individualized lessons included weekly expectations for purposeful and motivational text selection, student and teacher goal setting, evaluation conferences, and positive telephone calls home.
Individualized instruction included specified guidelines for use of instructional time. Students with higher-level decoding skills received 35 to 45 min of instruction in vocabulary/morphology, 170 to 180 min in comprehension/text reading, and 15 to 25 min of the motivational component during a 5-day week. Students with below-average word reading received 100 to 110 min of word study/text reading instruction, 35 to 45 min of vocabulary/morphology instruction, 70 to 80 min of comprehension/text reading instruction, and 15 to 25 min of the motivational component. Teachers made decisions to modify instruction by relying on biweekly curriculum-based measures (CBMs). Teachers developed CBMs on the basis of instructed objectives to determine skill mastery and guide instruction, whereas they based more formal decisions regarding student progress on standardized monthly progress-monitoring checks.
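The weekly time guidelines above can be written out as a small lookup table and checked against the instructional time available. This is a minimal sketch: the minute ranges are those reported in the text, the 250 min weekly total is derived from the 50-min daily lessons over a 5-day week, and the profile keys and function name are illustrative labels rather than terms from the study.

```python
LESSON_MINUTES = 50  # daily lesson length reported for both treatments
DAYS_PER_WEEK = 5

# (minimum, maximum) minutes per week for each component, as reported.
ALLOCATIONS = {
    "higher_level_decoding": {
        "vocabulary_morphology": (35, 45),
        "comprehension_text_reading": (170, 180),
        "motivation": (15, 25),
    },
    "below_average_word_reading": {
        "word_study_text_reading": (100, 110),
        "vocabulary_morphology": (35, 45),
        "comprehension_text_reading": (70, 80),
        "motivation": (15, 25),
    },
}


def weekly_range(profile):
    """Sum the component minimums and maximums for one student profile."""
    lows, highs = zip(*ALLOCATIONS[profile].values())
    return sum(lows), sum(highs)


available = LESSON_MINUTES * DAYS_PER_WEEK
for profile in ALLOCATIONS:
    low, high = weekly_range(profile)
    print(f"{profile}: {low} to {high} min of {available} min available")
```

Summing the ranges gives 220 to 250 min for the first profile and 220 to 260 min for the second, so the guidelines appear to fill most of the 250 min week. The second profile's upper bounds exceed 250 min in total, which suggests that not every component could run at its maximum in the same week.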
The researchers conceptualized fidelity as the difference between the intended (or normative) program model and the implemented model (Chen, 1990). To document teachers' adherence to program elements and quality of implementation, trained observers collected fidelity data for each condition four to five times a year for each teacher. Teachers following specified program elements/activities (i.e., fluency, vocabulary instruction, oral blending activities) within specified time limits represented the normative model for the standardized condition. The researchers collected fidelity information by using a 3-point Likert-type rating scale ranging from 1 (low) to 3 (high) to assess the extent to which the teacher completed each required instructional program element/activity and the overall quality of implementation, which included the active engagement of the students during each instructional program element/activity. The researchers conceptualized the individualized program model to respond to the "individualized" intent of the treatment. Teachers taught particular reading components (e.g., word study/text reading; vocabulary/morphology) for a set time on a weekly basis according to student needs. The researchers collected fidelity information by using the same 3-point Likert-type rating scale used in the standardized condition to assess compliance with each required reading component. The researchers also collected overall quality of implementation data for each specified reading component.
Individual teachers' mean implementation scores for the individualized intervention ranged from 2.13 to 3.0, with a group average of 2.61 on a 4-point scale (0 to 3). Mean quality scores for the individualized intervention ranged from 2.13 to 2.93, with a group average of 2.53. Teachers' mean total fidelity rankings, which included implementation and quality ratings, ranged from 2.13 to 2.98 for the individualized intervention, with an average of 2.58. Teachers' mean implementation scores for the standardized intervention ranged from 2.44 to 3.0, with an average of 2.72 on a 4-point scale (0 to 3). Mean quality scores for the standardized intervention ranged from 2.19 to 2.88, with an overall average of 2.55. The mean total fidelity ranking, including implementation and quality ratings, ranged from 2.31 to 2.90 for the standardized intervention, with a group average of 2.66 (to obtain copies of both the standardized and individualized fidelity protocols, see http://www.texasldcenter.org/research/project3.asp).
For further descriptions and more reliability and validity data on these measures, see http://www.texasldcenter.org. The researchers obtained all measures at pretest and posttest in Years 1 and 2 unless otherwise indicated.
Decoding and Spelling. We assessed word-reading accuracy for real words and pseudowords with the Letter-Word Identification and Word Attack subtests of the WJ-III Tests of Achievement (Woodcock et al., 2001). Educators administered the WJ-III Spelling subtest at posttest. Coefficient alphas based on a sample from the previous year of 327 struggling readers and 249 typical readers who contributed data throughout the year for Letter-Word Identification and Word Attack ranged from .93 to .97; coefficient alpha for Spelling at posttest was .84.
Fluency. The Sight Word Efficiency and Phonemic Decoding Efficiency subtests from the Test of Word Reading Efficiency (TOWRE; Torgesen et al., 1999) assessed word list fluency for real words and pseudowords. Internal consistency for different forms of this well-standardized test exceeds .90.
Comprehension. The Texas Assessment of Knowledge and Skills (TAKS; Texas Education Agency, 2004), a criterion-referenced reading comprehension test, is the Texas accountability test. The TAKS is not timed and uses different assessments for each grade; these criterion-referenced assessments align with grade-based standards from Texas Essential Knowledge and Skills (TEKS). The internal consistency (coefficient alpha) of the Grade 7 test is .89 (Texas Education Agency, 2004). We used it as an initial screening assessment and then as a benchmark assessment because it is reliable, represents an accountability high-stakes assessment implemented in all states, and has good construct validity as a measure of reading comprehension. In preliminary latent-variable analyses of the students in Grades 6 to 8, the TAKS measure loaded strongly on the WJ-III Passage Comprehension subtest and the Group Reading Assessment and Diagnostic Evaluation (GRADE; Williams, 2001). The WJ-III Passage Comprehension subtest is a cloze-based assessment in which students read a passage and fill in a missing word. Coefficient alphas in the entire sample of 327 struggling readers and 249 typicals were .94 at pretest and .85 at posttest.
Latent variable growth modeling (LGM) as a type of structural model has advantages over observed-score approaches. It provides more precise score estimates by explicitly estimating measurement error. LGM generates indexes of overall model fit, making possible the evaluation of a model's adequacy and the comparison of competing models. It handles missing data by using a direct maximum likelihood (ML) estimator to compute a likelihood function for each case using all available data; it is more efficient than listwise deletion or imputation of missing values and yields more precise estimates and greater power. Also, because LGM analyzes covariance structures representing different levels of aggregation, it is more appropriate than traditional approaches when data are nested, whether by design (i.e., stratified sampling strategy); circumstance (e.g., students in schools); or in the case of growth models, the nesting of time within students. Finally, LGM provides a flexible framework for analyzing the differential effects of covariates.
Modeling linear growth requires at least three data points; we used spring 2007, fall 2007, and spring 2008 data (the relatively brief timeframe and the results of preliminary analyses suggested a linear model). This timeframe encompassed the summer months, suggesting the possibility of learning loss from Time 1 to Time 2 and casting doubt on the usefulness of fitting the raw data to a linear model. Accordingly, we used standard scores, thereby making the linear model easier to fit and minimizing the confounding effects of summer learning loss. We modeled intercept as the end point of the trend (i.e., Time 3 or spring 2008) to accommodate the comparison of posttest differences. We modeled slope as a fixed effect for purposes of parsimony and based it on the statistically nonsignificant slope variance within groups. "Expected" growth in this model has slope of 0, given the use of standard scores. In addition, nonzero slopes in Table 1 are in a counterintuitive direction, because the slope estimate represents movement from right to left (i.e., from Time 3 to Time 1). In this way, the researchers can evaluate both final performance and progress since the end of the previous intervention (i.e., Year 1 intervention). Intercept represents performance at the end of Year 2, whereas slope indicates the rate of change over time. We did not specifically model school-level clustering effects because of difficulties with model identification (there were more parameters than clusters). Previous work using data from these students on these measures (Vaughn, Cirino, et al., 2010; Vaughn, Wanzek, et al., 2010) found minimal clustering effects at the teacher/interventionist level, and inclusion of such effects did not substantively alter results.
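To illustrate what it means to place the intercept at the end point of the trend, the sketch below fits an ordinary least-squares line to three hypothetical standard scores under two time codings. The scores, the equally spaced time codes, and the function name are assumptions made for illustration only; this is not the authors' latent growth model, and it does not reproduce their slope sign convention.

```python
def fit_line(times, scores):
    """Ordinary least-squares intercept and slope for one score series."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


# Hypothetical standard scores at spring 2007, fall 2007, and spring 2008.
scores = [88.0, 87.0, 90.0]

# Time coded 0, 1, 2: the intercept is the fitted score at Time 1.
start_intercept, start_slope = fit_line([0, 1, 2], scores)

# Time coded -2, -1, 0: the intercept is the fitted score at Time 3.
end_intercept, end_slope = fit_line([-2, -1, 0], scores)

print(f"intercept at Time 1 coding: {start_intercept:.2f}")
print(f"intercept at Time 3 coding: {end_intercept:.2f}")
print(f"slope under both codings:   {end_slope:.2f}")
```

Recoding time moves the intercept to the fitted Time 3 value and leaves the fitted line unchanged, which is why group differences in the intercept can be read as differences in end-of-year performance.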
The researchers used multigroup modeling with nested comparisons to evaluate the statistical significance of slope and intercept estimates (Bovaird, 2007; Mehta & Neale, 2005). The difference test involved constraining the groups as equal on parameters of interest and comparing the fit of the constrained and the fully specified models. If groups were comparable on spring 2008 performance, for instance, the fit for the constrained and full models did not significantly differ. Constraints resulting in less adequate fit suggested significant group differences. We conceptualized the main effects of treatment as differences between the full treatment group and the comparison group on mean intercept and slope. The comparison group is generally lower in spring 2007 (i.e., Time 1) than the two treatment groups, suggesting a possible confounding of spring 2008 differences with spring 2007 differences. We interpreted the absence of statistically significant Time 3 (i.e., intercept) differences as evidence of no effect. For the same reason, in cases of significant Time 3 differences, we also contrasted the groups' slope estimates as a follow-up to determine whether the rate of progress differed regardless of Time 1 status. We interpreted a statistically significant slope difference as evidence of a main treatment effect. We also considered differences between the two treatment conditions and between each treatment condition and the comparison as a test of the moderating effect of intervention type. We incorporated other moderators (special education status, LEP) into the models, as well.
Table 2 presents descriptive summaries. There were no statistically significant group differences in mean scores at pretest. Across the 18 comparisons (i.e., three comparisons each for six measures), the mean standardized difference (using the absolute value for each effect) was .07 and the average 95% confidence interval for these effects was about ±2.5 standard score points. Results for the multigroup unconditional model, as previously described, are in Table 1. The intercept in Table 1 is the model-derived score for performance in spring 2008. Variance is not indicated for slope because we estimated it as a fixed effect. We calculated effect sizes as the difference in estimated means (i.e., model-derived intercepts) divided by the weighted, pooled standard deviation for each estimate and adjusted for small sample size using Hedges's g formula. Progress on reading comprehension for the combined treatment group was statistically significantly greater than progress in the comparison group.
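The effect size calculation described above can be sketched as follows. This is a minimal illustration of the standard Hedges's g formula (pooled standard deviation with the usual small-sample correction factor); the exact weighting the authors applied to model-derived estimates is not specified here, so the function and all input values below are hypothetical.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with the small-sample correction."""
    # Pooled variance, weighting each group by its degrees of freedom.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    # Common approximation to the correction factor: 1 - 3 / (4 * df - 1),
    # where df = n_t + n_c - 2.
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * correction


# Hypothetical inputs: the means, standard deviations, and group sizes are
# placeholders for illustration, not values taken from Table 1 or Table 2.
g = hedges_g(mean_t=90.0, mean_c=86.0, sd_t=12.0, sd_c=13.0, n_t=71, n_c=42)
print(f"Hedges's g = {g:.2f}")
```

The correction factor is slightly below 1, so g is always a little smaller in magnitude than the uncorrected standardized difference, and the adjustment matters most when groups are small.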
[TABLE 1 OMITTED]
[TABLE 2 OMITTED]
Decoding and Spelling. The multigroup model (i.e., three groups) for WJ-III Letter-Word Identification fit the data well (χ² = 10.33, p = .324; comparative fit index [CFI] = .99, Tucker-Lewis index [TLI] = .99; root mean square error of approximation [RMSEA] = .05). The score estimates for spring 2008 in the two treatment groups were 89.77 and 91.09 for the individualized and standardized protocol groups, respectively, and 90.48 for the treatment groups combined. These values represent standard scores. The comparison group average was 86.34. The difference between the total treatment group (i.e., individualized and standardized protocol combined) and the comparison was not statistically significant (Δχ² = 3.29, df = 1, p > .05), where Δχ² represents the difference in χ² between the constrained and fully specified multigroup models. The performance of the individualized group did not differ significantly from the comparison (Δχ² = 1.68, df = 1, p > .05), nor did the average performance of participants in the standardized protocol (Δχ² = 3.71, df = 1, p > .05). Differences between the individualized and standardized protocol groups were not statistically significant (Δχ² = .40, df = 1, p > .05). The effect sizes on WJ-III Letter-Word Identification were .28 and .44 for the individualized protocol and the standardized protocol, respectively. Readers should consider these effect estimates in light of the previously discussed Time 1 differences in comparison and treatment groups. The effect size for the difference between the individualized and standardized groups was -.11, favoring the standardized group (i.e., the standardized protocol group outscored the individualized group by about .10 standard deviations).
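The nested-model difference tests reported above each have one degree of freedom, so the chi-square difference can be converted to a p value with a closed-form expression. The sketch below applies that conversion to the four Letter-Word Identification differences reported in this paragraph; the function name and dictionary labels are illustrative, and the code only checks the reported statistics against the .05 threshold rather than refitting any model.

```python
import math

CRITICAL_VALUE_DF1 = 3.841  # chi-square critical value for df = 1, alpha = .05


def p_value_df1(delta_chi2):
    """Upper-tail probability of a chi-square statistic with df = 1."""
    # With df = 1 the statistic is a squared standard normal deviate,
    # so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(delta_chi2 / 2.0))


# Chi-square differences reported for WJ-III Letter-Word Identification.
reported = {
    "combined treatment vs. comparison": 3.29,
    "individualized vs. comparison": 1.68,
    "standardized vs. comparison": 3.71,
    "individualized vs. standardized": 0.40,
}

for contrast, delta in reported.items():
    p = p_value_df1(delta)
    verdict = "significant" if delta > CRITICAL_VALUE_DF1 else "not significant"
    print(f"{contrast}: chi-square difference = {delta:.2f}, p = {p:.3f} ({verdict})")
```

All four differences fall below the critical value of 3.841, consistent with the nonsignificant results reported, although the standardized versus comparison contrast comes close to the threshold.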
The model for WJ-III Word Attack also fit the data well (χ^sup 2^ = 4.95, p = .838; CFI = .99, TLI = .99; RMSEA = .001). The estimated Time 3 standard score for the comparison group was 90.25, somewhat lower than the average score for the individualized group (γ^sub ij^ = 91.78) and for the standardized protocol (γ^sub ij^ = 94.64), although not statistically significantly so (Δχ^sup 2^ = .42, df = 1, p > .05 for individualized and Δχ^sup 2^ = 3.78, df = 1, p > .05 for standardized protocol). The effect sizes were .14 and .45 for the individualized and standardized protocol conditions, respectively. There were no significant differences between the individualized and standardized protocols (Δχ^sup 2^ = 1.93, df = 1, p > .05; Hedges's g unbiased = -.27).
Like the models for Word Attack and Letter Word Identification, the fit for WJ-III Spelling was excellent (χ^sup 2^ = 6.36, p = .703; CFI = .99, TLI = .99; RMSEA = .001). The estimated average scores for the two treatment groups (γ^sub ij^ = 88.11 and γ^sub ij^ = 88.86 for individualized and standardized protocol) were higher than the comparison group estimate (γ^sub ij^ = 84.07), although the differences were not statistically significant (Δχ^sup 2^ = 1.74, df = 1, p > .05 and Δχ^sup 2^ = 2.42, df = 1, p > .05). The effect size for the individualized condition was .31. For the standardized protocol, the standardized mean difference was .36. The standardized difference between the standardized protocol and the individualized group was -.06, again favoring the standardized group.
Fluency. Fit was marginal for TOWRE-Sight Word Efficiency (χ^sup 2^ = 21.62, p = .010; CFI = .97, TLI = .97; RMSEA = .154, confidence interval = .071 - .238), given the high RMSEA value. Estimated Time 3 scores were 90.74 for the individualized group, 90.57 for the standardized protocol, and 87.91 for the comparison condition. Group differences were not statistically significant (Δχ^sup 2^ = 1.32, df = 1, p > .05 and Δχ^sup 2^ = 1.28, df = 1, p > .05 for the individualized and standardized protocols, respectively); and effect sizes for individualized (.24) and standardized protocol (.27) were in the small to moderate range. The model for TOWRE - Phonemic Decoding represented a good fit with the data (χ^sup 2^ = 11.48, p = .245; CFI = .99, TLI = .99; RMSEA = .068). The treatment conditions scored comparably at posttest (γ^sub ij^ = 91.91 and γ^sub ij^ = 91.70; Hedges's g unbiased = .01). Both outscored the comparison (87.81), on average, although the difference was not statistically significant (Δχ^sup 2^ = 1.38, df = 1, p > .05 and Δχ^sup 2^ = 1.26, df = 1, p > .05). The effect sizes were .26 and .27 for the individualized and standardized protocols, respectively.
Comprehension. The model for the WJ-III Passage Comprehension measure fit the data well (χ^sup 2^ = 13.67, p = .135; CFI = .99, TLI = .99; RMSEA = .094, .90 confidence interval = .00 - .19), although the RMSEA value was higher than is typically desirable. The treatment groups collectively (i.e., a two-group comparison of treatment and comparison groups) outperformed the comparison in spring 2008 (Δχ^sup 2^ = 6.50, df = 1, p < .01). The intercept estimate for the individualized group was 84.84. For the standardized protocol condition, it was 85.10. The comparison group average intercept was 79.20. Given the statistically significant differences in intercept, we also contrasted slope estimates. Slope for the combined treatment group was -.51 (.00 in the standardized group and -.97 for the individualized). For the comparison, slope was .45. This difference was also statistically significant (Δχ^sup 2^^sub slope^ = 3.68, df = 1, p < .05). The contrast of the individualized slope and standardized slope, although comparatively large (.00 versus -.97), was not statistically significant (Δχ^sup 2^ = 5.91, df = 3, p > .05). Effect sizes were moderate: .52 for the individualized group and .56 for the standardized protocol condition. The standardized difference between the standardized protocol and individualized conditions was -.03.
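Because the intercepts above are standard scores, the treatment-comparison gaps can also be expressed in normative standard deviation units. The sketch below is illustrative only and assumes the usual standard score metric (mean of 100, standard deviation of 15); the function and variable names are hypothetical. These normative-unit gaps are not the same quantity as the effect sizes reported above, which are based on the pooled standard deviation of the sample.

```python
NORMATIVE_SD = 15.0  # assumed standard score metric: mean 100, SD 15


def gap_in_normative_sd(treatment_score, comparison_score, sd=NORMATIVE_SD):
    """Express a standard score gap in normative standard deviation units."""
    return (treatment_score - comparison_score) / sd


# Spring 2008 intercept estimates for WJ-III Passage Comprehension, as reported above
comparison_intercept = 79.20
gaps = {
    "individualized": gap_in_normative_sd(84.84, comparison_intercept),
    "standardized": gap_in_normative_sd(85.10, comparison_intercept),
}

for group, gap in gaps.items():
    print(f"{group}: {gap:.2f} normative SD units above comparison")
```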
We estimated variation in treatment effects across special education status and language minority status by regressing the variables of interest (e.g., special education and primary language status) on the group-specific intercepts derived from the earlier fit models (i.e., Table 1). The values in Table 3 are unstandardized regression coefficients for each treatment condition. They represent differences in standard score points for different levels of the variable in question. Table 3 also summarizes differences related to special education and primary language across the three treatment conditions (i.e., moderated effects).
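To make the interpretation of these coefficients concrete, the sketch below fits an ordinary least squares line to a 0/1 (dummy-coded) indicator and shows that the slope equals the difference between the two group means. This is a deliberate simplification: the study estimated these effects within the multigroup models, not with a standalone regression, and the scores and names below are invented for illustration.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical standard scores; 1 = receives special education, 0 = does not
special_ed = [0, 0, 0, 1, 1, 1]
scores = [95, 90, 100, 80, 85, 78]

slope, intercept = ols_slope_intercept(special_ed, scores)

# Direct comparison of group means for the same invented data
in_sped = [s for s, flag in zip(scores, special_ed) if flag == 1]
not_in_sped = [s for s, flag in zip(scores, special_ed) if flag == 0]
mean_diff = sum(in_sped) / len(in_sped) - sum(not_in_sped) / len(not_in_sped)

print(f"coefficient = {slope:.2f}, mean difference = {mean_diff:.2f}")
```

In this coding, the intercept is the mean for the group coded 0, and a negative coefficient indicates that the group coded 1 scored lower by that many standard score points.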
Special Education and Limited English Proficiency. On WJ-III Letter Word Identification, students in special education who were in the comparison condition performed more poorly than their counterparts who were not in special education by almost 14 scale score points (γ^sub α^ = -13.57, p < .001, where γ^sub α^ represents the coefficient for a dummy-coded variable representing special education status and the associated p-value reflects the extent to which it differs statistically from 0). In the individualized group, the difference between special education students and students who did not participate in special education was approximately 17 points (γ^sub α^ = -16.72, p < .001). Special education students fared less well than their peers who did not participate in special education in the individualized and comparison conditions. By contrast, special education-related differences within the standardized protocol were not statistically significant (γ^sub α^ = -7.99, p = .158), suggesting that the two groups were comparable in spring 2008. Across conditions (i.e., the moderating effect of special education status), special education students in the standardized protocol were at less of a disadvantage than special education participants in the individualized group (Δχ^sup 2^ = 5.72, p < .05). In other words, the special education-related disadvantage in the individualized condition was greater than the disadvantage for students in the standardized protocol. There were no statistically significant differences between the comparison group and either of the treatment conditions.
On WJ-III Word Attack, special education students lagged significantly behind their peers who were not in special education in all three conditions (γ^sub α^ = -11.90, p < .001; γ^sub α^ = -8.76, p < .001; γ^sub α^ = -10.99, p < .001 for individualized, standardized protocol, and comparison conditions, respectively), although these values did not significantly differ across the groups.
On WJ-III Passage Comprehension, special education students in the individualized condition were significantly less successful than participants who were not in special education in the same condition (γ^sub α^ = -7.97, p = .002). Differences in the standardized protocol and the comparison conditions did not differ statistically from 0. Across conditions, the difference between individualized participants and the comparison group was statistically significant (Δχ^sup 2^ = 3.95, p < .05). Patterns on WJ-III Spelling, TOWRE - Sight Word, and TOWRE - Phonemic Decoding were generally similar. Students in special education performed less well than students who were not in special education in the same condition, and no statistically significant differences occurred across conditions in the effect of special education on the spring 2008 performance.
[TABLE 3 OMITTED]
There were no statistically significant differences across treatment conditions in the effect of LEP status on intercept. On the TOWRE-Phonemic Decoding, LEP students outperformed non-LEP students in the standardized protocol (γ^sub α^ = 11.79, p = .004) and in the individualized (γ^sub α^ = 11.46, p = .016) treatment conditions. Within groups, performance on WJ-III Passage Comprehension was significantly lower for LEP students than for non-LEP students in the comparison (γ^sub α^ = -9.57, p = .014) and in the standardized protocol condition (γ^sub α^ = -7.28, p = .016).
We report findings from a study in which we randomly assigned students who did not meet exit criteria from a previously provided Tier 1 (enhanced instructional practices in vocabulary and comprehension for all content-area teachers) and Tier 2 intervention (supplemental daily reading instruction for at-risk students; Vaughn, Cirino, et al., 2010; Vaughn, Wanzek, et al., 2010) to one of two conceptually different interventions: standardized or individualized treatments. Students whom the researchers had randomly assigned to the comparison condition (Tier 1 only) and who were low responders in Year 1 remained in the comparison condition for Year 2.
STANDARDIZED VERSUS INDIVIDUALIZED TREATMENTS
The primary question addressed is the efficacy of two tertiary interventions for students with reading disabilities. Findings did not confirm our research hypothesis that students in the individualized treatment would outperform students in the standardized treatment. These findings aligned with previous research with beginning readers who had reading difficulties in which researchers compared a more standardized treatment with a more responsive or individualized treatment and found no statistically significant differences between the two treatments (Mathes et al., 2005). We do not believe that this single study provides convincing data that standardized approaches might be at least as effective as more individualized approaches for secondary students with intensive reading difficulties, but we do think that it provides compelling data to consider when designing interventions for older students with reading disabilities. Educators should also consider the findings in light of the significant level of training, supervision, and feedback provided to the teachers in this study, because it might not represent the level or quality of training typically provided to teachers.
When considering the findings for the treatments combined (standardized and individualized combined), we found statistically significant differences for reading comprehension but not for tasks involving word reading, word attack, or fluency. We consider the impact on reading comprehension meaningful in light of the challenge to successfully influence the reading comprehension of students with significant reading problems.
SPECIAL EDUCATION STATUS AND ENGLISH LANGUAGE PROFICIENCY
We had hypothesized that students with identified disabilities might perform significantly better in the individualized condition than in the standardized condition; however, the results did not confirm our hypothesis. Overall, students identified with disabilities (i.e., served by special education) were at more of a disadvantage (i.e., had poorer outcomes) in the individualized condition than in the standardized condition. This finding was valid for word attack and for reading comprehension. Additionally, the word attack and reading comprehension outcomes for students with disabilities were significantly lower than outcomes for their peers who did not have identified disabilities. We found this outcome for all three conditions (standardized, individualized, and comparison).
We also anticipated that students with LEP might perform better in the individualized condition because there was a greater focus on responding to students' individual learning needs and increasing students' academic motivation through building home-school connections, goal setting, and choice. Neither treatment condition was significantly more effective for students who were identified as having LEP. Students with the LEP designation demonstrated overall lower scores on reading comprehension than non-LEP students. However, on phonemic decoding, LEP students outperformed non-LEP students.
THE IMPACT OF READING INTERVENTION ON OLDER STUDENTS' COMPREHENSION
The findings from several similar studies (Corrin, Somers, Kemple, Nelson, & Sepanik, 2008; Kemple et al., 2008; Lang et al., 2009), as well as our previous studies (Vaughn, Cirino, et al., 2010; Vaughn, Wanzek, et al., 2010), indicate that a 1-year-long intervention will adequately meet the needs of relatively few struggling middle school readers and that most students, particularly those with significant reading problems, will require more intensive interventions that last for longer than 1 year. Considering these findings, we believe that the progress of the students in our treatment conditions in reading comprehension is noteworthy. These students made significant gains in reading comprehension, as evidenced by a moderately high impact (ES = .56); and we found an association between these gains and acceleration in their standard score performance of about one third of a standard deviation, suggesting that not only were students improving in their overall outcomes in reading comprehension but that they were also closing the gap between their current reading performance and grade-level expectations. We do not believe that students in middle grades with significant reading problems are likely to make rapid and readily remediated progress in reading. Many of these students with low comprehension also demonstrate low vocabulary and limited background knowledge, which is associated with low reading comprehension (Cromley & Azevedo, 2007); and compensating for these challenges is not going to happen in a 50-min-long daily intervention.
LIMITATIONS AND FUTURE DIRECTIONS
The context for this study is within relatively large urban settings in which most of the students are from low-income homes. Findings for students in other settings in which more home resources mitigate the daily challenges of schooling may be different. The findings might also have differed if the researchers had specifically selected students with defined reading disabilities for the study or if the study had used a benchmark other than performance on a state reading comprehension test. However, our sample included many students identified for special education, and the benchmark is valid.
The cost of this intervention is an issue to consider. In this intervention, well-trained, experienced teachers taught students in groups of five students. This group size is costly when compared with typical class instruction, in which there may be one teacher for every 20 to 25 students. We believe that in addition to improved student outcomes in reading (realized by students in the treatment conditions), determining whether treated students are also more likely to remain in school and graduate would be valuable. The impact from this study on reading comprehension for treatment students yields an effect size that is larger than that in many studies of struggling adolescent readers who received a year of intervention, even though we selected the group because of evidence of intractability to a previous year of intervention, representing a group with severe reading problems. The study points to the need to seriously consider the intensity needed to remediate reading difficulties in middle school. Even with 2 years of intervention, most students do not evidence grade-level reading for understanding and will require further intervention.
Archer, A. L., Gleason, M. M., & Vachon, V. L. (2003). Decoding and fluency: Foundation skills for struggling older readers. Learning Disabilities Quarterly, 26, 89-101.
Bergan, J. R. (1977). Behavioral consultation. Columbus, OH: Merrill.
Biancarosa, G., & Snow, C. E. (2004). Reading next-A vision for action and research in middle and high school literacy: A report to Carnegie Corporation of New York. Washington, DC: Alliance for Excellent Education.
Bovaird, J. A. (2007). Multilevel structural equation models for contextual factors. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), Modeling contextual effects in longitudinal studies (pp. 149-182). Mahwah, NJ: Lawrence Erlbaum Associates.
Chen, H.-T. (1990). Theory-driven evaluations. Newbury Park, CA: Sage.
Corrin, W., Somers, M.-A., Kemple, J., Nelson, E., & Sepanik, S. (2008). The enhanced reading opportunities study: Findings from the second year of implementation (NCEE 2009-4036). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
Cromley, J. G., & Azevedo, R. (2007). Testing and refining the direct and inferential mediation model of reading comprehension. Journal of Educational Psychology, 99, 311-325. doi: 10.1037/0022-0663.99.2.311
Deno, S. L., Fuchs, L. S., Marston, D., & Shin, J. (2001). Using curriculum-based measurement to establish growth standards for children with learning disabilities. School Psychology Review, 30, 507-524.
Denton, C. A., Fletcher, J. M., Anthony, J. L., & Francis, D. J. (2006). An evaluation of intensive intervention for students with persistent reading difficulties. Journal of Learning Disabilities, 35, 447-466.
DeStefano, L., & Snauwaert, D. (1989). A value-critical approach to transition policy analysis (No. 300-85-0160). Washington, DC: U.S. Department of Education.
Edmonds, M. S., Vaughn, S., Wexler, J., Reutebuch, C. K., Cable, A., Tackett, K. K., & Schnakenberg, J. W. (2009). A synthesis of reading interventions and effects on reading comprehension outcomes for older struggling readers. Review of Educational Research, 79, 262-300.
Fuchs, D., Fuchs, L. S., & Compton, D. L. (2004). Identifying reading disabilities by responsiveness-to-instruction: Specifying measures and criteria. Learning Disabilities Quarterly, 27, 216-227.
Fuchs, D., Mock, D., Morgan, P. L., & Young, C. L. (2003). Responsiveness-to-intervention: Definitions, evidence, and implications for the learning disabilities construct. Learning Disabilities Research & Practice, 18, 157-171.
Kamil, M. L., Borman, G. D., Dole, J., Kral, C. C., Salinger, T., & Torgesen, J. (2008). Improving adolescent literacy: Effective classroom and intervention practices: A practice guide (NCEE 2008-4027). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from http://ies.ed.gov/ncee/wwc
Kemple, J., Corrin, W., Nelson, E., Salinger, T., Herrmann, S., & Drummond, K. (2008). The enhanced reading opportunities study: Early impact and implementation findings (NCEE 2008-4015). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
Lang, L., Torgesen, J., Vogel, W., Carol, C., Lefsky, E., & Petscher, Y. (2009). Exploring the relative effectiveness of reading interventions for high school students. Journal of Research on Educational Effectiveness, 2, 149-175. doi: 10.1080/19345740802641535
Mathes, P. G., Denton, C. A., Fletcher, J. M., Anthony, J. L., Francis, D. J., & Schatschneider, C. (2005). The effects of theoretically different instruction and student characteristics on the skills of struggling readers. Reading Research Quarterly, 40, 148-182.
Mehta, P. D., & Neale, M. C. (2005). People are variables too: Multilevel structural equations modeling. Psychological Methods, 10, 259-284.
National Assessment of Educational Progress. (2007). The nation's report card: Trial urban district assessment reading 2007 (NCES 2008-455). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.
Scammacca, N., Roberts, G., Vaughn, S., Edmonds, M., Wexler, J., Reutebuch, C. K., & Torgesen, J. K. (2007). Intervention for adolescent struggling readers: A meta-analysis with implication for practice. Portsmouth, NH: RMC Research Corporation, Center on Instruction.
Texas Education Agency. (2004). Appendix 10 - Technical digest 2004-2005. Retrieved from http://www.tea.state.tx.us/index3.aspx?id=4397&menu_id=793
Torgesen, J. K., Wagner, R. K., Rashotte, C. A., Lindamood, P., Rose, E., Conway, T., & Garvan, C. (1999). Preventing reading failure in young children with phonological processing disabilities: Group and individual responses to instruction. Journal of Educational Psychology, 91, 579-593.
Vaughn, S., Cirino, P. T., Wanzek, J., Wexler, J., Fletcher, J. M., Denton, C. A., ... Francis, D. J. (2010). Response to intervention for middle school students with reading difficulties: Effects of a primary and secondary intervention. School Psychology Review, 39, 3-21.
Vaughn, S., Wanzek, J., Wexler, J., Barth, A., Cirino, P. T., Fletcher, J., ... Francis, D. J. (2010). The relative effects of group size on reading progress of older students with reading difficulties. Reading and Writing: An Interdisciplinary Journal, 23, 931-956. doi: 10.1007/s11145-009-9183-9
Wanzek, J., & Vaughn, S. (2007). Research-based implications from extensive early reading interventions. School Psychology Review, 36, 541-561.
Williams, J. P. (2001). Commentary: Four meta-analyses and some general observations. Elementary School Journal, 101, 349-354.
Wilson Language Training Corporation. (1996). Wilson Reading System. Millbury, MA: Author.
Woodcock, R. W., McGrew, K., & Mather, N. (2001). Woodcock-Johnson III tests of achievement. Itasca, IL: Riverside.
The University of Texas at Austin
AMY A. BARTH
PAUL T. CIRINO
MELISSA A. ROMAIN
University of Houston
CAROLYN A. DENTON
University of Texas Health Science Center at Houston
ABOUT THE AUTHORS
SHARON VAUGHN (Texas CEC), H. E. Hartfelder/Southland Corp. Regents Chair; Executive Director, The Meadows Center for Preventing Educational Risk, and Professor, Department of Special Education; JADE WEXLER (Texas CEC), Research Assistant Professor, Department of Special Education and Dropout Institute Director, The Meadows Center for Preventing Educational Risk; and GREG ROBERTS (Texas CEC), Director, Vaughn Gross Center for Reading and Language Arts, and Associate Director, The Meadows Center for Preventing Educational Risk, The University of Texas at Austin. AMY A. BARTH (Texas CEC), Research Assistant Professor, Texas Institute for Measurement, Evaluation, and Statistics; PAUL T. CIRINO, Research Associate Professor, Department of Psychology; MELISSA A. ROMAIN (Texas CEC), Research Professor, Department of Psychology; DAVID FRANCIS, Hugh Roy and Lillie Cranz Cullen Distinguished Professor, Texas Institute for Measurement, Evaluation, and Statistics; and JACK FLETCHER (Texas CEC), Distinguished University Professor, Department of Psychology, University of Houston, Texas. CAROLYN A. DENTON (Texas CEC), Associate Professor, Department of Pediatrics, Center for Academic and Reading Skills, University of Texas Health Science Center at Houston, Texas.
Correspondence concerning this article should be addressed to Sharon Vaughn, Department of Special Education, The University of Texas at Austin, 1 University Station, D4900, Austin, TX 78712-0365 (e-mail: SRVaughnUM@aol.com).
This research was supported by grant P50 HD052117 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development and from The Greater Texas Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Eunice Kennedy Shriver National Institute of Child Health and Human Development or the National Institutes of Health.
Manuscript received March 2010; accepted 2010.