TOEFL Score Interpretation

D.W. McKeon, Director, ESL & GTA Training Programs

Executive Summary
I. Introduction
II. Overall Scores and Subscores
III. Other Factors Affecting TOEFL Scores
IV. The Old GRE Verbal Section and the TOEFL
V. The New GRE Analytical Writing Section and the TOEFL


Executive Summary

The Test of English as a Foreign Language (TOEFL), developed in the early 1960s, has become the main instrument for measuring the English language proficiency of international students around the world. On the original paper-based test, a total score (range of 310-677) was determined from the scores of three subsections: Listening, Structure, and Reading. (An optional essay, the Test of Written English (TWE), was administered at every other international testing date, with a score range of 0 to 6.)

In the late 1990s, the TOEFL Board produced a computer-based test (CBT; range: 40 to 300). Other than the supposed ease of electronic test-taking, the CBT TOEFL made a few notable improvements on the paper-based test (PBT). To mention two: (1) it combines linear testing (from easy to difficult) with computer-adaptive testing (tailored to the proficiency level of the individual test-taker); and (2) it includes a compulsory essay, whose score is factored into the Structure section and is also reported separately as an Essay rating, equivalent to the former TWE. A Concordance Table was published in 1998, providing equivalences between the familiar PBT scores and the new CBT ones. Since the CBT is not administered at every site where the PBT was, the PBT continues to be used in various parts of the world.

Graduate School admissions committees have generally used a cutoff score of 550 (=213 CBT), but often without considering the full profile of subscores. In light of the near-magical aura attached to aggregate TOEFL scores, below are some recommendations on the use of the TOEFL (either form) for admissions committees:

  • Regardless of the total score, subscores must be taken into consideration
  • Listening subscores should not be below 50/16, regardless of the total, aggregate score
  • Essay Writing subscores should not be below 4.0 (same rating as TWE scores)
  • Since neither the PBT nor the CBT version of the TOEFL measures speaking ability, admissions committees considering applicants for assistantships with teaching duties should not rely on TOEFL scores alone in such decisions (although Listening scores should certainly be a factor)
  • The number of times a prospective student has taken the TOEFL (and the length of the intervals between attempts) should be considered as a factor, since test-wiseness can be an issue
  • The date of the last administration, and the applicant’s use of English since then, should be factored into the decision to accept an applicant or not
  • Finally, TOEFL scores were never meant to be predictors of academic success.

I. Introduction

Having been developed in the mid 1960s, the TOEFL has become the major standard for measuring proficiency in English as a second language throughout the world. Because of their widespread use in determining admission to colleges and universities, TOEFL scores have taken on a sort of magical aura. One rather unfortunate consequence is the emergence of innumerable TOEFL preparation courses (not to mention TOEFL preparation centers) throughout the world: the tail is indeed wagging the dog. Even more unfortunate is the temptation to view one’s TOEFL scores as an indicator of general intelligence or as a predictor of academic success, despite the TOEFL Board’s insistence to the contrary.

Although the TOEFL Board introduced a computer-based version in 1997 to replace paper-based testing, the latter is still prevalent in many parts of the world. Thus, in general, TOEFL scores in the 310-677 range (paper-based test) are still more familiar than those in the 40-300 range (computer-based test).

Like the paper-based test (PBT), the computer-based test (CBT) has a total score and three subscores: Listening, Structure/Writing, and Reading. However, there is an important difference in the CBT: the Structure/Writing subscore includes an essay rating, worth approximately one-half of the total subscore. Additionally, that essay rating (0-6 scale) is reported separately. Thus, whereas the PBT had an optional Test of Written English (TWE) component (0-6 scale), the CBT has a compulsory writing component whose rating is not only reported as a separate subscore but also factored into the Structure/Writing subscore. Since the PBT Structure/Written Expression section contained only short-answer questions, we may assume that the CBT Structure/Writing subscore is a more accurate measure of one’s ability to write than was the PBT counterpart. This point leads us to the next discussion.

II. Overall Scores and Subscores

The PBT total score (310-677 range) was calculated as an average of the three subscores multiplied by 10, with adjustments of raw scores made depending on the variation in difficulty among the many editions of the test. With respect to the scoring methods in the CBT, the calculations are based on “the difficulty of the items in the sections (which is determined through pretesting), the examinee’s performance on the items, and the number of items answered.”

In interpreting a TOEFL score, whether PBT or CBT, it is important to realize that two individuals may have essentially the same total score but different subscores reflecting differences in language skills. For example, one student might score 50 (16 CBT) in the Listening section, 60 (25 CBT) in the Structure/Writing section, and 55 (21 CBT) in Reading, arriving at a total of 550 (actually 210 in this example, but 213 in the Concordance Tables). A second student might score 60 (25) in Listening, 50 (18) in Structure/Writing, and 55 (21) in Reading, also resulting in a total score of 550 (213). The two total scores are identical, but the subscores (in two of the sections in this example) are very different. Based on such subscores, we would expect the first student to have more difficulty understanding spoken English than the second, while the second would presumably have more difficulty writing with grammatical correctness. In neither case could a definitive conclusion be drawn about the individual’s oral fluency, since neither version of the TOEFL measures spoken English. However, there is some correlation between listening comprehension (first section) and oral fluency.
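
The PBT arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `pbt_total` is hypothetical, and it applies the plain "average of the three subscores multiplied by 10" rule to already-scaled subscores, ignoring the raw-score adjustments made between test editions. It does not attempt the CBT calculation, which is not spelled out here.

```python
def pbt_total(listening, structure, reading):
    """Approximate PBT total: mean of the three scaled subscores, times 10.

    Hypothetical helper for illustration; it ignores edition-specific
    raw-score adjustments.
    """
    return round((listening + structure + reading) / 3 * 10)


# The two applicants from the example above (PBT subscores).
student_a = {"listening": 50, "structure": 60, "reading": 55}
student_b = {"listening": 60, "structure": 50, "reading": 55}

total_a = pbt_total(**student_a)
total_b = pbt_total(**student_b)

# Both totals come out the same, even though the Listening and
# Structure/Writing subscores are swapped between the two applicants.
print(total_a, total_b)
print(student_b["listening"] - student_a["listening"])
```

The point of the sketch is simply that the total is symmetric in the three subscores, so it cannot distinguish a weak listener from a weak writer.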

The bottom line for admission committees is that an applicant’s subscores need to be examined as well as the total score. Here are a few pointers:

  • Regardless of the total score, Listening subscores should not be below 50/16; in my experience, those with scores in the 50/16 range have great difficulty understanding a significant percentage of lectures, and even more difficulty comprehending—much less participating in—class discussions, not to mention everyday spoken English in general.

  • The Essay Writing score in the CBT, or the TWE in the PBT (where available), both on 0-6 scales with 0.5 increments, should not be below 4.0, and definitely not below 3.5. This subscore is a better indicator of an applicant’s ability to write than is the Structure/Writing subscore.

  • Above all, admissions committees should not consider one’s overall TOEFL score (PBT or CBT) to be an accurate indicator of an applicant’s ability to speak!
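
As a rough illustration, the pointers above could be expressed as a simple screening check. The sketch below is hypothetical: the function `review_flags`, its parameter names, and the wording of the flags are assumptions made for illustration; only the thresholds (Listening 50 PBT / 16 CBT, essay 4.0 and 3.5) come from the pointers themselves. It lists concerns for a human reviewer and is not meant to make admissions decisions.

```python
def review_flags(listening, essay=None, cbt=False):
    """Return a list of concerns about a TOEFL score report.

    listening: Listening subscore (PBT scale by default, CBT scale if cbt=True)
    essay:     TWE or CBT Essay rating on the 0-6 scale, or None if unavailable
    """
    flags = []

    # Recommended Listening minimum: 50 on the PBT, 16 on the CBT.
    listening_floor = 16 if cbt else 50
    if listening < listening_floor:
        flags.append("listening below recommended minimum")

    # Recommended essay minimum: 4.0, and definitely not below 3.5.
    if essay is None:
        flags.append("no essay rating available")
    elif essay < 3.5:
        flags.append("essay well below recommended minimum")
    elif essay < 4.0:
        flags.append("essay below recommended minimum")

    # Neither PBT nor CBT measures speaking, so this note is always added.
    flags.append("speaking ability not measured by TOEFL")
    return flags


print(review_flags(listening=48, essay=3.5))
print(review_flags(listening=25, essay=4.5, cbt=True))
```

Returning a list of flags rather than a pass/fail value keeps the emphasis on reading the whole subscore profile instead of replacing one cutoff with another.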

III. Other Factors Affecting TOEFL Scores

The number of times an applicant has taken the TOEFL may also be revealing. Occasionally, students take the test several times within a year or two, often without concurrent formalized language instruction. Although each test has novel items, a certain amount of test-wiseness is likely to affect the outcome. Again, consider the fact that many students take TOEFL preparation courses. The question remains as to how much real language learning takes place amid the exposure to the finer points of the exam and strategies for successful test-taking.

Finally, the date on which the test was taken may be significant, especially if it was more than a year ago. Because the TOEFL is a test of language proficiency, ETS will not release scores more than two years old. What really matters is the applicant’s use of English since the exam was taken. In the case of transfers from another English-speaking university, we may assume that the student has made some improvement in his/her language ability, whereas for those who do not reside in an English-speaking environment, there is more cause for concern.


IV. The Old GRE Verbal Section and the TOEFL

Guidelines for the Use of GRE Scores...

The following statement found in the GRE 1993-94 Guide needs to be underscored: “Since the GRE tests are developed for those who have been educated in the United States, cultural and educational backgrounds must be considered, along with linguistic factors.” Not surprisingly, in a 1979 study, ETS reported that non-native speakers found the GRE to contain the hardest verbal aptitude test. The 186 international graduate subjects, who had a mean TOEFL score of 523, achieved a mean GRE Verbal score of only 274 (200-800 scale; S.D.=70).

Nevertheless, in the past decade, knowing that there are schools that consider the GRE Verbal score to be as important as, if not more important than, the TOEFL, many applicants have gone to considerable effort to memorize scores of word lists taken from previous GRE Verbal tests. Ironically, a number of us found that many international students who scored above 500 on the GRE Verbal could not use English very well at all. So striking was the trend that I did a correlation study of the GRE Verbal, TOEFL, and TWE scores of over 60 new students from Eastern Asia in Fall 2000; I found that although the mean GRE Verbal score was 572, the mean TWE score was only 3.83 (out of 6).

V. The New GRE Analytical Writing Section and the TOEFL

Interpreting Scores from GRE...
Score Level Descriptions from GRE...

The new GRE Test includes an Analytical Writing section, scored holistically on a 6-point scale. It purports to measure “critical thinking and analytical writing skills rather than…grammar and mechanics”; more specifically, these writing skills are to be evaluated:

  • Clear and effective articulation of complex ideas

  • Examination of claims and evidence

  • Support for ideas with reasons and examples

  • Focused and coherent discussion

  • Mastery of standard written English

In a recent study conducted by the GRE Board of some 93,000 examinees (three-quarters of whom were native speakers of English), the Analytical Writing scores were identified with the following percentiles:

Score Level    % of Examinees Scoring Lower
6              94
5.5            82
5              65
4.5            45
4              27
3.5            15
3              8
2.5            5
2              3

With respect to its use with ESL examinees, the GRE Board recommends that the Analytical Writing score be used to supplement information from the TOEFL and the TWE (or the Essay Writing subsection of the CBT), since the latter writing test assesses one’s command of language skills, not the expression of “high levels of thinking and analytical writing.”

One might argue that it is difficult, in practice, for examiners to separate effective analysis from clear linguistic expression, but the GRE Board is correct in concluding that “if ESL examinees don’t understand the task being posed to them, their performance will be affected.” Consequently, it seems safe to make one conclusion: if an ESL examinee obtains a very high Analytical Writing score (5-6), he/she is likely to have a good command of English.


© Copyright 2000 - 2008 . All rights reserved.
Graduate School
Graduate Life Center at Donaldson Brown(0325)
Blacksburg, VA 24061

Updated:
Friday, November 9, 2007, 16:21 EST