The Words We Choose

May 23, 2013

By Nirmaldasan


This article appeared in the January-March 2013 issue of Vidura, a quarterly journal of the Press Institute of India.

A writer who thinks and feels is a writer who knows words that engage the reader. John Ayto, in his introduction to the Bloomsbury Dictionary of Word Origins, tells us that the average English speaker knows about 50,000 words. If the print and broadcast media function within this vocabulary range, readership and rating points are sure to increase. Unfamiliar words, however, have the potency to turn off the audience.

Edward Thorndike found that there was a relationship between familiarity and frequency. He spent about a decade preparing The Teacher’s Word Book (1921) of 10,000 words. “The list,” he writes, “makes it much easier than it has been in the past to put standards for word knowledge, by grades, by ages, or by mental ages, into clear, definite comprehensible form. For example, we may say that at a certain mental age or grade the minimum standard should be knowledge of the meanings of 95 per cent of the first 2500 words, 80 per cent of the next 1000, 60 per cent of the next 1500, and 20 percent of the next 5000.” This list he expanded to 30,000 words in 1944, teaming up with Irving Lorge.

Alfred Lewerenz discovered an unusual pattern in the frequency of words. In ‘Proposals For British Readability Measures’, Harry McLaughlin writes about him: “I have always had a soft spot in my heart for the genius who predicted readability from the percentages of words beginning w, h or b (which he considered easy) and of words beginning i or e (considered hard).” George Johnson, in ‘An Objective Method Of Determining Reading Difficulty’, writes: “Alfred S. Lewerenz reported a study made by the Educational Research Division of the Los Angeles Public Schools. By comparing the number of different words beginning with each letter of the alphabet in a given selection with that of the standard provided by Webster’s Elementary School Dictionary, five critical letters were selected as indicators of reading difficulty. Words beginning with W, H, and B were found frequently in easy material while there were comparatively few beginning with I and E. With difficult reading material the situation was reversed.”

Edgar Dale compiled a list of 3000 words, familiar to 80 percent of 4th graders in the U.S. This list was revised in 1983 and is a factor in the new Dale-Chall readability formula of 1995. Notable among other lists are the Oxford 3000 and Voice of America’s Special English Word Book. The Oxford 3000 also includes some important and familiar words that are not frequent.

Zipf’s law

George Kingsley Zipf was also interested in word frequencies. Two of his books are The Psycho-biology Of Language (1935) and Human Behavior And The Principle Of Least Effort: An Introduction To Human Ecology (1949). He observed that words of high frequency were usually short or became shorter with frequent use (e.g. bicycle to bike; omnibus to bus; cafeteria to cafe). Moreover, what is called Zipf’s law states that the frequency of a word in a corpus is roughly inversely proportional to its rank: the top-ranked word occurs about twice as often as the second-ranked word, three times as often as the third-ranked word, and so on.
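The rank-frequency relationship described above can be sketched in a few lines of Python. This is only an illustration: the function names `rank_words` and `zipf_expected_frequencies` are hypothetical, the tokenizer is a crude regular expression, and real corpora follow the law only approximately.

```python
import re
from collections import Counter


def rank_words(text):
    """Return (word, count) pairs ordered from most to least frequent.

    Assumption: a word is any run of letters or apostrophes, lower-cased.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def zipf_expected_frequencies(top_frequency, ranks):
    """Idealised Zipf prediction: frequency at rank r = top_frequency / r."""
    return [top_frequency / r for r in range(1, ranks + 1)]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    observed = rank_words(sample)
    predicted = zipf_expected_frequencies(observed[0][1], len(observed))
    # Print observed counts next to the idealised prediction for each rank.
    for rank, ((word, count), expected) in enumerate(zip(observed, predicted), 1):
        print(rank, word, count, round(expected, 2))
```

On a large corpus, the observed counts should fall roughly along the predicted curve; on a sample as tiny as the one here, the fit is only suggestive.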

Since there is a strong correlation between frequency and the length of words, it has become easier for writers to identify words that are familiar to most of their readers. The length of a word may be measured in characters or syllables. The Raygor Estimate Graph of Alton L. Raygor (1977) considers words of six or more characters difficult; the SMOG Grading of Harry McLaughlin (1969) counts polysyllables as a marker of reading difficulty. My research, presented in Readability Monitor, suggests two measures: a reading factor for print and a listening factor for broadcast.

Broadcast Listening Factor

Let P3 be the number of polysyllables (words of three or more syllables) in three sentences of broadcast copy. The Broadcast Listening Factor (BLF) = P3. The lower the score, the higher the listenability. A score of zero means that the story is very easy and a score of 10+ means that it is very hard.

We will get a better estimate if we take 10 samples of three sentences each from various parts of the copy and calculate listenability. If we take just one long sample of 30 sentences, then the BLF = P30/10.
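The two formulas above (BLF = P3 for three sentences, BLF = P30/10 for thirty) can be sketched as one calculation. The sketch rests on several assumptions that are not in the article: the syllable counter is a rough vowel-group heuristic that will miscount some words, sentences are split naively on full stops, question marks and exclamation marks, and the scaling to any number of sentences (polysyllables × 3 / sentences) is a generalisation of the two stated cases. All function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough estimate: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def count_polysyllables(text):
    """Count words estimated to have three or more syllables."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(1 for w in words if count_syllables(w) >= 3)


def split_sentences(text):
    """Naive sentence splitter (assumption: ., ! and ? end a sentence)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def broadcast_listening_factor(text):
    """Polysyllables per three sentences: P3 for 3 sentences, P30/10 for 30."""
    sentences = split_sentences(text)
    if len(sentences) < 3:
        raise ValueError("need at least three sentences")
    polysyllables = count_polysyllables(" ".join(sentences))
    return polysyllables * 3 / len(sentences)


if __name__ == "__main__":
    copy = ("The minister spoke today. Rain fell all night. "
            "The government will respond.")
    print(broadcast_listening_factor(copy))
```

For serious use, a pronunciation dictionary would give more reliable syllable counts than the heuristic used here.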

Newspaper Reading Factor

I have argued elsewhere that the average syllable has three letters, so a polysyllable (a word of three or more syllables) may be expected to have nine letters or more. A long word, then, is one that has more than eight letters.

The number of long words other than the names of persons and places in five sentences may be called the Newspaper Reading Factor. Names of persons and places are exempted from the count as they are usually supposed to be very easy to understand. This formula measures newspaper texts on a five-point scale: 0 – 4 (very easy); 5 – 8 (easy); 9 – 12 (standard); 13 – 16 (hard); and 17+ (very hard).
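The Newspaper Reading Factor and its five-point scale can be sketched as follows. The code cannot recognise names of persons and places on its own, so it assumes the caller supplies them in a `names` argument; that argument, the function names and the simple letters-only tokenizer are all assumptions made for illustration.

```python
import re


def newspaper_reading_factor(sentences, names=()):
    """Count long words (more than eight letters) in five sentences.

    `names` lists the person and place names to exempt from the count.
    """
    if len(sentences) != 5:
        raise ValueError("the factor is defined on exactly five sentences")
    exempt = {name.lower() for name in names}
    words = re.findall(r"[A-Za-z]+", " ".join(sentences))
    return sum(1 for w in words if len(w) > 8 and w.lower() not in exempt)


def difficulty_label(score):
    """Map a score to the article's five-point scale."""
    if score <= 4:
        return "very easy"
    if score <= 8:
        return "easy"
    if score <= 12:
        return "standard"
    if score <= 16:
        return "hard"
    return "very hard"


if __name__ == "__main__":
    sample = [
        "The government announced a new programme.",
        "Thiruvananthapuram received heavy rain.",
        "Officials were not available.",
        "The match ends today.",
        "Prices rose again.",
    ]
    score = newspaper_reading_factor(sample, names=["Thiruvananthapuram"])
    print(score, difficulty_label(score))
```

Passing the names explicitly keeps the sketch simple; an automated version would need some form of named-entity recognition, which is outside the scope of the formula itself.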