The Lemma Readability Index

January 17, 2016

By Nirmaldasan


The Dale-Chall readability formula uses a list of 3000 familiar words, and it correlates very highly with text difficulty. However, readability formulae that do not use a word list, such as Robert Gunning’s Fog Index, are more popular because they are easy to apply. But there is no reason to discard the list, as it tests each word of a text. Let us look at a shorter list of the 100 commonest words, which together account for about 50% of the more than two billion words in the Oxford English Corpus. This list, in rank order, is found in an article titled ‘The OEC Facts About The Language’.

The list uses the idea of lemmas, ‘a lemma being the base form of a word’. An alphabetical arrangement of the words would help us use the list for measuring readability.

Commonest Lemma List

a  about  after  all  also  an  and  any  as  at  (10 lemmas)

back  be  because  but  by  (5 lemmas)

can  come  could  (3 lemmas)

day  do  (2 lemmas)

even  (1 lemma)

first  for  from  (3 lemmas)

get  give  go  good  (4 lemmas)

have  he  her  him  his  how  (6 lemmas)

I  if  in  into  it  its  (6 lemmas)

just  (1 lemma)

know  (1 lemma)

like  look  (2 lemmas)

make  me  most  my  (4 lemmas)

new  no  not  now  (4 lemmas)

of  on  one  only  or  other  our  out  over  (9 lemmas)

people (1 lemma)

say  see  she  so  some  (5 lemmas)

take  than  that  the  their  them  then  there  these  they  think  this  time  to  two  (15 lemmas)

up  us  use (3 lemmas)

want  way  we  well  what  when  which  who  will  with  work  would  (12 lemmas)

year  you  your (3 lemmas)

New Formula

The Lemma Readability Index (LRI) measures texts on a scale of 1 to 17 years of schooling. The LRI is the number of words per sentence not in the Commonest Lemma List. Take a sample of n sentences from a text. Count the Words Not in List (WNL). Then, LRI = WNL/n.

Counting Guidelines

  1. Do not count proper names (names of people, places, days, months, organisations … )
  2. Do not count numerals, symbols, abbreviations, acronyms
  3. Do not count lemmas that are in the list
  4. Do not count words that are grammatically associated with the lemmas in the list. Some examples:
       * Since be is in the list, do not count being, am, are, is, was, were
       * Since take is in the list, do not count taken, taker, takers, takes, taking, took
       * Since new is in the list, do not count newer, newest, newly, news, newsy
       * Since time is in the list, do not count timed, timely, timer, times, time’s, timing
  5. Do not count compound words, if each part is in the list. Some examples:
       * Since some and how are in the list, do not count somehow
       * Since any and way are in the list, do not count anyway
       * Since an and other are in the list, do not count another
       * Since good and will are in the list, do not count goodwill
  6. Count compound words as many times as they appear even if one part is not in the list. Some examples:
       * Since how is in the list but ever is not, count however
       * Since will is in the list but free is not, count freewill
  7. Count every single word (even repetitions) which is neither in the list nor grammatically associated with the lemmas in the list

These guidelines solve most of the counting problems. But one is likely to come across a number of deceptive words. For instance, a and do are both in the list, so ado seems to escape the count under the fifth guideline. However, ado has to be counted because it is not a compound word. Again, take the words better and more. Though neither is in the list, one is tempted to exclude them from the count for semantic reasons: better is related to good, and more is related to some and most. Resist the temptation and count every deceptive word. Remember that if we did not count more, we could not count moreover either, since both of its parts would then be exempt. Let’s not quibble.


Let us apply the formula to the following paragraph:

“The first batch of students of the Certificate in Online Journalism programme announced that they are online at the viva voce examination on Saturday (29 November 2014). They have created for themselves a website, a blog and a twitter account.”

Let us follow the counting guidelines.

  1. Proper names are not counted (Saturday, November)
  2. Numerals are not counted (29, 2014)
  3. Lemmas in the list are not counted (The, first, of, in, that, they, at, on, have, for, a, and)
  4. Words grammatically associated with the lemmas are not counted (are)
  5. Compound words are not counted if each part is in the list (none in this paragraph)
  6. Compound words are counted even if one part is not in the list (Online, online, themselves, website)
  7. Every single word which is neither in the list nor grammatically associated with the lemmas in the list is counted (batch, students, Certificate, Journalism, programme, announced, viva, voce, examination, created, blog, twitter, account)

In the order of appearance, here is the list of words not in the list: batch, students, Certificate, Online, Journalism, programme, announced, online, viva, voce, examination, created, themselves, website, blog, twitter, account. WNL = 17.

Since the number of sentences in the sample is 2, LRI = WNL/n = 17/2 = 8.5 years of schooling.
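The counting above can be partly automated. The Python sketch below is a minimal illustration, not the author's tool: it skips numerals and the 100 list lemmas and counts every other word, repetitions included. The guidelines that need judgement (proper names, abbreviations, grammatically associated forms such as are, and compounds whose parts are all in the list) are left to the reader, who passes them in through a hypothetical `exempt` set. With the judgements from the worked example, it reproduces WNL = 17 and LRI = 8.5.

```python
import re

# The 100 lemmas of the Commonest Lemma List, lower-cased.
COMMON_LEMMAS = frozenset("""
a about after all also an and any as at back be because but by can come could
day do even first for from get give go good have he her him his how i if in
into it its just know like look make me most my new no not now of on one only
or other our out over people say see she so some take than that the their them
then there these they think this time to two up us use want way we well what
when which who will with work would year you your
""".split())


def words_not_in_list(text, exempt=frozenset()):
    """Return the counted words (WNL) in order of appearance.

    Only the mechanical guidelines are applied: numerals and list lemmas are
    skipped, and every other word is counted, repeats included. Proper names,
    abbreviations, grammatically associated forms and all-list compounds need
    human judgement, so the caller supplies them (lower-cased) in `exempt`.
    """
    text = text.replace("\u2019", "'")  # treat curly apostrophes as straight ones
    counted = []
    for token in re.findall(r"[A-Za-z0-9']+", text):
        word = token.lower()
        if any(ch.isdigit() for ch in word):
            continue  # guideline 2: numerals
        if word in COMMON_LEMMAS or word in exempt:
            continue  # guideline 3, plus the caller's judgements
        counted.append(token)  # guidelines 6 and 7
    return counted


def lemma_readability_index(text, n_sentences, exempt=frozenset()):
    """LRI = WNL / n, in years of schooling."""
    return len(words_not_in_list(text, exempt)) / n_sentences


sample = (
    "The first batch of students of the Certificate in Online Journalism "
    "programme announced that they are online at the viva voce examination "
    "on Saturday (29 November 2014). They have created for themselves a "
    "website, a blog and a twitter account."
)
# Judgement calls from the worked example: two proper names and one form of "be".
judged_exempt = {"saturday", "november", "are"}
print(len(words_not_in_list(sample, judged_exempt)))      # 17
print(lemma_readability_index(sample, 2, judged_exempt))  # 8.5
```

Leaving compound detection to the reader is deliberate: a splitter that excluded every word made of two list lemmas would drop ado, which the guidelines say must be counted.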


Let us compare the LRI with the Fog Index (FI).

Average Words per Sentence (AWS) = 40/2 = 20 [The two sentences have 40 words when the numerals 29 and 2014 are counted as words]

Percentage of hard words (P) = (1/40)*100 = 2.5 [Not all polysyllables are hard. In this example, Certificate and Journalism are not counted as hard because they are part of the name of a programme. The only hard word is examination]

FI = 0.4*(AWS+P) = 0.4*(20+2.5) = 0.4*22.5 = 9 years of schooling.
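For comparison, the Fog Index arithmetic can be written as a small function. This is a sketch of the formula exactly as stated above; the function name is illustrative, and deciding which words are hard is still a human judgement, passed in as a count.

```python
def fog_index(words, sentences, hard_words):
    """FI = 0.4 * (AWS + P), as in the comparison above.

    AWS is average words per sentence and P is the percentage of hard words.
    Which words count as hard is a judgement left to the caller.
    """
    aws = words / sentences
    p = 100 * hard_words / words
    return 0.4 * (aws + p)


print(fog_index(words=40, sentences=2, hard_words=1))  # 9.0
```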

In this example, the LRI (8.5 years) compares very well with the FI (9 years). But a single sample proves little: the validity of the LRI needs to be tested on at least 100 samples. Please go ahead and put the LRI to the test. Thank you.





Speakability: The EMLU Formula

December 18, 2015

By Nirmaldasan


Speakability is a child’s skill in producing meaningful utterances. The words of an utterance may be divided into prefixes, roots and suffixes, the smallest units of meaning, called morphemes. The Mean Length of Utterance (MLU) is the total number of morphemes divided by the total number of utterances. Usually, a sample of 100 utterances is taken for calculating the MLU. Graham Williamson’s ‘Mean Length Of Utterance’ is a very fine and comprehensive article on the subject.

The Expected Mean Length of Utterance (EMLU) is a simple formula that can diminish parental anxiety about a child’s speakability. If M is the child’s age in months (the formula applies from 18 to 60 months), then EMLU = (M – 5) / 10. For example, if the child’s age is 25 months, then EMLU = (25 – 5) / 10 = 2 morphemes per utterance. Parents should understand that some children may be fast or slow in gaining speakability skills. The EMLU formula just gives a ballpark figure. Parents may be happy if their children produce more morphemes than the formula indicates, but should not worry if they produce fewer. Sooner or later, children are bound to pick up their native language.
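Both measures reduce to simple arithmetic once the morphemes in each utterance have been counted by hand. The sketch below is an illustration only: the function names and the sample counts are hypothetical, and treating ages outside 18 to 60 months as an error is an assumption based on the stated range.

```python
def mean_length_of_utterance(morphemes_per_utterance):
    """MLU = total morphemes / total utterances (about 100 utterances is usual)."""
    if not morphemes_per_utterance:
        raise ValueError("need at least one utterance")
    return sum(morphemes_per_utterance) / len(morphemes_per_utterance)


def expected_mlu(age_in_months):
    """EMLU = (M - 5) / 10 for a child aged M months, 18 <= M <= 60."""
    if not 18 <= age_in_months <= 60:
        raise ValueError("the EMLU is stated only for ages 18 to 60 months")
    return (age_in_months - 5) / 10


# Hypothetical hand counts for four utterances by a 25-month-old.
observed = mean_length_of_utterance([1, 2, 3, 2])
expected = expected_mlu(25)
print(observed, expected)  # 2.0 2.0
```

As the post stresses, a child whose observed MLU falls below the EMLU is not a cause for alarm; the comparison is only a ballpark check.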


The Simplicity Score Of Business Writing

October 27, 2014

By Nirmaldasan

The average sentence length is arguably the best indicator of text difficulty. A writer who uses this yardstick has to divide the number of words by the number of sentences. If we choose a sample of 10 sentences, then the calculation becomes simpler. “Sentences in Time and Reader’s Digest vary considerably in length, but the average sentence length, issue after issue, is only about 17 words,” writes Robert Gunning in How To Take The Fog Out Of Writing.

If our writing measures up to this standard, then 10 sentences will have about 170 words. Too much counting, you say? I have solved this problem with the help of a short sample of words, a count of complete sentences and a simple scoring system.

The Simplicity Score (SS) of a business text is the number of complete sentences in a sample of exactly 35 words. It is obvious that text simplicity increases with the number of complete sentences in the sample. The SS may vary on a five-point scale as follows: 0 (very hard), 1 (hard), 2 (standard), 3 (easy) and 4+ (very easy).

What’s the SS of the following paragraph from Gunning?

“But, while the Fog Index is handy for judging readability, it is not a formula for how to write. Don’t feel that you have written clearly just because your Fog Index is low. Anyone could put together a mumbo jumbo of short words in short sentences that would convey nothing at all to the reader.”

Let’s first draw an exact 35-word sample: “But, while the Fog Index is handy for judging readability, it is not a formula for how to write. Don’t feel that you have written clearly just because your Fog Index is low. Anyone could …”

The SS is 2 (standard).
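The Simplicity Score can be computed mechanically if we accept one simplification: a sentence counts as complete when a word ending in a full stop, question mark or exclamation mark falls inside the 35-word sample. The Python sketch below is illustrative (the names are hypothetical) and would miscount abbreviations such as Mr. It reproduces the score of 2 for the Gunning paragraph.

```python
SENTENCE_ENDINGS = (".", "!", "?")
CLOSING_MARKS = "\"'\u201d\u2019)"  # quotes and brackets that may follow a full stop
SS_LABELS = {0: "very hard", 1: "hard", 2: "standard", 3: "easy"}


def simplicity_score(text, sample_size=35):
    """Number of complete sentences within the first `sample_size` words.

    A sentence is treated as complete when a word ending in ., ! or ?
    (possibly followed by a closing quote or bracket) falls inside the sample.
    Abbreviations such as "Mr." would be miscounted.
    """
    words = text.split()
    if len(words) < sample_size:
        raise ValueError(f"need at least {sample_size} words, got {len(words)}")
    sample = words[:sample_size]
    return sum(1 for w in sample if w.rstrip(CLOSING_MARKS).endswith(SENTENCE_ENDINGS))


def simplicity_label(score):
    return SS_LABELS.get(score, "very easy")  # 4 or more complete sentences


gunning = (
    "But, while the Fog Index is handy for judging readability, it is not a "
    "formula for how to write. Don't feel that you have written clearly just "
    "because your Fog Index is low. Anyone could put together a mumbo jumbo of "
    "short words in short sentences that would convey nothing at all to the reader."
)
score = simplicity_score(gunning)
print(score, simplicity_label(score))  # 2 standard
```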

All writers should do a bit of counting of words and sentences, and revise their writing for the sake of their readers. Before we send an article to the Press or a business proposal to a prospective customer, we should ask, “What’s the SS?”

Vocalic Readability Index

September 17, 2014

By Nirmaldasan


Spotting vowels is easy; even a computer can do it. The vowels (a e i o u y) may not predict reading levels as reliably or as accurately as syllables can. But because they are closely associated with syllables, vowels can still measure text difficulty.

A syllable may have one or more vowels: by has one, tie has two, course has three and queue has four. In ‘The Vocalic Cloze Procedure’, I wrote: “The average syllable has three letters, of which two are usually consonants and one is a vowel.” I chanced upon a table of relative frequencies of alphabetic characters in Simon Singh’s The Code Book. The table, compiled by H. Beker and F. Piper, had first appeared in their book Cipher Systems: The Protection Of Communication.

Based on a sample of 100,362 letters, the authors calculated the frequency of each letter of the alphabet. I summed the frequencies of only the vowels (a e i o u y) and obtained the figure 40.2%. Since the average syllable has three letters, this means that there are about 1.2 vowels per syllable (3 × 0.402) and about 2 vowels per word.

That should suffice for us to derive the Vocalic Readability Index (VRI) = AVS / 4. The AVS is the average vowels per sentence, which is divided by 4 to match the text to the reading or grade level from 1 to 17+. The VRI can be easily derived from the W-Index or the S-Index or the L-Index; these indices of mine are discussed in another article titled ‘Seven Indices Of Readability’.

I tested the VRI on the 10 graded samples found in the appendix of Jeanne S. Chall and Edgar Dale’s Readability Revisited (the new Dale-Chall readability formula). The VRI predicts within two grade levels on all the tested samples, and within one grade level on 50% of the samples. The VRI was able to predict exactly the reading level of the passage beginning ‘The controversy over the laser-armed satellite …’, which has a reading level of 9-10. There were 189 vowels in 5 sentences. Therefore, the AVS is 189/5 = 37.8 and the VRI is 37.8/4 = 9.45.

To obtain a better estimate, let V25 be the number of vowels in 25 sentences. Then the VRI = V25 / 100. What is more, this formula can be easily computerised.
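Because the VRI needs only a vowel count, it is indeed easy to computerise. A minimal Python sketch, assuming that every a, e, i, o, u and y counts as a vowel (as listed above) and that the number of sentences is supplied separately; the function names are illustrative. Passing 25 as the sentence count gives the V25 / 100 shortcut.

```python
VRI_VOWELS = frozenset("aeiouy")  # y is always counted here, as in the list above


def count_vowels(text):
    return sum(1 for ch in text.lower() if ch in VRI_VOWELS)


def vri_from_counts(vowels, sentences):
    """VRI = AVS / 4, with AVS = vowels / sentences."""
    return vowels / sentences / 4


def vocalic_readability_index(text, n_sentences):
    return vri_from_counts(count_vowels(text), n_sentences)


print(vri_from_counts(189, 5))  # 9.45, the laser-satellite passage
```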

Basic Polyvowel Words

December 19, 2013

By Nirmaldasan


C.K. Ogden’s Basic English has 850 words, just enough to communicate with a global audience. Ogden’s list along with 50 international words could define or describe any word in a dictionary. Winston Churchill was impressed but Rudolf Flesch was not.

There have been arguments for and against controlled English. I would suggest a mix of control and freedom. But before I present the details, here is a new classification of words based on vocalic length. Monovowels are words that have just one vowel letter; divowels, two vowel letters; and polyvowels, three or more vowel letters.

To find the vocalic length of a word, count all occurrences of a e i o u. Count y as well, but only if it belongs to a syllable that has no a e i o u. Here are some examples: rhythm (monovowel; y is counted), stay (monovowel; only a is counted), youth (divowel; only o and u are counted), agony (polyvowel; a, o and y are counted).

My first assumption is that polyvowels contribute to reading difficulty, with the exception of those found in Ogden’s list. My second assumption is that all monovowels and divowels are easy to read, whether or not they are present in Ogden’s list. As I suggested before, let us have a mix of freedom and control: freedom to use any monovowel or divowel; and control, to use only the words in the following list of Basic Polyvowels, consisting of just 212 words from Ogden’s list:


about account addition adjustment advertisement agreement again against amount amusement animal apparatus approval argument association attention attitude attraction authority automatic awake (21 words)

balance beautiful because before behaviour belief between boiling building business (10 words)

camera carriage cause certain cheese chemical colour committee community company comparison competition complete computer condition connection conscious country culture curtain cushion (21 words)

damage daughter decision degree delicate dependent desire destruction detail development different digestion direction discovery discussion disease distance distribution division (19 words)

education elastic electric engine enough environment equal every example exchange existence expansion experience (13 words)

family feather feeble feeling female fertile fiction foolish frequent future (10 words)

general government guide (3 words)

harbour harmony healthy hearing helicopter heredity history hospital house humour (10 words)

idea important impulse increase industry instrument insurance interest invention (9 words)

journey (1 word)

knowledge (1 word)

language learning leather library liquid loose (6 words)

machine manager married material measure medical meeting memory military minute motion mountain (12 words)

nation natural necessary needle noise (5 words)

observation office operation opinion opposite orange organisation ornament (8 words)

parallel peace physical picture please pleasure poison political position possible potato private probable produce property punishment purpose (17 words)

quality question quiet quite (4 words)

reaction reading ready reason receipt regular relation religion representative request responsible (11 words)

science secretary selection separate serious sneeze society special square statement station structure substance suggestion surprise (15 words)

teaching technology tendency theory together tomorrow tongue trousers trouble (9 words)

umbrella (1 word)

value violent voice (3 words)

waiting weather (2 words)

yesterday (1 word)

NOTE: The Basic Polyvowel Words may be used as a spelling scale too by administering a vocalic cloze test based on this list of just 212 words.
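The y rule depends on syllables, which are hard to detect automatically. The sketch below approximates it under an assumption that is not stated in the post: y is counted only when neither neighbouring letter is a, e, i, o or u. This reproduces the four examples given earlier (rhythm, stay, youth, agony) but will misjudge some words. The hypothetical `is_allowed` helper applies the freedom-and-control rule, given a set of Basic Polyvowels.

```python
CORE_VOWELS = frozenset("aeiou")


def vocalic_length(word):
    """Approximate vocalic length of a single word.

    Counts every a, e, i, o, u, and counts y only when neither neighbouring
    letter is one of those five: a rough stand-in for "a syllable with no
    a e i o u", since no syllable splitting is attempted.
    """
    w = word.lower()
    count = 0
    for i, ch in enumerate(w):
        if ch in CORE_VOWELS:
            count += 1
        elif ch == "y":
            before = w[i - 1] if i > 0 else ""
            after = w[i + 1] if i + 1 < len(w) else ""
            if before not in CORE_VOWELS and after not in CORE_VOWELS:
                count += 1
    return count


def vocalic_class(word):
    n = vocalic_length(word)
    if n <= 1:
        return "monovowel"  # words with no vowel letter also land here (assumption)
    return "divowel" if n == 2 else "polyvowel"


def is_allowed(word, basic_polyvowels):
    """Freedom for monovowels and divowels; control for polyvowels."""
    return vocalic_class(word) != "polyvowel" or word.lower() in basic_polyvowels


for w in ["rhythm", "stay", "youth", "agony"]:
    print(w, vocalic_length(w), vocalic_class(w))
# rhythm 1 monovowel, stay 1 monovowel, youth 2 divowel, agony 3 polyvowel
```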

Seven Indices Of Readability

November 8, 2013

By Nirmaldasan


In ‘The Average Sentence Length’, I suggested that a sentence should not be measured only in words but also in syllables and letters. And I gave this rule of thumb: “Over the whole document, make the average sentence length 15-20 words, 25-33 syllables and 75-100 characters.”

Look at this sentence from M.J. Moroney’s Facts From Figures: “Most people are little removed from average intelligence, but geniuses and morons tend to occur in splendid isolation.” Words (W) = 18; Syllables (S) = 34; Letters (L) = 99. Excepting a minor syllabic transgression, Moroney’s sentence seems to flatter my rule of thumb.

These variables W, S and L are good predictors of the readability of a text. Independently and in combination, these factors constitute seven indices of readability — three are mono-variable, three di-variable and one tri-variable. Each index shows the years of schooling (1 to 17+) required to understand a particular text. 

W-Index = W/2 = 18/2 = 9

S-Index = S/3 = 34/3 = 11.3

L-Index = L/10 = 99/10 = 9.9

WS-Index = (W/4) + (S/6) = (18/4) + (34/6) = 10.2

WL-Index = (W/4) + (L/20) = (18/4) + (99/20) = 9.5

SL-Index = (S/6) + (L/20) = (34/6) + (99/20) = 10.6

WSL-Index = (W/6) + (S/9) + (L/30) = (18/6) + (34/9) + (99/30) = 10.1
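The seven indices are plain arithmetic on W, S and L. The sketch below counts words and letters mechanically and takes the syllable count as given, since automatic syllable counting is unreliable; the function names are illustrative. For Moroney's sentence it finds 18 words and 99 letters, and the index values agree with those above to within rounding.

```python
def count_words_and_letters(sentence):
    """W = whitespace-separated words; L = alphabetic characters only."""
    return len(sentence.split()), sum(ch.isalpha() for ch in sentence)


def seven_indices(w, s, l):
    """The seven indices above, each in years of schooling."""
    return {
        "W": w / 2,
        "S": s / 3,
        "L": l / 10,
        "WS": w / 4 + s / 6,
        "WL": w / 4 + l / 20,
        "SL": s / 6 + l / 20,
        "WSL": w / 6 + s / 9 + l / 30,
    }


moroney = ("Most people are little removed from average intelligence, but "
           "geniuses and morons tend to occur in splendid isolation.")
w, l = count_words_and_letters(moroney)  # 18 words, 99 letters
s = 34  # syllables counted by hand
for name, value in seven_indices(w, s, l).items():
    print(f"{name}-Index = {value:.2f}")
```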

Writers and teachers may choose any one of the seven indices and use it to measure the readability of any text. They may also try out all seven on different texts and heuristically choose the index that proves most reliable.

The Words We Choose

May 23, 2013

By Nirmaldasan


—This article appeared in the Jan-March 2013 issue of Vidura, a quarterly journal of the Press Institute of India. —

A writer who thinks and feels is a writer who knows words that engage the reader. John Ayto, in his introduction to the Bloomsbury Dictionary of Word Origins, tells us that the average English speaker knows about 50,000 words. If the print and the broadcast media function within this vocabulary range, readership and rating points are likely to increase. But unfamiliar words can turn off the audience.

Edward Thorndike found that there was a relationship between familiarity and frequency. He spent about a decade preparing The Teacher’s Word Book (1921) of 10,000 words. “The list,” he writes, “makes it much easier than it has been in the past to put standards for word knowledge, by grades, by ages, or by mental ages, into clear, definite comprehensible form. For example, we may say that at a certain mental age or grade the minimum standard should be knowledge of the meanings of 95 per cent of the first 2500 words, 80 per cent of the next 1000, 60 per cent of the next 1500, and 20 percent of the next 5000.” This list he expanded to 30,000 words in 1944, teaming up with Irving Lorge.

Alfred Lewerenz discovered an unusual pattern in the frequency of words. In ‘Proposals For British Readability Measures’, Harry McLaughlin writes about him: “I have always had a soft spot in my heart for the genius who predicted readability from the percentages of words beginning w, h or b (which he considered easy) and of words beginning i or e (considered hard).” George Johnson, in ‘An Objective Method Of Determining Reading Difficulty’, writes: “Alfred S. Lewerenz reported a study made by the Educational Research Division of the Los Angeles Public Schools. By comparing the number of different words beginning with each letter of the alphabet in a given selection with that of the standard provided by Webster’s Elementary School Dictionary, five critical letters were selected as indicators of reading difficulty. Words beginning with W, H, and B were found frequently in easy material while there were comparatively few beginning with I and E. With difficult reading material the situation was reversed.”

Edgar Dale compiled a list of 3000 words, familiar to 80 percent of 4th graders in the U.S. This list was revised in 1983 and is a factor in the new Dale-Chall readability formula of 1995. Notable among other lists are the Oxford 3000 and Voice of America’s Special English Word Book. The Oxford 3000 also includes some important and familiar words that are not frequent.

Zipf’s law

George Kingsley Zipf was also interested in word frequencies. Two of his books are The Psycho-biology Of Language (1935) and Human Behaviour And The Principle Of Least Effort: An Introduction To Human Ecology (1949). He observed that words of high frequency were usually short or became shorter with frequent use (e.g. bicycle to bike; omnibus to bus; cafeteria to cafe). Moreover, what is called Zipf’s law states that the frequency of a word in a corpus is inversely proportional to its rank. The frequency of the top-ranked word is twice that of the second-ranked word, thrice that of the third-ranked word and so on.   
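Zipf's law gives a simple prediction that can be compared with counts from any text. A minimal sketch; the function names are illustrative, and in small samples the observed counts will follow the law only roughly.

```python
import re
from collections import Counter


def zipf_expected(top_frequency, ranks):
    """Zipf's law: the word at rank r occurs about top_frequency / r times."""
    return [top_frequency / r for r in range(1, ranks + 1)]


def rank_frequencies(text, ranks):
    """Observed (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(ranks)


print(zipf_expected(600, 3))  # [600.0, 300.0, 200.0]
```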

Since there is a strong correlation between frequency and the length of words, it is easier for writers to identify words that are familiar to most of their readers. The length of a word may be measured in characters or syllables. The Raygor Estimate Graph of Alton L. Raygor (1977) considers words of six or more characters difficult; the SMOG Grading of Harry McLaughlin (1969) counts polysyllables as a marker of reading difficulty. My research, presented in Readability Monitor, suggests two measures: the Newspaper Reading Factor for print and the Broadcast Listening Factor for broadcast.

Broadcast Listening Factor

Let P3 be the number of polysyllables in three sentences of a broadcast copy. The Broadcast Listening Factor (BLF) = P3. The lower the score, the higher the listenability. A score of zero means that the story is very easy and a score of 10+ means that it is very hard.

We will get a better estimate if we take 10 samples of three sentences each from various parts of the copy and calculate listenability. If we take just one long sample of 30 sentences, then the BLF = P30/10.
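The two BLF recipes are the same calculation on a three-sentence basis, which the sketch below writes as 3 × P / n. That general form is an editorial generalisation, not a formula from the post; it reduces to P3 when n = 3 and to P30 / 10 when n = 30. Polysyllables are counted by hand and passed in; the function names are illustrative.

```python
def broadcast_listening_factor(polysyllables, sentences):
    """BLF on a three-sentence basis: 3 * P / n.

    With n = 3 this is P3; with n = 30 it is P30 / 10. The polysyllable
    count is supplied by the caller.
    """
    return 3 * polysyllables / sentences


def average_blf(p3_counts):
    """Mean BLF over several three-sentence samples (for example, 10 of them)."""
    return sum(p3_counts) / len(p3_counts)


print(broadcast_listening_factor(25, 30))  # 2.5, the same as 25 / 10
```

Averaging ten three-sentence samples gives the same figure as one 30-sentence sample with the same total count, since both equal the total divided by 10.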

Newspaper Reading Factor

I have argued elsewhere that the average syllable has three letters, so a polysyllable (a word of three or more syllables) may have nine letters or more. A long word, therefore, is one that has more than eight letters.

The number of long words other than the names of persons and places in five sentences may be called the Newspaper Reading Factor. Names of persons and places are exempted from the count as they are usually supposed to be very easy to understand. This formula measures newspaper texts on a five-point scale: 0 – 4 (very easy); 5 – 8 (easy); 9 – 12 (standard); 13 – 16 (hard); and 17+ (very hard).
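Unlike syllables, letters are easy to count, so the Newspaper Reading Factor lends itself to automation. A minimal sketch, assuming that the names of persons and places are listed by the reader (the code cannot recognise them) and that hyphenated words are split at the hyphen; the function names are illustrative, and the sample sentence is made up and shorter than the five sentences the formula calls for.

```python
import re


def long_words(text, names=frozenset()):
    """Words of more than eight letters, skipping names supplied (lower-cased) by the caller."""
    return [w for w in re.findall(r"[A-Za-z]+", text)
            if len(w) > 8 and w.lower() not in names]


def newspaper_reading_factor(five_sentences, names=frozenset()):
    return len(long_words(five_sentences, names))


def nrf_label(score):
    if score <= 4:
        return "very easy"
    if score <= 8:
        return "easy"
    if score <= 12:
        return "standard"
    if score <= 16:
        return "hard"
    return "very hard"


text = "The government announced a comprehensive investigation in Bangalore."
nrf = newspaper_reading_factor(text, names={"bangalore"})
print(nrf, nrf_label(nrf))  # 4 very easy
```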

The Conversational Style

February 8, 2013

By Nirmaldasan

—This article appeared in the July-September 2012 issue of Vidura, a quarterly journal of the Press Institute of India —

The most readable feature stories in magazines and newspapers are written in the conversational style. Plain English experts have laid much emphasis on the write-the-way-you-talk principle. In How To Take The Fog Out Of Writing, Robert Gunning says: “A conversational tone is one of the best avenues to good writing.” The choice of words, the syntax and the human voice constitute the conversational style.

This style is easy to achieve on radio and television. In The Art Of Plain Talk, Rudolf Flesch writes: “When we are talking, of course, we don’t use any punctuation marks. We use a system of shorter or longer pauses between words to join or separate our ideas, and we raise or lower our voice to make things sound emphatic or casual. In other words, we make ourselves understood not only by words but also by pauses and by stress or pitch.”      

But how can we reproduce the conversational tone in print? Flesch has an answer: “Punctuation gets pauses and stress (but not pitch) on paper.” His punctuation system takes care of normal, shorter and longer pauses between words and between sentences. It also indicates whether utterances have normal stress, emphasis or no stress. Let us take a brief look at pause and stress:


Shorter pause between words: use a hyphen (eg. If you say no-work no-pay, then I say no-pay no-work.)

Shorter pause between sentences: use semi-colon (eg. I came; I saw; I conquered.) or colon (eg. Three things I like most: chess, poetry and mathematics.)

Normal pause between words: use usual spacing (eg. I came and saw and conquered.)

Normal pause between sentences:  use the full stop (eg. I came. I saw. I conquered.)

Longer pause between words: use em-dash (eg. The greatest symbol — zero.)

Longer pause between sentences: use a new paragraph


No stress: use parentheses ( )

Normal stress: use the usual type of upright letters  

Emphasis: use italics or bold type

Here are some other considerations for achieving a conversational style:

* Use words that are short and easy to say (monosyllables or disyllables)

* Use words that are familiar to the average reader

* Use contractions such as I’ve, isn’t, haven’t and aren’t

* Use concrete words, which refer to people and things

* Use the active voice instead of the passive

* Use questions and exclamations wherever appropriate

Human Interest Measure (HIM)

Flesch developed a formula called Human Interest Score (Scale: 0 to 100) based on two variables: personal words and personal sentences. The greater the score, the greater the human interest. Flesch also used a five-point scale to describe the level of human interest in a feature story. He measured science magazines (dull), trade publications (mildly interesting), digests (interesting), New Yorker (highly interesting) and fiction (dramatic).

His formula is complicated, as it involves two coefficients, 3.635 and 0.314. Those who are fond of decimals may read Flesch’s original article of 1948 titled ‘A New Readability Yardstick’ in William H. Dubay’s book Unlocking Language.

Here I wish to present a useful simplification of his formula. Let us call it HIM (human interest measure). The formula involves the number of personal references (pr) in 100 words and the number of conversational sentences (cs) in 10 sentences.

Personal references are what Flesch calls ‘personal words’: “(a) All first-, second-, and third-person pronouns except the neuter pronouns it, its, itself, and they, them, their, theirs, themselves if referring to things rather than people, (b) All words that have masculine or feminine natural gender, e.g. Jones, Mary, father, sister, iceman, actress. Do not count common-gender words like teacher, doctor, employee, assistant, spouse. Count singular and plural forms, (c) The group of words people (with the plural verb) and folks.”

Conversational sentences are (a) utterances within quotes or in indirect speech; (b) imperative sentences; (c) interjections; and (d) sentence fragments (eg. With a dagger.) whose meaning depends on the previous sentence (eg. How did Brutus kill Caesar?)

The formula is simple: HIM = pr + cs

Scale: 0 to 3 (dull); 4 to 6 (mildly interesting); 7 to 13 (interesting); 14 to 19 (highly interesting); and 20+ (dramatic).

Rule of thumb

In every 10 sentences, let there be at least two conversational sentences; and in every 100 words, at least 7 personal references.
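The HIM arithmetic and its five-point scale can be sketched as follows. Rescaling raw counts from samples of other sizes, and placing fractional scores in the band below the next whole-number cut-off, are assumptions rather than part of the formula as stated; the function names are illustrative. The rule-of-thumb minimum (7 personal references and 2 conversational sentences) scores 9, which is interesting.

```python
def him(pr, cs):
    """HIM = pr + cs, with pr per 100 words and cs per 10 sentences."""
    return pr + cs


def him_from_counts(personal_refs, words, conversational, sentences):
    """Rescale raw counts from a sample of any size (an assumption, not part of HIM)."""
    pr = 100 * personal_refs / words
    cs = 10 * conversational / sentences
    return him(pr, cs)


def him_label(score):
    # Bands from the scale above; a fractional score such as 3.5 falls in the lower band.
    if score < 4:
        return "dull"
    if score < 7:
        return "mildly interesting"
    if score < 14:
        return "interesting"
    if score < 20:
        return "highly interesting"
    return "dramatic"


score = him(7, 2)  # the rule-of-thumb minimum
print(score, him_label(score))  # 9 interesting
```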

Now for a final quote from Jyoti Sanyal’s Indlish: “All the stories we heard as children were full of dialogue. We heard what the fox said to persuade the tiger to re-enter the cage the Brahmin had freed it from, and what the tiger said to justify his decision to gobble his benefactor. We all remember what the ants told the grasshopper, who’d only fiddled the whole summer, while they’d worked to save food for the winter. Dialogue and description made those tales live — and often, dialogue was the more important device.”

The Seven Rs Of Sub-editing

October 1, 2012

By Nirmaldasan


— This article appeared in the April-June 2012 issue of Vidura, a quarterly journal of the Press Institute of India:

A well-edited report has no factual, grammatical or stylistic errors. Accuracy, brevity and clarity help readers or listeners to quickly get the news and remember the key points. Unlike Rudyard Kipling’s elephant, people may not have insatiable curiosity unless they are told who-what-when-where-why (5Ws) and how (1H) in a language that obeys the principles of clear writing. An understanding of the news values of timeliness, prominence, proximity, conflict and human interest is essential for sub-editors to choose news stories and suitably edit them for different media.

The single act of processing news copy may be divided into what may be called the seven Rs of sub-editing: 1. Read 2. Remove 3. Rectify 4. Replace 5. Reorder 6. Rewrite and 7. Revise. But this division is arbitrary and is not without overlaps. Sub-editors usually skip some of the Rs when they sprint against the clock to meet deadlines. This perhaps explains why there are more mistakes in the first editions of newspapers. Later, the night editors and their team settle down to tackle the errors with the help of the seven Rs. Consequently, the later editions are more reader-friendly.


Read

Any raw report must be read twice. A casual first reading would tell us the sense of the story. This should be followed by a second critical reading, which would reveal the copy’s merits and faults. Some reporters turn in such fine self-edited reports that the other six Rs become unnecessary; and the sub-editors have nothing more to do than write some effective headlines for such stories.


Remove

Philip A. Yaffe, in his book titled The Gettysburg Approach To Writing & Speaking Like A Professional, says: “Nothing in a text is neutral. Whatever doesn’t add to the text, subtracts from it.” It is, therefore, the sub-editor’s job to remove from a report anything that does not enrich it. This could be a superfluous word or phrase, a libelous sentence or an optional paragraph. The reporter may not like it, but it is a job that must be done in the interest of the readers. Some examples may help clarify this point:

The panda eats, shoots and leaves

(The comma changes the meaning)

Major crisis

(Major is a superfluous word. But water crisis makes sense)

The ship will arrive in the month of May

(The phrase the month of is superfluous)

The secretary and the treasurer

(One must be careful here. If the phrase refers to two persons, then it is correct. But if one person holds both these posts, then the correct phrase is the secretary and treasurer)


Rectify

Spot and correct all spelling and capitalization errors. Insert appropriate honorifics such as Mr or Ms or Dr before names of persons. Wrong dates and figures must also be rectified. Yaffe says that long sentences should be checked for logical coherence and short ones for logical linkage. A long sentence with unrelated ideas must be split up into shorter sentences; and short sentences comprising related ideas must be fused into a longer sentence.


Replace

The fourth R replaces unfamiliar words with the familiar; the long with the short; and the ambiguous with the precise. Malapropisms (as in Richard Brinsley Sheridan’s The Rivals) must be spotted and replaced with the right words. Here are some fourth R examples:

Wend one’s way to the market

(Go to the market)

Dismount from a bus

(Get down from a bus)

Released from hospital

(Discharged from hospital)

To illiterate him

(To obliterate him)


Reorder

A news report must have the inverted pyramid structure. This means that events are arranged in the order of diminishing significance. So the paragraphs of news stories written in chronological order need to be reordered.

The order of words can alter the meaning of a sentence; in some cases, reordering them can improve the rhythm. Thomas Elliott Berry, in his book titled The Most Common Mistakes In English Usage, says: “Whenever possible, modifiers should be arranged according to length, with the shortest preceding the others.” He suggests that the sentence He was disheveled, dirty, and untidy should be reordered as He was dirty, untidy and disheveled. Berry also says that modifiers should always be arranged in a logical sequence. The same is true of verbs. Here are some fifth R examples:

to go boldly

(to boldly go is rhythmic though the infinitive is split)

A policeman misbehaved with a woman in a drunken state

(A policeman in a drunken state misbehaved with a woman)

She ate, dressed and bathed

(She bathed, dressed and ate)


Rewrite

Inexperienced sub-editors with remarkable linguistic skills have the irresistible urge to rewrite every report. This urge must be resisted, for it is the job of the reporters to rewrite their stories. However, sub-editors may rewrite for the following reasons: 1. Merging different stories on the same topic; 2. Summarizing a story for want of space; 3. Highlighting the news point; and 4. Simplifying the copy for average readers. But a rewriter should, as far as possible, use the original words of the reporter.


Revise

Revise the edited report to check whether the changes are justified. The revision may help either fix hitherto unspotted errors or fine-tune the report so that the readers get a newsy copy that is easy to read and easy to remember.


The Vocalic Cloze Procedure

August 21, 2012

By Nirmaldasan


The World Bank commissioned the National Council of Educational Research and Training (New Delhi) in February 1995 to assess the readability of primary-level textbooks in collaboration with CIIL (Mysore). Six states were covered: Assam, Haryana, Kerala, Karnataka, Maharashtra and Tamil Nadu. The results were published in IER: Special Number 1995.

The analysis was based on the assumption that ‘if 20 per cent of the children score above 75 per cent of the marks and less than 16 per cent of the sample score below 25 per cent of marks, the book could be considered fairly appropriate in terms of readability’. “This rationale is based on,” the report says, “(a) the assumption of normal distribution, and (b) the principle followed in textbook writing of pitching the level a little higher than the average.”

J. Charles Alderson discusses several techniques for testing reading in his book titled Assessing Reading. Frederick J. Kelly’s multiple-choice questions and Wilson Taylor’s cloze procedure are two of the popular techniques. These tests are easy to administer, and it has been found that there is a mathematical relationship between the scores they yield.

The average syllable has three letters, of which two are usually consonants and one is a vowel. Alderson points out that English consonants convey more information than vowels. “Thus it is easier to restore vowels in distorted words than the consonants: _n _ngl_sh th_ c_ns_n_nts _r_ m_r_ _nf_rm_t_v_ th_n v_w_ls.” Why shouldn’t this fact be used to test reading? We will call this the vocalic cloze procedure.

In the vocalic cloze procedure, all the vowels are deleted from a sample of 100 words, and the students in a class fill in the blanks till time is called. Fifteen minutes may be more than sufficient for the test. Count every word that is completely filled in and ignore the rest, so that each restored word earns one mark out of 100. The text from which the sample is drawn may be considered suitable for the class if: a) at least 20 per cent of the students score more than 75; and b) less than 16 per cent of them score below 25.

If the class takes a test on at least three samples from the text, then the scores would make the vocalic cloze procedure more reliable.
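The procedure can be sketched in a few functions: blank out the vowels, score the answers word by word, and apply the two suitability criteria. This is an illustration under stated assumptions: only a, e, i, o and u are blanked (Alderson's example contains no y, so y is left in place), a word scores only if it matches the original exactly apart from case, and the function names and class scores are hypothetical.

```python
import re

VOWEL_PATTERN = re.compile(r"[aeiouAEIOU]")


def make_vocalic_cloze(text):
    """Replace every a, e, i, o, u (either case) with a blank."""
    return VOWEL_PATTERN.sub("_", text)


def cloze_score(original, answer):
    """Words restored exactly (ignoring case); partly filled words score nothing."""
    return sum(1 for key, given in zip(original.split(), answer.split())
               if key.lower() == given.lower())


def text_is_suitable(scores):
    """Criteria a) and b) above, for scores out of 100."""
    n = len(scores)
    high = sum(1 for s in scores if s > 75) / n
    low = sum(1 for s in scores if s < 25) / n
    return high >= 0.20 and low < 0.16


print(make_vocalic_cloze("In English the consonants are more informative than vowels"))
# _n _ngl_sh th_ c_ns_n_nts _r_ m_r_ _nf_rm_t_v_ th_n v_w_ls
```

Scoring word by word assumes the student keeps the word boundaries of the test sheet; a merged or split word would shift the comparison.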

