Basic Polyvowel Words

December 19, 2013

By Nirmaldasan


C.K. Ogden’s Basic English has 850 words, just enough to communicate with a global audience. Ogden’s list along with 50 international words could define or describe any word in a dictionary. Winston Churchill was impressed but Rudolf Flesch was not.

There have been arguments for and against controlled English. I would suggest a mix of control and freedom. But before I present the details, here is a new classification of words based on vocalic length. Monovowels are words that have just one vowel letter; divowels, two vowel letters; and polyvowels, three or more vowel letters.

To find the vocalic length of a word, count all occurrences of a e i o u. The letter y is also counted if a syllable of the word has no a e i o u. Here are some examples: rhythm (monovowel; y is counted), stay (monovowel; only a is counted), youth (divowel; only o and u are counted), agony (polyvowel; a, o and y are counted).
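
The counting rule can be sketched in Python. This is only a rough sketch: the function names are hypothetical, and the test for y is an assumption that stands in for real syllable division (a y is counted only when no a e i o u stands next to it). It agrees with the four examples above but may misjudge other words.

```python
VOWELS = set("aeiou")


def vocalic_length(word):
    """Count a, e, i, o, u, plus any y that has no vowel letter beside it."""
    letters = word.lower()
    count = 0
    for i, ch in enumerate(letters):
        if ch in VOWELS:
            count += 1
        elif ch == "y":
            before = letters[i - 1] if i > 0 else ""
            after = letters[i + 1] if i + 1 < len(letters) else ""
            # Assumption: a y with no neighbouring vowel letter is taken
            # to be the only vowel of its syllable, so it is counted.
            if before not in VOWELS and after not in VOWELS:
                count += 1
    return count


def classify(word):
    """Label a word as monovowel, divowel or polyvowel."""
    n = vocalic_length(word)
    if n <= 1:  # assumption: a word with no vowel letter is filed here too
        return "monovowel"
    if n == 2:
        return "divowel"
    return "polyvowel"


for example in ("rhythm", "stay", "youth", "agony"):
    print(example, vocalic_length(example), classify(example))
```

A writer could run classify over a draft and check every polyvowel against the list of Basic Polyvowels.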

My first assumption is that polyvowels contribute to reading difficulty, with the exception of those found in Ogden’s list. My second assumption is that all monovowels and divowels are easy to read whether they be present in Ogden’s list or not. As I suggested before, let us have a mix of freedom and control: freedom to use any monovowel or divowel; and control, to use only the words in the following list of Basic Polyvowels, consisting of just 212 words from Ogden’s list:


about account addition adjustment advertisement agreement again against amount amusement animal apparatus approval argument association attention attitude attraction authority automatic awake (21 words)

balance beautiful because before behaviour belief between boiling building business (10 words)

camera carriage cause certain cheese chemical colour committee community company comparison competition complete computer condition connection conscious country culture curtain cushion (21 words)

damage daughter decision degree delicate dependent desire destruction detail development different digestion direction discovery discussion disease distance distribution division (19 words)

education elastic electric engine enough environment equal every example exchange existence expansion experience (13 words)

family feather feeble feeling female fertile fiction foolish frequent future (10 words)

general government guide (3 words)

harbour harmony healthy hearing helicopter heredity history hospital house humour (10 words)

idea important impulse increase industry instrument insurance interest invention (9 words)

journey (1 word)

knowledge (1 word)

language learning leather library liquid loose (6 words)

machine manager married material measure medical meeting memory military minute motion mountain (12 words)

nation natural necessary needle noise (5 words)

observation office operation opinion opposite orange organisation ornament (8 words)

parallel peace physical picture please pleasure poison political position possible potato private probable produce property punishment purpose (17 words)

quality question quiet quite (4 words)

reaction reading ready reason receipt regular relation religion representative request responsible (11 words)

science secretary selection separate serious sneeze society special square statement station structure substance suggestion surprise (15 words)

teaching technology tendency theory together tomorrow tongue trousers trouble (9 words)

umbrella (1 word)

value violent voice (3 words)

waiting weather (2 words)

yesterday (1 word)

NOTE: The Basic Polyvowel Words may be used as a spelling scale too by administering a vocalic cloze test based on this list of just 212 words.

Seven Indices Of Readability

November 8, 2013

By Nirmaldasan


In ‘The Average Sentence Length’, I suggested that a sentence should not be measured only in words but also in syllables and letters. And I gave this rule of thumb: “Over the whole document, make the average sentence length 15-20 words, 25-33 syllables and 75-100 characters.”

Look at this sentence from M.J. Moroney’s Facts From Figures: “Most people are little removed from average intelligence, but geniuses and morons tend to occur in splendid isolation.” Words (W) = 18; Syllables (S) = 34; Letters (L) = 99. Excepting a minor syllabic transgression, Moroney’s sentence seems to flatter my rule of thumb.

These variables W, S and L are good predictors of the readability of a text. Independently and in combination, these factors constitute seven indices of readability — three are mono-variable, three di-variable and one tri-variable. Each index shows the years of schooling (1 to 17+) required to understand a particular text. 

W-Index = W/2 = 18/2 = 9

S-Index = S/3 = 34/3 = 11.3

L-Index = L/10 = 99/10 = 9.9

WS-Index = (W/4) + (S/6) = (18/4) + (34/6) = 10.2

WL-Index = (W/4) + (L/20) = (18/4) + (99/20) = 9.5

SL-Index = (S/6) + (L/20) = (34/6) + (99/20) = 10.6

WSL-Index = (W/6) + (S/9) + (L/30) = (18/6) + (34/9) + (99/30) = 10.1
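
The seven formulas above can be put into a few lines of Python. This is a minimal sketch; the function name is hypothetical, and the counts W, S and L are assumed to have been made by hand, as they were for Moroney's sentence.

```python
def seven_indices(words, syllables, letters):
    """Return the seven readability indices for one sentence or one average."""
    w, s, l = words, syllables, letters
    return {
        "W": w / 2,
        "S": s / 3,
        "L": l / 10,
        "WS": w / 4 + s / 6,
        "WL": w / 4 + l / 20,
        "SL": s / 6 + l / 20,
        "WSL": w / 6 + s / 9 + l / 30,
    }


# Moroney's sentence: 18 words, 34 syllables, 99 letters.
moroney = seven_indices(words=18, syllables=34, letters=99)
for name, value in moroney.items():
    print(f"{name}-Index = {value:.1f}")
```

For a whole document, the same function may be fed the average number of words, syllables and letters per sentence.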

Writers and teachers may choose any one of the seven indices and use it to measure the readability of any text. They may also try all seven on different texts and settle, by trial and error, on the index that proves most reliable.

The Words We Choose

May 23, 2013

By Nirmaldasan


—This article appeared in the Jan-March 2013 issue of Vidura, a quarterly journal of the Press Institute of India. —

A writer who thinks and feels is a writer who knows words that engage the reader. John Ayto, in his introduction to the Bloomsbury Dictionary of Word Origins, tells us that the average English speaker knows about 50,000 words. If the print and the broadcast media function within this vocabulary-range, readership and rating points are sure to increase. But unfamiliar words have the potency to turn off the audience.

Edward Thorndike found that there was a relationship between familiarity and frequency. He spent about a decade preparing The Teacher’s Word Book (1921) of 10,000 words. “The list,” he writes, “makes it much easier than it has been in the past to put standards for word knowledge, by grades, by ages, or by mental ages, into clear, definite comprehensible form. For example, we may say that at a certain mental age or grade the minimum standard should be knowledge of the meanings of 95 per cent of the first 2500 words, 80 per cent of the next 1000, 60 per cent of the next 1500, and 20 percent of the next 5000.” This list he expanded to 30,000 words in 1944, teaming up with Irving Lorge.

Alfred Lewerenz discovered an unusual pattern in the frequency of words. In ‘Proposals For British Readability Measures’, Harry McLaughlin writes about him: “I have always had a soft spot in my heart for the genius who predicted readability from the percentages of words beginning w, h or b (which he considered easy) and of words beginning i or e (considered hard).” George Johnson, in ‘An Objective Method Of Determining Reading Difficulty’, writes: “Alfred S. Lewerenz reported a study made by the Educational Research Division of the Los Angeles Public Schools. By comparing the number of different words beginning with each letter of the alphabet in a given selection with that of the standard provided by Webster’s Elementary School Dictionary, five critical letters were selected as indicators of reading difficulty. Words beginning with W, H, and B were found frequently in easy material while there were comparatively few beginning with I and E. With difficult reading material the situation was reversed.”

Edgar Dale compiled a list of 3000 words, familiar to 80 percent of 4th graders in the U.S. This list was revised in 1983 and is a factor in the new Dale-Chall readability formula of 1995. Notable among other lists are the Oxford 3000 and Voice of America’s Special English Word Book. The Oxford 3000 also includes some important and familiar words that are not frequent.

Zipf’s law

George Kingsley Zipf was also interested in word frequencies. Two of his books are The Psycho-biology Of Language (1935) and Human Behaviour And The Principle Of Least Effort: An Introduction To Human Ecology (1949). He observed that words of high frequency were usually short or became shorter with frequent use (e.g. bicycle to bike; omnibus to bus; cafeteria to cafe). Moreover, what is called Zipf’s law states that the frequency of a word in a corpus is inversely proportional to its rank. The frequency of the top-ranked word is twice that of the second-ranked word, thrice that of the third-ranked word and so on.   
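
The inverse relationship between rank and frequency is easy to state in Python. This is a minimal sketch of the idealised law as described above; the function name and the sample count of 60,000 are made up for illustration and come from no real corpus.

```python
def zipf_expected_frequency(top_frequency, rank):
    """Frequency that Zipf's law predicts for the word at a given rank."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    return top_frequency / rank


# Hypothetical corpus in which the top-ranked word occurs 60,000 times.
for rank in (1, 2, 3, 10):
    print(rank, zipf_expected_frequency(60000, rank))
```

Real corpora follow the law only roughly, so the figures are best read as a guide and not as a prediction.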

Since there is a strong correlation between frequency and the length of words, it has become easier for writers to identify words that are familiar to most of their readers. The length of a word may be measured in characters or syllables. The Raygor Estimate Graph of Alton L. Raygor (1977) considers words of six or more characters difficult; the SMOG Grading of Harry McLaughlin (1969) counts polysyllables as a marker of reading difficulty. My research, presented in Readability Monitor, suggests the following measures: reading factor for print and the listening factor for broadcast.

Broadcast Listening Factor

Let P3 be the number of polysyllables in three sentences of a broadcast copy. The Broadcast Listening Factor (BLF) = P3. The lower the score, the higher the listenability. A score of zero means that the story is very easy and a score of 10+ means that it is very hard.

We will get a better estimate if we take 10 samples of three sentences each from various parts of the copy and calculate listenability. If we take just one long sample of 30 sentences, then the BLF = P30/10.
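
Both ways of sampling can be sketched in Python. The function names are hypothetical, and the polysyllables are assumed to have been counted by ear beforehand, since the sketch does no syllable counting of its own.

```python
def broadcast_listening_factor(polysyllables, sentences=3):
    """Polysyllables per three sentences: P3 for 3 sentences, P30/10 for 30."""
    if sentences <= 0 or sentences % 3 != 0:
        raise ValueError("use a sample whose size is a multiple of three sentences")
    return polysyllables / (sentences / 3)


def average_blf(polysyllable_counts):
    """Average BLF over several three-sentence samples."""
    if not polysyllable_counts:
        raise ValueError("at least one sample is needed")
    return sum(polysyllable_counts) / len(polysyllable_counts)


print(broadcast_listening_factor(4))       # one sample of three sentences
print(broadcast_listening_factor(50, 30))  # one long sample of 30 sentences
print(average_blf([4, 6, 5, 3, 7, 5, 4, 6, 5, 5]))  # ten short samples
```

The sample figures are invented. Ten short samples and one long sample give the same number only when they cover the same sentences; the short samples have the advantage of being drawn from various parts of the copy.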

Newspaper Reading Factor

I have argued elsewhere that the average syllable has three letters; and so a polysyllable may have nine letters or more. So a long word is one that has more than eight letters.

The number of long words other than the names of persons and places in five sentences may be called the Newspaper Reading Factor. Names of persons and places are exempted from the count as they are usually supposed to be very easy to understand. This formula measures newspaper texts on a five-point scale: 0 – 4 (very easy); 5 – 8 (easy); 9 – 12 (standard); 13 – 16 (hard); and 17+ (very hard).
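
Here is a rough Python sketch of the count and the five-point scale. The function names and the five sample sentences are made up for illustration. Two assumptions are built in: the names to be exempted are supplied by the user, and a word is any unbroken run of letters, so a hyphenated word is counted as separate words.

```python
import re


def newspaper_reading_factor(sentences, exempt_names=()):
    """Count words of more than eight letters in five sentences, skipping names."""
    if len(sentences) != 5:
        raise ValueError("the factor is defined on a sample of five sentences")
    exempt = {name.lower() for name in exempt_names}
    count = 0
    for sentence in sentences:
        for word in re.findall(r"[A-Za-z]+", sentence):
            if len(word) > 8 and word.lower() not in exempt:
                count += 1
    return count


def nrf_level(score):
    """Place a score on the five-point scale given above."""
    if score <= 4:
        return "very easy"
    if score <= 8:
        return "easy"
    if score <= 12:
        return "standard"
    if score <= 16:
        return "hard"
    return "very hard"


sample = [
    "The government announced a new education policy.",
    "Officials in Thiruvananthapuram welcomed the decision.",
    "Teachers want smaller classes.",
    "Parents asked for better textbooks.",
    "The committee will meet again next week.",
]
score = newspaper_reading_factor(sample, exempt_names=["Thiruvananthapuram"])
print(score, nrf_level(score))
```

The long words counted here are government, announced, education, officials, textbooks and committee; the place name is left out.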

The Conversational Style

February 8, 2013

By Nirmaldasan

—This article appeared in the July-September 2012 issue of Vidura, a quarterly journal of the Press Institute of India —

The most readable feature stories in magazines and newspapers are written in the conversational style. Plain English experts have laid much emphasis on the write-the-way-you-talk principle. In How To Take The Fog Out Of Writing, Robert Gunning says: “A conversational tone is one of the best avenues to good writing.” The choice of words, the syntax and the human voice constitute the conversational style.

This style is easy to achieve on radio and television. In The Art Of Plain Talk, Rudolf Flesch writes: “When we are talking, of course, we don’t use any punctuation marks. We use a system of shorter or longer pauses between words to join or separate our ideas, and we raise or lower our voice to make things sound emphatic or casual. In other words, we make ourselves understood not only by words but also by pauses and by stress or pitch.”      

But how to reproduce the conversational tone in print? Flesch has an answer: “Punctuation gets pauses and stress (but not pitch) on paper.” His punctuation system takes care of normal pause, shorter pause and longer pause between words and between sentences. His system also indicates whether utterances have normal stress or emphasis or no stress. Let us take a brief look at pause and stress:


Shorter pause between words: use hyphen (e.g. If you say no-work no-pay, then I say no-pay no-work.)

Shorter pause between sentences: use semi-colon (e.g. I came; I saw; I conquered.) or colon (e.g. Three things I like most: chess, poetry and mathematics.)

Normal pause between words: use usual spacing (e.g. I came and saw and conquered.)

Normal pause between sentences: use the full stop (e.g. I came. I saw. I conquered.)

Longer pause between words: use em-dash (e.g. The greatest symbol — zero.)

Longer pause between sentences: use a new paragraph


No stress: use parentheses ( )

Normal stress: use the usual type of upright letters  

Emphasis: use italics or bold type

Here are some other considerations for achieving a conversational style:

* Use words that are short and easy to say (monosyllables or disyllables)

* Use words that are familiar to the average reader

* Use contractions such as I’ve, isn’t, haven’t and aren’t

* Use words that are concrete, which refer to people and things

* Use the active voice instead of the passive

* Use questions and exclamations wherever appropriate

Human Interest Measure (HIM)

Flesch developed a formula called Human Interest Score (Scale: 0 to 100) based on two variables: personal words and personal sentences. The greater the score, the greater the human interest. Flesch also used a five-point scale to describe the level of human interest in a feature story. He measured science magazines (dull), trade publications (mildly interesting), digests (interesting), New Yorker (highly interesting) and fiction (dramatic).

His formula is complicated as it involves two factors, 3.635 and 0.314. Those who are fond of decimals may read Flesch’s original article of 1948 titled ‘A New Readability Yardstick’ in William H. DuBay’s book Unlocking Language.

Here I wish to present a useful simplification of his formula. Let us call it HIM (human interest measure). The formula involves the number of personal references (pr) in 100 words and the number of conversational sentences (cs) in 10 sentences.

Personal references are what Flesch calls ‘personal words’: “(a) All first-, second-, and third-person pronouns except the neuter pronouns it, its, itself, and they, them, their, theirs, themselves if referring to things rather than people, (b) All words that have masculine or feminine natural gender, e.g. Jones, Mary, father, sister, iceman, actress. Do not count common-gender words like teacher, doctor, employee, assistant, spouse. Count singular and plural forms, (c) The group of words people (with the plural verb) and folks.”

Conversational sentences are (a) utterances within quotes or indirect speech (b) imperative sentences (c) interjections and (d) sentence fragments (e.g. With a dagger.) whose meaning depends on their previous sentences (e.g. How did Brutus kill Caesar?)

The formula is simple: HIM = pr + cs

Scale: 0 to 3 (dull); 4 to 6 (mildly interesting); 7 to 13 (interesting); 14 to 19 (highly interesting); and 20+ (dramatic).

Rule of thumb

In every 10 sentences, let there be at least two conversational sentences; and in every 100 words, at least 7 personal references.
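
The measure and its scale can be sketched in Python. The function names are hypothetical, and the two counts are assumed to have been made by hand: personal references in 100 words and conversational sentences in 10 sentences.

```python
def human_interest_measure(personal_references, conversational_sentences):
    """HIM = pr + cs, with pr counted per 100 words and cs per 10 sentences."""
    return personal_references + conversational_sentences


def him_level(score):
    """Place a HIM score on the five-point scale given above."""
    if score <= 3:
        return "dull"
    if score <= 6:
        return "mildly interesting"
    if score <= 13:
        return "interesting"
    if score <= 19:
        return "highly interesting"
    return "dramatic"


# A text that just meets the rule of thumb: 7 personal references, 2 conversational sentences.
score = human_interest_measure(7, 2)
print(score, him_level(score))
```

A text that just meets the rule of thumb scores 9 and so falls in the interesting band.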

Now for a final quote from Jyoti Sanyal’s Indlish: “All the stories we heard as children were full of dialogue. We heard what the fox said to persuade the tiger to re-enter the cage the Brahmin had freed it from, and what the tiger said to justify his decision to gobble his benefactor. We all remember what the ants told the grasshopper, who’d only fiddled the whole summer, while they’d worked to save food for the winter. Dialogue and description made those tales live — and often, dialogue was the more important device.”

The Seven Rs Of Sub-editing

October 1, 2012

By Nirmaldasan


—This article appeared in the April-June 2012 issue of Vidura, a quarterly journal of the Press Institute of India. —

A well-edited report has no factual, grammatical or stylistic errors. Accuracy, brevity and clarity help readers or listeners to quickly get the news and remember the key points. Unlike Rudyard Kipling’s elephant, people may not have insatiable curiosity unless they are told who-what-when-where-why (5Ws) and how (1H) in a language that obeys the principles of clear writing. An understanding of the news values of timeliness, prominence, proximity, conflict and human interest is essential for sub-editors to choose news stories and suitably edit them for different media.

The single act of processing news copy may be divided into what may be called the seven Rs of sub-editing: 1. Read 2. Remove 3. Rectify 4. Replace 5. Reorder 6. Rewrite and 7. Revise. But this division is arbitrary and is not without overlaps. Sub-editors usually skip some of the Rs when they sprint against the clock to meet deadlines. This perhaps explains why there are more mistakes in the first editions of newspapers. Later, the night editors and their team settle down to tackle the errors with the help of the seven Rs. Consequently, the later editions are more reader-friendly.


Any raw report must be read twice. A casual first reading would tell us the sense of the story. This should be followed by a second critical reading, which would reveal the copy’s merits and faults. Some reporters turn in such fine self-edited reports that the other six Rs become unnecessary; and the sub-editors have nothing more to do than write some effective headlines for such stories.


Philip A. Yaffe, in his book titled The Gettysburg Approach To Writing & Speaking Like A Professional, says: “Nothing in a text is neutral. Whatever doesn’t add to the text, subtracts from it.” It is, therefore, the sub-editor’s job to remove from a report anything that does not enrich it. This could be a superfluous word or phrase, a libelous sentence or an optional paragraph. The reporter may not like it, but it is a job that must be done in the interest of the readers. Some examples may help clarify this point:

The panda eats, shoots and leaves

(The comma changes the meaning)

Major crisis

(Major is a superfluous word. But water crisis makes sense)

The ship will arrive in the month of May

(The phrase the month of is superfluous)

The secretary and the treasurer

(One must be careful here. If the phrase refers to two persons, then it is correct. But if one person holds both these posts, then the correct phrase is the secretary and treasurer)


Spot and correct all spelling and capitalization errors. Insert appropriate honorifics such as Mr or Ms or Dr before names of persons. Wrong dates and figures must also be rectified. Yaffe says that long sentences should be checked for logical coherence and short ones for logical linkage. A long sentence with unrelated ideas must be split up into shorter sentences; and short sentences comprising related ideas must be fused into a longer sentence.


The fourth R replaces unfamiliar words with the familiar; the long with the short; and the ambiguous with the precise. Malapropisms (as in Richard Brinsley Sheridan’s The Rivals) must be spotted and replaced with the right words. Here are some fourth R examples:

Wend one’s way to the market

(Go to the market)

Dismount from a bus

(Get down from a bus)

Released from hospital

(Discharged from hospital)

To illiterate him

(To obliterate him)


A news report must have the inverted pyramid structure. This means that events are arranged in the order of diminishing significance. So there is a need to reorder the paragraphs of news stories written in the chronological order.

The order of words may alter the meaning of a sentence. In some cases it can improve the rhythm. Thomas Elliott Berry, in his book titled The Most Common Mistakes In English Usage, says: “Whenever possible, modifiers should be arranged according to length, with the shortest preceding the others.” He suggests that the sentence He was disheveled, dirty, and untidy should be reordered as He was dirty, untidy and disheveled. Berry also says that modifiers should always be arranged in a logical sequence. The same is true of verbs too. Here are some fifth R examples:

to go boldly

(to boldly go is rhythmic though the infinitive is split)

A policeman misbehaved with a woman in a drunken state

(A policeman in a drunken state misbehaved with a woman)

She ate, dressed and bathed

(She bathed, dressed and ate)


Inexperienced sub-editors with remarkable linguistic skills have the irresistible urge to rewrite every report. This urge must be resisted for it is the job of the reporters to rewrite their stories. However, sub-editors may rewrite for the following reasons: 1. Merging different stories on the same topic; 2. Summarizing a story for want of space; 3. Highlighting the news point; and 4. Simplifying the copy for average readers. But a rewriter should as far as possible use the original words of the reporter.


Revise the edited report to check whether the changes are justified. The revision may help either fix hitherto unspotted errors or fine-tune the report so that the readers get a newsy copy that is easy to read and easy to remember.


The Vocalic Cloze Procedure

August 21, 2012

By Nirmaldasan


The World Bank commissioned the National Council of Educational Research and Training (New Delhi) in February 1995 to assess the readability of primary level text-books in collaboration with CIIL (Mysore). Six states were covered: Assam, Haryana, Kerala, Karnataka, Maharashtra and Tamil Nadu. The results were published in IER: Special Number 1995.

The analysis was based on the assumption that ‘if 20 per cent of the children score above 75 per cent of the marks and less than 16 per cent of the sample score below 25 per cent of marks, the book could be considered fairly appropriate in terms of readability’. “This rationale is based on,” the report says, “(a) the assumption of normal distribution, and (b) the principle followed in textbook writing of pitching the level a little higher than the average.”

J. Charles Alderson discusses the several techniques for testing reading in his book titled Assessing Reading. Frederick J. Kelly’s multiple-choice questions and Wilson Taylor’s cloze procedure are two of the popular techniques. These tests are easy to administer and it has been found that there is a mathematical relationship between the scores obtained by each of them.

The average syllable has three letters, of which two are usually consonants and one is a vowel. Alderson points to the fact that the English consonants convey more information than the vowels. “Thus it is easier to restore vowels in distorted words than the consonants: _n _ngl_sh th_ c_ns_n_nts _r_ m_r_ _nf_rm_t_v_ th_n v_w_ls.” Why shouldn’t this fact be used to test reading? We will call this the vocalic cloze procedure.

To administer the vocalic cloze procedure, delete all the vowels in a sample of 100 words and give it to a class of students, whose task is to fill in the blanks till time is called. Fifteen minutes may be more than sufficient for the test. Count every word that is completely filled and ignore the rest, so that each student scores between 0 and 100 marks. The text from which the sample is drawn may be considered suitable for the class if: a) at least 20 per cent of the students score more than 75 marks; and b) less than 16 per cent of them score below 25 marks.
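
A teacher could prepare the test and check the result with a short Python sketch. The function names and the class scores are made up for illustration. One assumption is marked in the code: only a e i o u are deleted, and y is left in place.

```python
import re


def vocalic_cloze(text):
    """Replace every a, e, i, o, u with a blank."""
    # Assumption: y is never deleted, even where it acts as a vowel.
    return re.sub(r"[aeiouAEIOU]", "_", text)


def is_suitable(scores, total=100):
    """Apply the two conditions to a list of class scores."""
    if not scores:
        raise ValueError("at least one score is needed")
    n = len(scores)
    high = sum(1 for s in scores if s > 0.75 * total) / n
    low = sum(1 for s in scores if s < 0.25 * total) / n
    return high >= 0.20 and low < 0.16


print(vocalic_cloze("In English the consonants are more informative than vowels."))

class_scores = [80, 90, 60, 50, 40, 30, 78, 55, 65, 20]  # hypothetical marks
print(is_suitable(class_scores))
```

In this invented class, three of ten students score above 75 and one scores below 25, so both conditions are met.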

If the class takes a test on at least three samples from the text, then the scores would make the vocalic cloze procedure more reliable.

The Rhythm Of Headlines

July 21, 2012

By Nirmaldasan


—This article appeared in the January-March 2012 issue of Vidura, a quarterly journal of the Press Institute of India. —

Whether it be a news headline or a feature headline, though one is usually factual and the other is often figurative, all headlines without exception have more to do with verse than with prose. Every headline is a poetic line. A badly scripted headline is prosaic, but an effective headline is rhythmic!

Many of the headlines that we read in newspapers allude to book or film titles and play with proverbial quotes or idiomatic expressions. Here are just three imaginary examples, with the allusions in brackets:

1. Murder on the Pandyan Express (Agatha Christie’s Murder On The Orient Express)

2. To err is humour (Alexander Pope’s ‘To err is human …’)

3. A tale of two children (Charles Dickens’s A Tale Of Two Cities)

To grasp the rhythm of the above headlines, we need to look at the three elements of the poetic line: syllable, stress and foot.


Though children are taught how to count syllables in school, they soon forget because they haven’t been told that pronouncing words is as important as getting the spelling right. Teachers themselves need to understand that it is the syllable that determines the subtle rhythm of English prose.

Each word consists of one or more syllables. According to the Advanced Learner’s Dictionary (8th edition), a syllable is ‘any of the units into which a word is divided, containing a vowel sound and usually one or more consonants’. In determining the number of syllables, we always go by the ear and not the eye. For example, the word ‘rhythm’ has no vowel letter but has one vowel sound; ‘soar’ has two vowel letters but only one vowel sound; and ‘beauteous’ has six vowel letters but only two vowel sounds. Based on the number of vowel sounds, words may be monosyllabic or disyllabic or polysyllabic.

By using contractions, the number of syllables may be reduced or increased for the sake of rhythm. The disyllabic phrase is not can be reduced to the monosyllabic isn’t. By the same token, the monosyllabic I’ve can be increased to the disyllabic I have.

Let us return to the imaginary headlines to do a syllable count:

1. Mur/der/ on/ the/ Pand/yan/ Ex/press (eight)

2. To/ err/ is/ hu/mour (five)

3. A/ tale/ of/ two/ chil/dren (six)


Syllables combine to form words, phrases and clauses. In the process, some syllables acquire conventional emphasis called stress. Those syllables that are uttered lightly without stress are called slack syllables. The alternation of some stresses and some slacks creates rhythm.

Prefixes and suffixes usually are slack. So are the articles ‘a’, ‘an’ and ‘the’. Words that end in –ion such as ‘derivation’, ‘duplication’ and ‘faction’ take the stress on the penultimate syllable. Some words have their conventional stress on the syllable preceding certain suffixes. Examples: diabolic, inimical, precious, initially, enmity.

Sometimes, a shift in the stress can alter meaning. In ‘Stress, Intelligibility and the English Language’ (Eclectic Representations, May 2011), Dr. Franklin Daniel writes: “Great care should be taken to pay particular attention to the role of variation of quality in those words which are distinguished from others by a shift of accent i.e. in the verb and noun/ adjective function. For example, the words ‘desert,’ ‘conduct,’ ‘convict,’ and ‘object’ should be stressed on the first syllables if they are used as nouns or adjectives and stressed on their second syllables if they are used as verbs.”

Now we may code the two types of syllable as ‘ta’ for slack and ‘tum’ for stress. Time to return again to our imaginary headlines to look at stress:

1. Murder on the Pandyan Express

(tumta tum ta tumta tumta)

2. To err is humour

(ta tum ta tumta)

3. A tale of two children (or) A tale of two children

(ta tum ta tum tata (or) ta tum ta ta tumta)


A headline may be divided into feet just like a poetic line. Each foot usually has two or three syllables. Here are the basic patterns of the disyllabic foot: tatum or tumta or tata or tumtum. And here are the basic patterns of the trisyllabic foot: tatatum or tatumta or tumtata or tumtatum or tatata or tumtumtum. Any pattern may be accepted if it sounds rhythmic to the headline writer’s ear.

The distribution of stresses and slacks creates rising rhythm (tatum or tatatum) and falling rhythm (tumta or tumtata). It is also possible to think of a rising-falling combination called rocking rhythm (tatumta or tumtatum).

The four traditional patterns of a poetic line are the following:

Iambic: tatum tatum tatum tatum … (rising)

Trochaic: tumta tumta tumta tumta … (falling)

Anapaestic: tatatum tatatum tatatum tatatum … (galloping)

Dactylic: tumtata tumtata tumtata tumtata … (marching)

Rhythm doesn’t respect word boundaries. A foot may consist of syllables from many words. So when a headline is divided into feet, one must try to look for a recurring pattern. For the last time, let us go back to the imaginary headlines:

1. Murder / on the / Pandyan / Express (falling rhythm)

(tumta / tumta / tumta / tumta)

2. To err / is humour (rising and rocking rhythm)

(tatum / tatumta)

3. A tale / of two / children (rising and rocking rhythm)

(or) A tale / of two chil/dren (rising and rocking rhythm)

(tatum tatum tata (or) tatum tatatumta)
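
The labelling of feet can be sketched in Python. This is only a rough sketch: the names are hypothetical, the table holds just the six patterns that were named earlier as rising, falling or rocking, and any other foot is marked unclassified. The stresses are assumed to have been marked by ear first.

```python
RHYTHMS = {
    "tatum": "rising",
    "tatatum": "rising",
    "tumta": "falling",
    "tumtata": "falling",
    "tatumta": "rocking",
    "tumtatum": "rocking",
}


def scan(feet):
    """Label each foot in a slash-separated string of ta and tum."""
    return [RHYTHMS.get(foot.strip(), "unclassified") for foot in feet.split("/")]


print(scan("tumta / tumta / tumta / tumta"))  # Murder on the Pandyan Express
print(scan("tatum / tatumta"))                # To err is humour
```

A recurring label across the line points to the dominant rhythm of the headline.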

Final tip

Headline writers need to read a lot of verse and make it a habit to hum any of the several tunes such as the famous Britannia Marie jingle ‘tumtatatum’ before they match sound and sense in their rhythmic headlines. Remember, it is mainly the rhythm that makes a headline persuasive and memorable. The Rhythm Of Headlines — tatum tatumta!

The Standard Text

June 23, 2012

By Nirmaldasan


A standard text aims for a Flesch Reading Ease score ranging from 60 to 70. In ‘A New Readability Yardstick’ of 1948, Rudolf Flesch presents a pattern of Reading Ease scores along with a seven-point scale: very difficult, difficult, fairly difficult, standard, fairly easy, easy and very easy.

In this article, we will look only at the standard text and the averages that go with it. According to Flesch, the average sentence length in words is 17 and the average number of syllables per 100 words is 147. So a typical magazine, such as a digest, will have about 17 words per sentence and 1.47 syllables per word.

In The Art of Plain Talk, Flesch writes: “First, sentence length is measured in words because they are the easiest units to count: you just count everything that is separated by white space on the page. But don’t forget that you might just as well count syllables, which would give you a more exact idea of sentence length: a sentence of twenty one-syllable words would then appear shorter than a sentence of ten one-syllable words and six two-syllable words. Keep that in mind while counting words.”

Since a more exact idea of sentence length is desirable, let us agree to count syllables instead of words. Then the standard text will have about 25 syllables per sentence [17 words per sentence x 1.47 syllables per word = 24.99 syllables per sentence].

The Strain Index, which I derived in 2005, is based on just this one variable: syllables per sentence multiplied by a factor of 0.3. For a standard text, the Strain Index = 0.3 x 25 = 7.5. Thus anyone with about eight years of schooling can understand a standard text.
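
The arithmetic can be checked with a short Python sketch. The function names are hypothetical. The Reading Ease function uses the widely published form of Flesch's 1948 formula, which is not quoted in this article; it is included only to confirm that the standard averages land between 60 and 70.

```python
def strain_index(syllables_per_sentence):
    """Strain Index = 0.3 x syllables per sentence."""
    return 0.3 * syllables_per_sentence


def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Widely published form of Flesch's 1948 Reading Ease formula."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


words_per_sentence = 17
syllables_per_word = 1.47
syllables_per_sentence = words_per_sentence * syllables_per_word

print(round(syllables_per_sentence, 2))
print(round(strain_index(syllables_per_sentence), 2))
print(round(flesch_reading_ease(words_per_sentence, syllables_per_word), 1))
```

Without rounding 24.99 up to 25, the Strain Index comes to a shade under 7.5, which does not change the conclusion.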

Flesch’s Quick Rule-of-thumb Yardstick

May 21, 2012

By Nirmaldasan


In The Art Of Plain Talk, Rudolf Flesch says that simple language consists of ‘short sentences, few affixes, and many personal references’. The average words per sentence (W), percentage of affixes (A) and percentage of personal references (P) are strung into a complicated expression: Difficulty score = (0.1338 * W) + (0.0645 * A) – (0.0659 * P) – 0.75. Scoring system: up to 1 (very easy, 5th grade); 1 to 2 (easy, 6th); 2 to 3 (fairly easy, 7th); 3 to 4 (standard, 8th to 9th); 4 to 5 (fairly difficult, 10th to 12th); 5 to 6 (difficult, 13th to 16th); and 6 or more (very difficult, college graduate).

But in a postscript, Flesch presents a Quick Rule-of-thumb Yardstick (QRY): Difficulty score = [(A – P) / 2] + W. Scoring system: up to 13 (very easy), 13 to 20 (easy), 20 to 29 (fairly easy), 29 to 36 (standard), 36 to 43 (fairly difficult), 43 to 52 (difficult) and 52 or more (very difficult). The calculation becomes simpler if we take a sample of 50 words instead of 100. Let ‘a’ and ‘p’ be the affixes and personal references in a sample of 50 words; and ‘w’, the average number of words per sentence. Since A = 2a and P = 2p, (A – P) / 2 = a – p, and so difficulty score = w + a – p.

Affixes are extremely hard to spot, but Flesch gives a helpful list of affixes in the appendix. Personal references are easy to locate: names of people, personal pronouns that refer to people and a finite list of human-interest words.

Let’s apply the QRY to the following 50-word paragraph taken from a longer sample analysed by Flesch (the personal references are in capitals and the affixes are in brackets):

“WE shall plan, (with)in each countr(y) and (be)tween countr(ies), for more jobs and for mak(ing), trad(ing) and us(ing) more goods. (Al)so, WE shall plan to do (a)way with all ways of treat(ing) the trade of some countr(ies) bett(er) than that of others, and to low(er) tariffs and other trade barr(iers).”

Number of sentences = 2

Number of words = 50

w = 50/2 = 25

a = 14

p = 2

Difficulty score = w + a – p = 25 + 14 – 2 = 37 (fairly difficult)
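The same arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, the counts of affixes and personal references must still be made by hand, and the treatment of a score that falls exactly on a boundary (it goes to the harder band, in line with ‘52 or more’) is an assumption, since the scoring system does not settle it.

```python
# Upper limits of the QRY bands, as given in the scoring system above.
QRY_BANDS = [
    (13, "very easy"),
    (20, "easy"),
    (29, "fairly easy"),
    (36, "standard"),
    (43, "fairly difficult"),
    (52, "difficult"),
]


def qry_score(words, sentences, affixes, personal_refs):
    """Quick Rule-of-thumb Yardstick: [(A - P) / 2] + W.

    A and P are percentages, so the hand counts are first scaled
    to a 100-word basis. For a 50-word sample this reduces to w + a - p.
    """
    w = words / sentences                  # average words per sentence
    a_pct = 100 * affixes / words          # affixes per 100 words
    p_pct = 100 * personal_refs / words    # personal references per 100 words
    return (a_pct - p_pct) / 2 + w


def qry_label(score):
    """Map a score to a band (assumption: a boundary score counts as harder)."""
    for upper, label in QRY_BANDS:
        if score < upper:
            return label
    return "very difficult"


# The 50-word sample analysed above: 2 sentences, 14 affixes, 2 personal references.
score = qry_score(words=50, sentences=2, affixes=14, personal_refs=2)
print(score, qry_label(score))
```

Because the function scales the counts itself, it can also be fed a sample that is not exactly 50 words long.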

For a reliable assessment, the QRY must be applied on at least 10 samples of 50 words each. “Some readers, I am afraid,” writes Flesch, “will expect a magic formula for good writing and will be disappointed with my simple yardstick. Others, with a passion for accuracy, will wallow in the little rules and computations but lose sight of the principles of plain English. What I hope for are readers who won’t take the formula too seriously and won’t expect from it more than a rough estimate.”

Longer The Sentence, Greater The Strain

April 30, 2012

By Nirmaldasan


This article appeared in the October-December 2011 issue of Vidura, a quarterly journal of the Press Institute of India.

All plain English experts echo Robert Gunning’s advice: “Keep sentences short.” The longer the sentence, the greater the strain on the reader. Harold Evans, author of Newsman’s English, writes: “The real seduction of the simple sentence is that taken by itself, it is short and it is confined to one idea. The real trouble with so many compound-complex sentences is that they have to carry too many ideas.”

Martin Cutts, in the Oxford Guide To Plain English, has this to say: “More people fear snakes than full stops, so they recoil when a long sentence comes hissing across the page.” He recommends an average sentence length of 15-20 words.

Jyoti Sanyal, author of Indlish (the book for every English-speaking Indian) writes: “Based on several studies, press associations in the USA have laid down a readability table. Their survey shows readers find sentences of 8 words or less very easy to read; 11 words, easy; 14 words fairly easy; 17 words standard; 21 words fairly difficult; 25 words difficult and 29 words or more, very difficult.” We will return to this readability table a little later.

Rudolf Flesch, creator of the Flesch Reading Ease formula, studied the readability of various magazines: Scientific (very difficult), Academic (difficult), Quality (fairly difficult), Digests (standard), Slick-fiction (fairly easy), Pulp-fiction (easy) and Comics (very easy). He counted the number of syllables per 100 words and measured the average sentence length in words. He put these two variables into a complex formula in an article titled ‘A New Readability Yardstick’, published in the June 1948 issue of the Journal of Applied Psychology.

Now words may be monosyllables (short), disyllables (medium) or polysyllables (long). So an average sentence comprising 17 long words may still be a strain on the reader. In early 2005, when I was a senior sub-editor with The Hindu, I realized that the best way to overcome this problem was to measure the sentence in syllables.

While it is easy to count words, counting syllables may not be all that easy. But with a little practice, anyone can count syllables swiftly. Remember that it is the syllable that determines the rhythm of prose. The syllable is the basic unit of utterance. Each syllable has only one vowel sound. ‘Television’ has four syllables; ‘Internet’ has three; ‘Radio’ has two; and ‘Print’ has only one!

Flesch writes: “If in doubt about syllabication rules, use a good dictionary. Count the number of syllables in symbols and figures according to the way they are normally read aloud, e.g. two for $ (‘dollars’) and four for 1918 (‘nineteen-eighteen’).”
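For a quick first pass, the rule that each syllable has one vowel sound can be roughly approximated by counting runs of vowel letters. The sketch below is only a heuristic and not Flesch’s method; the function name rough_syllables is made up for illustration. Silent letters throw it off (it counts two for ‘write’), so a good dictionary remains the final authority.

```python
import re


def rough_syllables(word):
    """Count runs of vowel letters (a e i o u y) as a stand-in for vowel sounds.

    This is a crude approximation: silent vowels are over-counted and
    some adjacent vowels that are sounded separately are under-counted.
    """
    return len(re.findall(r"[aeiouy]+", word.lower()))


for word in ("Television", "Internet", "Radio", "Print"):
    print(word, rough_syllables(word))
```

On the four examples given above, the heuristic agrees with the counts in the text.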

The readability table, which we have already seen, may be better expressed in terms of syllables. Sentences of 10 syllables or less are very easy to read; 14 syllables, easy; 19 syllables, fairly easy; 25 syllables, standard; 33 syllables, fairly difficult; 42 syllables, difficult; and 56 syllables or more, very difficult.
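The syllable figures can be cross-checked by multiplying each sentence length in words by a typical number of syllables per word. The sketch below assumes the syllables-per-100-words values from Flesch’s published pattern of Reading Ease scores; those values are not stated in this article, so treat the derivation as an assumption rather than as the method actually used.

```python
# (words per sentence, syllables per 100 words, description of style)
# The middle column is an assumption taken from Flesch's pattern table.
FLESCH_PATTERN = [
    (8, 123, "very easy"),
    (11, 131, "easy"),
    (14, 139, "fairly easy"),
    (17, 147, "standard"),
    (21, 155, "fairly difficult"),
    (25, 167, "difficult"),
    (29, 192, "very difficult"),
]


def syllables_per_sentence(words_per_sentence, syllables_per_100_words):
    """Convert a sentence length in words to a rounded length in syllables."""
    return round(words_per_sentence * syllables_per_100_words / 100)


for words, per_100, style in FLESCH_PATTERN:
    print(style, words, syllables_per_sentence(words, per_100))
```

Under that assumption, the rounded products reproduce the syllable figures quoted above.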


Average sentence length (words)    Average sentence length (syllables)    Description of style
29 or more                         56 or more                             Very difficult
25                                 42                                     Difficult
21                                 33                                     Fairly difficult
17                                 25                                     Standard
14                                 19                                     Fairly easy
11                                 14                                     Easy
8 or less                          10 or less                             Very easy

But this table, derived from a simplification of Flesch’s observation of a pattern of ‘Reading Ease’ scores, does not identify the level of the readers for whom a text may be easy or difficult.

So here follows a formula that measures the readability of a text on a scale of 1 to 17+ years of schooling. The Strain Index, which I evolved as an alternative to Gunning’s Fog Index, is a syllable-counting formula. Unlike many a readability formula which intimidates the user with a complex equation, the Strain Index is very easy to use. The plain English expert William DuBay called it ‘remarkably simple’.

In its popular form, Strain Index = S3 / 10 (S3 is the number of syllables in three sentences). Let us take an example:

‘I just don’t agree with this hoo-ha about short sentences and simple words,’ said PM. ‘If I can write long sentences well, why shouldn’t I?’ Nor does PM agree with the advice on the use of everyday words.

That passage comes from an article titled ‘Shrink Or Sink’ in Sanyal’s Indlish. The sample has 53 syllables. So, Strain Index = 53 / 10 = 5.3 years of schooling; a Standard V student can understand what Sanyal has written.

But to get a better estimate of the readability of a text, one must test more three-sentence samples or choose a long sample. In its non-popular form, Strain Index = S30 / 100 (S30 is the number of syllables in 30 sentences). This is the same as taking 10 three-sentence samples and calculating the average.

It is possible, though not necessary, to apply the formula to a full text consisting of ‘n’ sentences. In this case, the general form of the Strain Index = 0.3 x (Sn / n), in which Sn is the number of syllables in ‘n’ sentences. But always remember that any readability formula should only be applied on well-written texts.
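The three forms of the formula are really one, as a short Python sketch makes plain. The function name and its arguments are chosen here for illustration; the syllables are assumed to have been counted by hand or looked up in a dictionary.

```python
def strain_index(total_syllables, sentences):
    """General form of the Strain Index: 0.3 x (Sn / n).

    With n = 3 this is S3 / 10; with n = 30 it is S30 / 100.
    """
    if sentences <= 0:
        raise ValueError("at least one sentence is needed")
    return 0.3 * total_syllables / sentences


# The three-sentence sample from 'Shrink Or Sink': 53 syllables.
print(round(strain_index(53, 3), 1))

# A standard text: about 25 syllables per sentence.
print(round(strain_index(25, 1), 1))
```

The result is read as years of schooling, so the first call gives about 5.3 and the second about 7.5.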

