The Words We Choose

May 23, 2013

By Nirmaldasan


—This article appeared in the Jan-March 2013 issue of Vidura, a quarterly journal of the Press Institute of India. —

A writer who thinks and feels is a writer who knows words that engage the reader. John Ayto, in his introduction to the Bloomsbury Dictionary of Word Origins, tells us that the average English speaker knows about 50,000 words. If the print and broadcast media function within this vocabulary range, readership and rating points are sure to increase. But unfamiliar words have the potency to turn off the audience.

Edward Thorndike found that there was a relationship between familiarity and frequency. He spent about a decade preparing The Teacher’s Word Book (1921) of 10,000 words. “The list,” he writes, “makes it much easier than it has been in the past to put standards for word knowledge, by grades, by ages, or by mental ages, into clear, definite comprehensible form. For example, we may say that at a certain mental age or grade the minimum standard should be knowledge of the meanings of 95 per cent of the first 2500 words, 80 per cent of the next 1000, 60 per cent of the next 1500, and 20 percent of the next 5000.” This list he expanded to 30,000 words in 1944, teaming up with Irving Lorge.
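Thorndike's example standard can be turned into a quick tally. The sketch below simply multiplies each band of the 10,000-word list by the percentage he quotes; the names BANDS and minimum_known_words are mine, used only for illustration.

```python
# Thorndike's illustrative minimum standard, taken from the quotation above:
# (size of the band in words, per cent of the band whose meanings are known).
BANDS = [(2500, 95), (1000, 80), (1500, 60), (5000, 20)]


def minimum_known_words(bands=BANDS):
    """Total number of words a pupil must know to meet the quoted standard."""
    return sum(size * percent // 100 for size, percent in bands)


print(sum(size for size, _ in BANDS))  # 10000: the bands cover the whole list
print(minimum_known_words())           # 5075
```

On these figures, the quoted standard amounts to knowing roughly half of the 10,000 words.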

Alfred Lewerenz discovered an unusual pattern in the frequency of words. In ‘Proposals For British Readability Measures’, Harry McLaughlin writes about him: “I have always had a soft spot in my heart for the genius who predicted readability from the percentages of words beginning w, h or b (which he considered easy) and of words beginning i or e (considered hard).” George Johnson, in ‘An Objective Method Of  Determining Reading Difficulty’, writes: “Alfred S. Lewerenz reported a study made by the Educational Research Division of the Los Angeles Public Schools. By comparing the number of different words beginning with each letter of the alphabet in a given selection with that of the standard provided by Webster’s Elementary School Dictionary, five critical letters were selected as indicators of reading difficulty. Words beginning with W, H, and B were found frequently in easy material while there were comparatively few beginning with I and E. With difficult reading material the situation was reversed.”

Edgar Dale compiled a list of 3000 words, familiar to 80 percent of 4th graders in the U.S. This list was revised in 1983 and is a factor in the new Dale-Chall readability formula of 1995. Notable among other lists are the Oxford 3000 and Voice of America’s Special English Word Book. The Oxford 3000 also includes some important and familiar words that are not frequent.

Zipf’s law

George Kingsley Zipf was also interested in word frequencies. Two of his books are The Psycho-biology Of Language (1935) and Human Behaviour And The Principle Of Least Effort: An Introduction To Human Ecology (1949). He observed that words of high frequency were usually short or became shorter with frequent use (e.g. bicycle to bike; omnibus to bus; cafeteria to cafe). Moreover, what is called Zipf’s law states that the frequency of a word in a corpus is inversely proportional to its rank. The frequency of the top-ranked word is twice that of the second-ranked word, thrice that of the third-ranked word and so on.   
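The inverse relationship is easy to see in a few lines of code. This is only a sketch of the idealised law as stated above; the function name and the figure of 6,000 occurrences for the top-ranked word are hypothetical.

```python
def zipf_frequency(top_frequency, rank):
    """Expected frequency of the word at a given rank under the idealised law."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    return top_frequency / rank


# Hypothetical corpus in which the top-ranked word occurs 6,000 times.
for rank in (1, 2, 3, 10):
    print(rank, zipf_frequency(6000, rank))
```

Real corpora follow the law only approximately, so the output should be read as a rough expectation and not as a prediction.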

Since there is a strong correlation between frequency and the length of words, it has become easier for writers to identify words that are familiar to most of their readers. The length of a word may be measured in characters or syllables. The Raygor Estimate Graph of Alton L. Raygor (1977) considers words of six or more characters difficult; the SMOG Grading of Harry McLaughlin (1969) counts polysyllables as a marker of reading difficulty. My research, presented in Readability Monitor, suggests the following measures: reading factor for print and the listening factor for broadcast.

Broadcast Listening Factor

Let P3 be the number of polysyllables in three sentences of a broadcast copy. The Broadcast Listening Factor (BLF) = P3. The lower the score, the higher the listenability. A score of zero means that the story is very easy and a score of 10+ means that it is very hard.

We will get a better estimate if we take 10 samples of three sentences each from various parts of the copy and calculate listenability. If we take just one long sample of 30 sentences, then the BLF = P30/10.
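The two ways of sampling give the same kind of score, as the following sketch shows. It assumes the polysyllables have already been counted by ear; the function names and the sample counts are hypothetical.

```python
def blf_from_samples(p3_counts):
    """Average BLF over several three-sentence samples.

    Each item is P3, the number of polysyllables in one three-sentence sample.
    """
    if not p3_counts:
        raise ValueError("need at least one sample")
    return sum(p3_counts) / len(p3_counts)


def blf_from_long_sample(p30):
    """BLF from one long sample of 30 sentences: P30 / 10."""
    return p30 / 10


# Ten hypothetical three-sentence samples from one broadcast copy.
counts = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4]
print(blf_from_samples(counts))           # 4.0
print(blf_from_long_sample(sum(counts)))  # 4.0
```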

Newspaper Reading Factor

I have argued elsewhere that the average syllable has three letters; and so a polysyllable may have nine letters or more. So a long word is one that has more than eight letters.

The number of long words other than the names of persons and places in five sentences may be called the Newspaper Reading Factor. Names of persons and places are exempted from the count as they are usually supposed to be very easy to understand. This formula measures newspaper texts on a five-point scale: 0 – 4 (very easy); 5 – 8 (easy); 9 – 12 (standard); 13 – 16 (hard); and 17+ (very hard).
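A rough sketch of the count follows. It assumes that the sub-editor supplies the names of persons and places to be exempted, and that a word is any unbroken run of letters (so a hyphenated compound is counted as separate words); both choices are mine, as are the function names and the sample sentence.

```python
import re


def newspaper_reading_factor(five_sentences, names=()):
    """Count words of more than eight letters, skipping the listed names."""
    exempt = {name.lower() for name in names}
    words = re.findall(r"[A-Za-z]+", five_sentences)
    return sum(1 for w in words if len(w) > 8 and w.lower() not in exempt)


def nrf_label(score):
    """Five-point scale given in the article."""
    if score <= 4:
        return "very easy"
    if score <= 8:
        return "easy"
    if score <= 12:
        return "standard"
    if score <= 16:
        return "hard"
    return "very hard"


# One hypothetical sentence; a real test would use five.
text = "The Thiruvananthapuram corporation announced extraordinary measures."
print(newspaper_reading_factor(text, names=["Thiruvananthapuram"]))  # 3
```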


The Conversational Style

February 8, 2013

By Nirmaldasan

—This article appeared in the July-September 2012 issue of Vidura, a quarterly journal of the Press Institute of India —

The most readable feature stories in magazines and newspapers are written in the conversational style. Plain English experts have laid much emphasis on the write-the-way-you-talk principle. In How To Take The Fog Out Of Writing, Robert Gunning says: “A conversational tone is one of the best avenues to good writing.” The choice of words, the syntax and the human voice constitute the conversational style.

This style is easy to achieve on radio and television. In The Art Of Plain Talk, Rudolf Flesch writes: “When we are talking, of course, we don’t use any punctuation marks. We use a system of shorter or longer pauses between words to join or separate our ideas, and we raise or lower our voice to make things sound emphatic or casual. In other words, we make ourselves understood not only by words but also by pauses and by stress or pitch.”      

But how to reproduce the conversational tone in print? Flesch has an answer: “Punctuation gets pauses and stress (but not pitch) on paper.” His punctuation system takes care of normal pause, shorter pause and longer pause between words and between sentences. His system also indicates whether utterances have normal stress or emphasis or no stress. Let us take a brief look at pause and stress:


Shorter pause between words: use hyphen (eg. If you say no-work no-pay, then I say no-pay no-work.)

Shorter pause between sentences: use semi-colon (eg. I came; I saw; I conquered.) or colon (eg. Three things I like most: chess, poetry and mathematics.)

Normal pause between words: use usual spacing (eg. I came and saw and conquered.)

Normal pause between sentences:  use the full stop (eg. I came. I saw. I conquered.)

Longer pause between words: use em-dash (eg. The greatest symbol — zero.)

Longer pause between sentences: use a new paragraph


No stress: use parentheses ( )

Normal stress: use the usual type of upright letters  

Emphasis: use italics or bold type

Here are some other considerations for achieving a conversational style:

* Use words that are short and easy to say (monosyllables or disyllables)

* Use words that are familiar to the average reader

* Use contractions such as I’ve, isn’t, haven’t and aren’t

* Use words that are concrete, which refer to people and things

* Use the active voice instead of the passive

* Use questions and exclamations wherever appropriate

Human Interest Measure (HIM)

Flesch developed a formula called Human Interest Score (Scale: 0 to 100) based on two variables: personal words and personal sentences. The greater the score, the greater the human interest. Flesch also used a five-point scale to describe the level of human interest in a feature story. He measured science magazines (dull), trade publications (mildly interesting), digests (interesting), New Yorker (highly interesting) and fiction (dramatic).

His formula is complicated as it involves two factors, 3.635 and 0.314. Those who are fond of decimals may read Flesch’s original article of 1948 titled ‘A New Readability Yardstick’ in William H. DuBay’s book Unlocking Language.

Here I wish to present a useful simplification of his formula. Let us call it HIM (human interest measure). The formula involves the number of personal references (pr) in 100 words and the number of conversational sentences (cs) in 10 sentences.

Personal references are what Flesch calls ‘personal words’: “(a) All first-, second-, and third-person pronouns except the neuter pronouns it, its, itself, and they, them, their, theirs, themselves if referring to things rather than people, (b) All words that have masculine or feminine natural gender, e.g. Jones, Mary, father, sister, iceman, actress. Do not count common-gender words like teacher, doctor, employee, assistant, spouse. Count singular and plural forms, (c) The group of words people (with the plural verb) and folks.”

Conversational sentences are (a) utterances within quotes or indirect speech; (b) imperative sentences; (c) interjections; and (d) sentence fragments (eg. With a dagger.) whose meaning depends on the preceding sentence (eg. How did Brutus kill Caesar?).

The formula is simple: HIM = pr + cs

Scale: 0 to 3 (dull); 4 to 6 (mildly interesting); 7 to 13 (interesting); 14 to 19 (highly interesting); and 20+ (dramatic).

Rule of thumb

In every 10 sentences, let there be at least two conversational sentences; and in every 100 words, at least 7 personal references.
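The measure and the rule of thumb can be checked together. The sketch below assumes the personal references and conversational sentences have already been counted by hand; the function names are hypothetical.

```python
def him(personal_refs_per_100_words, conversational_per_10_sentences):
    """Human interest measure: HIM = pr + cs."""
    return personal_refs_per_100_words + conversational_per_10_sentences


def him_label(score):
    """Five-point scale given in the article."""
    if score <= 3:
        return "dull"
    if score <= 6:
        return "mildly interesting"
    if score <= 13:
        return "interesting"
    if score <= 19:
        return "highly interesting"
    return "dramatic"


def meets_rule_of_thumb(pr, cs):
    """At least 7 personal references and 2 conversational sentences."""
    return pr >= 7 and cs >= 2


score = him(7, 2)
print(score, him_label(score), meets_rule_of_thumb(7, 2))
```

A story that just satisfies the rule of thumb scores 9, which falls in the 'interesting' band.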

Now for a final quote from Jyoti Sanyal’s Indlish: “All the stories we heard as children were full of dialogue. We heard what the fox said to persuade the tiger to re-enter the cage the Brahmin had freed it from, and what the tiger said to justify his decision to gobble his benefactor. We all remember what the ants told the grasshopper, who’d only fiddled the whole summer, while they’d worked to save food for the winter. Dialogue and description made those tales live — and often, dialogue was the more important device.”

The Seven Rs Of Sub-editing

October 1, 2012

By Nirmaldasan


— This article appeared in the April-June 2012 issue of Vidura, a quarterly journal of the Press Institute of India:

A well-edited report has no factual, grammatical and stylistic errors. Accuracy, brevity and clarity help readers or listeners to quickly get the news and remember the key points. Unlike Rudyard Kipling’s elephant, people may not have insatiable curiosity unless they are told who-what-when-where-why (5Ws) and how (1H) in a language that obeys the principles of clear writing. An understanding of the news values of timeliness, prominence, proximity, conflict and human interest is essential for sub-editors to choose news stories and suitably edit them for different media.

The single act of processing news copy may be divided into what may be called the seven Rs of sub-editing: 1. Read 2. Remove 3. Rectify 4. Replace 5. Reorder 6. Rewrite and 7. Revise. But this division is arbitrary and is not without overlaps. Sub-editors usually skip some of the Rs when they sprint against the clock to meet deadlines. This perhaps explains why there are more mistakes in the first editions of newspapers. Later, the night editors and their team settle down to tackle the errors with the help of the seven Rs. Consequently, the later editions are more reader-friendly.


Read

Any raw report must be read twice. A casual first reading would tell us the sense of the story. This should be followed by a second critical reading, which would reveal the copy’s merits and faults. Some reporters turn in such fine self-edited reports that the other six Rs become unnecessary; and the sub-editors have nothing more to do than write some effective headlines for such stories.


Remove

Philip A. Yaffe, in his book titled The Gettysburg Approach To Writing & Speaking Like A Professional, says: “Nothing in a text is neutral. Whatever doesn’t add to the text, subtracts from it.” It is, therefore, the sub-editor’s job to remove from a report anything that does not enrich it. This could be a superfluous word or phrase, a libelous sentence or an optional paragraph. The reporter may not like it, but it is a job that must be done in the interest of the readers. Some examples may help clarify this point:

The panda eats, shoots and leaves

(The comma changes the meaning)

Major crisis

(Major is a superfluous word. But water crisis makes sense)

The ship will arrive in the month of May

(The phrase the month of is superfluous)

The secretary and the treasurer

(One must be careful here. If the phrase refers to two persons, then it is correct. But if one person holds both these posts, then the correct phrase is the secretary and treasurer)


Rectify

Spot and correct all spelling and capitalization errors. Insert appropriate honorifics such as Mr or Ms or Dr before names of persons. Wrong dates and figures must also be rectified. Yaffe says that long sentences should be checked for logical coherence and short ones for logical linkage. A long sentence with unrelated ideas must be split up into shorter sentences; and short sentences comprising related ideas must be fused into a longer sentence.


Replace

The fourth R replaces unfamiliar words with the familiar; the long with the short; and the ambiguous with the precise. Malapropisms (as in Richard Brinsley Sheridan’s The Rivals) must be spotted and replaced with the right words. Here are some fourth R examples:

Wend one’s way to the market

(Go to the market)

Dismount from a bus

(Get down from a bus)

Released from hospital

(Discharged from hospital)

To illiterate him

(To obliterate him)


Reorder

A news report must have the inverted pyramid structure. This means that events are arranged in the order of diminishing significance. So there is a need to reorder the paragraphs of news stories written in the chronological order.

The order of words may alter the meaning of a sentence. In some cases it can improve the rhythm. Thomas Elliott Berry, in his book titled The Most Common Mistakes In English Usage, says: “Whenever possible, modifiers should be arranged according to length, with the shortest preceding the others.” He suggests that the sentence He was disheveled, dirty, and untidy should be reordered as He was dirty, untidy and disheveled. Berry also says that modifiers should always be arranged in a logical sequence. The same is true of verbs too. Here are some fifth R examples:

to go boldly

(to boldly go is rhythmic though the infinitive is split)

A policeman misbehaved with a woman in a drunken state

(A policeman in a drunken state misbehaved with a woman)

She ate, dressed and bathed

(She bathed, dressed and ate)


Rewrite

Inexperienced sub-editors with remarkable linguistic skills have the irresistible urge to rewrite every report. This urge must be resisted for it is the job of the reporters to rewrite their stories. However, sub-editors may rewrite for the following reasons: 1. Merging different stories on the same topic; 2. Summarizing a story for want of space; 3. Highlighting the news point; and 4. Simplifying the copy for average readers. But a rewriter should as far as possible use the original words of the reporter.


Revise

Revise the edited report to check whether the changes are justified. The revision may help either fix hitherto unspotted errors or fine-tune the report so that the readers get a newsy copy that is easy to read and easy to remember.


The Vocalic Cloze Procedure

August 21, 2012

By Nirmaldasan


The World Bank commissioned the National Council of Educational Research and Training (New Delhi) in February 1995 to assess the readability of primary level text-books in collaboration with CIIL (Mysore). Six states were covered: Assam, Haryana, Kerala, Karnataka, Maharashtra and Tamil Nadu. The results were published in IER: Special Number 1995.

The analysis was based on the assumption that ‘if 20 per cent of the children score above 75 per cent of the marks and less than 16 per cent of the sample score below 25 per cent of marks, the book could be considered fairly appropriate in terms of readability’. “This rationale is based on,” the report says, “(a) the assumption of normal distribution, and (b) the principle followed in textbook writing of pitching the level a little higher than the average.”

J. Charles Alderson discusses the several techniques for testing reading in his book titled Assessing Reading. Frederick J. Kelly’s multiple-choice questions and Wilson Taylor’s cloze procedure are two of the popular techniques. These tests are easy to administer and it has been found that there is a mathematical relationship between the scores obtained by each of them.

The average syllable has three letters, of which two are usually consonants and one is a vowel. Alderson points to the fact that the English consonants convey more information than the vowels. “Thus it is easier to restore vowels in distorted words than the consonants: _n _ngl_sh th_ c_ns_n_nts _r_ m_r_ _nf_rm_t_v_ th_n v_w_ls.” Why shouldn’t this fact be used to test reading? We will call this the vocalic cloze procedure.

By deleting all the vowels in a sample of 100 words, the vocalic cloze procedure may be administered to a class of students, whose task is to fill in the blanks till time is called. Fifteen minutes may be more than sufficient for the test. Count every word that is completely filled and ignore the rest, so that each student gets a mark out of 100. The text from which the sample is drawn may be considered suitable for the class if: a) at least 20 per cent of the students score more than 75 marks; and b) fewer than 16 per cent of them score below 25 marks.
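The two conditions can be checked from a list of class scores. This is a sketch only: it assumes each score is the number of fully restored words out of 100, and the function name and the sample classes are hypothetical.

```python
def text_suits_class(scores):
    """Apply the two suitability conditions to a list of marks out of 100."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    high = sum(1 for s in scores if s > 75)  # students above 75 marks
    low = sum(1 for s in scores if s < 25)   # students below 25 marks
    # Integer comparisons avoid rounding trouble at the 20 and 16 per cent marks.
    return high * 100 >= 20 * n and low * 100 < 16 * n


# Hypothetical class of 25: five high scorers, two low scorers.
print(text_suits_class([80] * 5 + [50] * 18 + [20] * 2))  # True
# Same class size, but only two high scorers and five low scorers.
print(text_suits_class([80] * 2 + [50] * 18 + [20] * 5))  # False
```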

If the class takes a test on at least three samples from the text, then the scores would make the vocalic cloze procedure more reliable.

The Rhythm Of Headlines

July 21, 2012

By Nirmaldasan


—  This article appeared in the January-March 2012 issue of Vidura, a quarterly journal of the Press Institute of India:

Whether it be a news headline or a feature headline, though one is usually factual and the other is often figurative, all headlines without exception have more to do with verse than with prose. Every headline is a poetic line. A badly scripted headline is prosaic, but an effective headline is rhythmic!

Many of the headlines that we read in newspapers allude to book or film titles and play with proverbial quotes or idiomatic expressions. Here are just three imaginary examples, with the allusions in brackets:

1. Murder on the Pandyan Express (Agatha Christie’s Murder On The Orient Express)

2. To err is humour (Alexander Pope’s ‘To err is human …’)

3. A tale of two children (Charles Dickens’s A Tale Of Two Cities)

To grasp the rhythm of the above headlines, we need to look at the three elements of the poetic line: syllable, stress and foot.


Syllable

Though children are taught how to count syllables in school, they soon forget because they haven’t been told that pronouncing words is as important as getting the spelling right. Teachers themselves need to understand that it is the syllable that determines the subtle rhythm of English prose.

Each word consists of one or more syllables. According to the Advanced Learner’s Dictionary (8th edition), a syllable is ‘any of the units into which a word is divided, containing a vowel sound and usually one or more consonants’. In determining the number of syllables, we always go by the ear and not the eye. For example, the word ‘rhythm’ has no vowel letter but has one vowel sound; ‘soar’ has two vowel letters but only one vowel sound; and ‘beauteous’ has six vowel letters but only two vowel sounds. Based on the number of vowel sounds, words may be monosyllabic or disyllabic or polysyllabic.

By using contractions, the number of syllables may be reduced or increased for the sake of rhythm. The disyllabic phrase is not can be reduced to the monosyllabic isn’t. By the same token, the monosyllabic I’ve can be increased to the disyllabic I have.

Let us return to the imaginary headlines to do a syllable count:

1. Mur/der/ on/ the/ Pand/yan/ Ex/press (eight)

2. To/ err/ is/ hu/mour (five)

3. A/ tale/ of / two/ chil/dren (six)


Stress

Syllables combine to form words, phrases and clauses. In the process, some syllables acquire conventional emphasis called stress. Those syllables that are uttered lightly without stress are called slack syllables. The alternation of some stresses and some slacks creates rhythm.

Prefixes and suffixes usually are slack. So are the articles ‘a’, ‘an’ and ‘the’. Words that end in –ion such as ‘deriVAtion’, ‘dupliCAtion’ and ‘FACtion’ take the stress on the penultimate syllable, shown here in capitals. Some words have their conventional stress on the syllable preceding certain suffixes. Examples: diaBOLic, iNIMical, PREcious, iNItially, ENmity.

Sometimes, a shift in the stress can alter meaning. In ‘Stress, Intelligibility and the English Language’ (Eclectic Representations, May 2011), Dr. Franklin Daniel writes: “Great care should be taken to pay particular attention to the role of variation of quality in those words which are distinguished from others by a shift of accent i.e. in the verb and noun/ adjective function. For example, the words ‘desert,’ ‘conduct,’ ‘convict,’ and ‘object’ should be stressed on the first syllables if they are used as nouns or adjectives and stressed on their second syllables if they are used as verbs.”

Now we may code the two types of syllable as ‘ta’ for slack and ‘tum’ for stress. Time to return again to our imaginary headlines to look at stress:

1. Murder on the Pandyan Express

(tumta tum ta  tumta tumta)

2. To err is humour

(ta tum ta tumta)

3. A TALE of TWO children (or) A TALE of two CHILdren (stressed syllables in capitals)

(ta tum ta tum tata (or) ta tum ta ta tumta)


Foot

A headline may be divided into feet just like a poetic line. Each foot usually has two or three syllables. Here are the basic patterns of the disyllabic foot: tatum or tumta or tata or tumtum. And here are the basic patterns of the trisyllabic foot: tatatum or tatumta or tumtata or tumtatum or tatata or tumtumtum. Any pattern may be accepted if it sounds rhythmic to the headline writer’s ear.

The distribution of stresses and slacks creates rising rhythm (tatum or tatatum) and falling rhythm (tumta or tumtata). It is also possible to think of a rising-falling combination called rocking rhythm (tatumta or tumtatum).

The four traditional patterns of a poetic line are the following:

Iambic: tatum tatum tatum tatum … (rising)

Trochaic: tumta tumta tumta tumta … (falling)

Anapaestic: tatatum tatatum tatatum tatatum … (galloping)

Dactylic: tumtata tumtata tumtata tumtata … (marching)

Rhythm doesn’t respect word boundaries. A foot may consist of syllables from many words. So when a headline is divided into feet, one must try to look for a recurring pattern. For the last time, let us go back to the imaginary headlines:

1. Murder / on the / Pandyan / Express (falling rhythm)

(tumta / tumta /  tumta / tumta)

2. To err / is humour (rising and rocking rhythm)

(tatum / tatumta)

3. A tale / of two / children (rising and rocking rhythm)

(or) A tale / of two chil/dren (rising and rocking rhythm)

(tatum tatum tata (or) tatum tatatumta)
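The naming of feet can be sketched as a simple lookup. The code assumes the headline has already been scanned by ear and written out in 'ta' and 'tum' with slashes between feet; the table covers only the six patterns named above as rising, falling or rocking, and the function name is mine.

```python
# Rhythm of each foot pattern, as described in the article.
FOOT_RHYTHM = {
    "tatum": "rising",
    "tatatum": "rising",
    "tumta": "falling",
    "tumtata": "falling",
    "tatumta": "rocking",
    "tumtatum": "rocking",
}


def scan(pattern):
    """Label each slash-separated foot; unlisted patterns are marked 'other'."""
    feet = [foot.replace(" ", "") for foot in pattern.split("/")]
    return [FOOT_RHYTHM.get(foot, "other") for foot in feet]


print(scan("tumta / tumta / tumta / tumta"))  # Murder on the Pandyan Express
print(scan("tatum / tatumta"))                # To err is humour
```

The first headline comes out as four falling feet and the second as a rising foot followed by a rocking one, which matches the scansion given above.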

Final tip

Headline writers need to read a lot of verse and make it a habit to hum any of the several tunes such as the famous Britannia Marie jingle ‘tumtatatum’ before they match sound and sense in their rhythmic headlines. Remember, it is mainly the rhythm that makes a headline persuasive and memorable. The Rhythm Of Headlines — tatum tatumta!

The Standard Text

June 23, 2012

By Nirmaldasan


A standard text aims for a Flesch Reading Ease score ranging from 60 to 70. In ‘A New Readability Yardstick’ of 1948, Rudolf Flesch presents a pattern of Reading Ease scores along with a seven-point scale: very difficult, difficult, fairly difficult, standard, fairly easy, easy and very easy.

In this article, we will look only at the standard text and the averages that go with it. According to Flesch, the average sentence length in words is 17 and the average number of syllables per 100 words is 147. So a typical magazine such as a digest will have about 17 words per sentence and 1.47 syllables per word.
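These averages can be checked against the score range. The formula below is not quoted in this article; it is the Reading Ease formula as it is usually cited from Flesch's 1948 paper, so treat the constants as an assumption to be verified against the original.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_100_words):
    """Reading Ease as commonly cited from Flesch (1948); constants assumed."""
    return 206.835 - 1.015 * words_per_sentence - 0.846 * syllables_per_100_words


score = flesch_reading_ease(17, 147)
print(round(score, 1))  # 65.2, inside the 60 to 70 range of a standard text
```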

In The Art of Plain Talk, Flesch writes: “First, sentence length is measured in words because they are the easiest units to count: you just count everything that is separated by white space on the page. But don’t forget that you might just as well count syllables, which would give you a more exact idea of sentence length: a sentence of twenty one-syllable words would then appear shorter than a sentence of ten one-syllable words and six two-syllable words. Keep that in mind while counting words.”

Since a more exact idea of sentence length is desirable, let us agree to count syllables instead of words. Then the standard text will have about 25 syllables per sentence [17 words per sentence x 1.47 syllables per word = 24.99 syllables per sentence].

The Strain Index, which I derived in 2005, is based on just this one variable: syllables per sentence multiplied by a factor of 0.3. For a standard text, the Strain Index = 0.3 x 25 = 7.5. Thus anyone with about eight years of schooling can understand a standard text.
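The arithmetic for the standard text can be written out as a short sketch. The function name is hypothetical; the factor of 0.3 and the two averages are the ones given above.

```python
def strain_index_from_averages(words_per_sentence, syllables_per_word):
    """Strain Index = 0.3 x syllables per sentence."""
    syllables_per_sentence = words_per_sentence * syllables_per_word
    return 0.3 * syllables_per_sentence


value = strain_index_from_averages(17, 1.47)
print(round(value, 1))  # 7.5
```

Without rounding 24.99 up to 25, the result is 7.497, which still rounds to 7.5.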

Flesch’s Quick Rule-of-thumb Yardstick

May 21, 2012

By Nirmaldasan


In The Art Of Plain Talk, Rudolf Flesch says that simple language consists of ‘short sentences, few affixes, and many personal references’. The average words per sentence (W), percentage of affixes (A) and percentage of personal references (P) are strung into a complicated expression: Difficulty score = (0.1338 * W) + (0.0645 * A) – (0.0659 * P) – 0.75. Scoring system: up to 1 (very easy, 5th grade); 1 to 2 (easy, 6th); 2 to 3 (fairly easy, 7th); 3 to 4 (standard, 8th to 9th); 4 to 5 (fairly difficult, 10th to 12th); 5 to 6 (difficult, 13th to 16th); and 6 or more (very difficult, college graduate).

But in a postscript, Flesch presents a Quick Rule-of-thumb Yardstick (QRY): Difficulty score = [(A – P) / 2] + W. Scoring system: up to 13 (very easy), 13 to 20 (easy), 20 to 29 (fairly easy), 29 to 36 (standard), 36 to 43 (fairly difficult), 43 to 52 (difficult) and 52 or more (very difficult). But if we take a sample of 50 words instead of 100, then the calculation becomes simpler, because the percentages A and P are simply twice the counts in 50 words. Let ‘a’ and ‘p’ be the affixes and personal references in a sample of 50 words; and ‘w’, the average number of words per sentence. Then A = 2a and P = 2p, so difficulty score = w + a – p.

Affixes are extremely hard to spot, but Flesch gives a helpful list of affixes in the appendix. Personal references are easy to locate: names of people, personal pronouns that refer to people and a finite list of human-interest words.

Let’s apply the QRY on the following 50-word paragraph taken from a longer sample analysed by Flesch (the personal references are in capitals and the affixes are in brackets):

“WE shall plan, (with)in each countr(y) and (be)tween countr(ies), for more jobs and for mak(ing), trad(ing) and us(ing) more goods. (Al)so, WE shall plan to do (a)way with all ways of treat(ing) the trade of some countr(ies) bett(er) than that of others, and to low(er) tariffs and other trade barr(iers).”

Number of sentences = 2

Number of words = 50

w = 50/2 = 25

a = 14

p = 2

Difficulty score = w + a – p = 25 + 14 – 2 = 37 (fairly difficult)
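The same count can be made from the marked-up paragraph itself. The sketch below relies on the marking convention used above (affixes in brackets, personal references in capitals) and treats every full stop, question mark or exclamation mark as the end of a sentence; the function names are mine, and the handling of scores that fall exactly on a boundary is my assumption.

```python
import re

SAMPLE = (
    "WE shall plan, (with)in each countr(y) and (be)tween countr(ies), "
    "for more jobs and for mak(ing), trad(ing) and us(ing) more goods. "
    "(Al)so, WE shall plan to do (a)way with all ways of treat(ing) the "
    "trade of some countr(ies) bett(er) than that of others, and to "
    "low(er) tariffs and other trade barr(iers)."
)


def qry_score(marked_text):
    """Difficulty score = w + a - p for a marked-up 50-word sample."""
    affixes = len(re.findall(r"\([^)]*\)", marked_text))
    plain = re.sub(r"[()]", "", marked_text)
    words = re.findall(r"[A-Za-z]+", plain)
    personal = sum(1 for w in words if len(w) > 1 and w.isupper())
    sentences = len(re.findall(r"[.!?]", plain))
    return len(words) / sentences + affixes - personal


def qry_label(score):
    """Flesch's seven bands; a boundary score goes to the harder band (assumed)."""
    bands = [(13, "very easy"), (20, "easy"), (29, "fairly easy"),
             (36, "standard"), (43, "fairly difficult"), (52, "difficult")]
    for upper, label in bands:
        if score < upper:
            return label
    return "very difficult"


score = qry_score(SAMPLE)
print(score, qry_label(score))  # 37.0 fairly difficult
```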

For a reliable assessment, the QRY must be applied on at least 10 samples of 50 words each. “Some readers, I am afraid,” writes Flesch, “will expect a magic formula for good writing and will be disappointed with my simple yardstick. Others, with a passion for accuracy, will wallow in the little rules and computations but lose sight of the principles of plain English. What I hope for are readers who won’t take the formula too seriously and won’t expect from it more than a rough estimate.”

Longer The Sentence, Greater The Strain

April 30, 2012

By Nirmaldasan


—  This article appeared in the October-December 2011 issue of Vidura, a quarterly journal of the Press Institute of India: —

All plain English experts echo Robert Gunning’s advice: “Keep sentences short.” The longer the sentence, the greater the strain on the reader. Harold Evans, author of Newsman’s English, writes: “The real seduction of the simple sentence is that taken by itself, it is short and it is confined to one idea. The real trouble with so many compound-complex sentences is that they have to carry too many ideas.”

Martin Cutts, in the Oxford Guide To Plain English, has this to say: “More people fear snakes than full stops, so they recoil when a long sentence comes hissing across the page.” He recommends an average sentence length of 15-20 words.

Jyoti Sanyal, author of Indlish (the book for every English-speaking Indian) writes: “Based on several studies, press associations in the USA have laid down a readability table. Their survey shows readers find sentences of 8 words or less very easy to read; 11 words, easy; 14 words fairly easy; 17 words standard; 21 words fairly difficult; 25 words difficult and 29 words or more, very difficult.” We will return to this readability table a little later.

Rudolf Flesch, creator of the Flesch Reading Ease formula, studied the readability of various magazines: Scientific (very difficult), Academic (difficult), Quality (fairly difficult), Digests (standard), Slick-fiction (fairly easy), Pulp-fiction (easy) and Comics (very easy). He counted the number of syllables per 100 words and measured the average sentence length in words. He put these two variables into a complex formula in an article titled ‘A New Readability Yardstick’, published in the June 1948 issue (Volume 32, Number 3) of the Journal of Applied Psychology.

Now words may be monosyllables (short), disyllables (medium) or polysyllables (long). So an average sentence comprising 17 long words may still be a strain on the reader. In early 2005, when I was a senior sub-editor with The Hindu, I realized that the best way to overcome this problem was to measure the sentence in syllables.

While it is easy to count words, counting syllables may not be all that easy. But with a little practice, anyone can count syllables swiftly. Remember that it is the syllable that determines the rhythm of prose. The syllable is the basic unit of utterance. Each syllable has only one vowel sound. ‘Television’ has four syllables; ‘Internet’ has three; ‘Radio’ has two; and ‘Print’ has only one!

Flesch writes: “If in doubt about syllabication rules, use a good dictionary. Count the number of syllables in symbols and figures according to the way they are normally read aloud, e.g. two for $ (‘dollars’) and four for 1918 (‘nineteen-eighteen’).”

The readability table, which we have already seen, may be better expressed in terms of syllables. Sentences of 10 syllables or less are very easy to read; 14 syllables, easy; 19 syllables, fairly easy; 25 syllables, standard; 33 syllables, fairly difficult; 42 syllables, difficult; and 56 syllables or more, very difficult.


Average sentence length (words) | Average sentence length (syllables) | Description of style
29 or more | 56 or more | Very difficult
25 | 42 | Difficult
21 | 33 | Fairly difficult
17 | 25 | Standard
14 | 19 | Fairly easy
11 | 14 | Easy
8 or less | 10 or less | Very easy

But this table, derived from a simplification of Flesch’s observation of a pattern of ‘Reading Ease’ scores, does not identify the level of the readers for whom a text may be easy or difficult.
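The syllable figures in the table can be reproduced by multiplying each sentence length by the syllables per word that Flesch associated with that level. The syllables-per-100-words values below are not given in this article; they are the figures usually quoted from Flesch's pattern of Reading Ease scores, so they are an assumption to be checked against his paper.

```python
# (description of style, words per sentence, syllables per 100 words)
# The third column is assumed from Flesch's published pattern.
FLESCH_PATTERN = [
    ("very easy", 8, 123),
    ("easy", 11, 131),
    ("fairly easy", 14, 139),
    ("standard", 17, 147),
    ("fairly difficult", 21, 155),
    ("difficult", 25, 167),
    ("very difficult", 29, 192),
]


def syllables_per_sentence(words, syllables_per_100_words):
    """Sentence length in syllables, rounded to the nearest whole number."""
    return round(words * syllables_per_100_words / 100)


for style, words, syllables in FLESCH_PATTERN:
    print(style, words, syllables_per_sentence(words, syllables))
```

Under that assumption the products round to 10, 14, 19, 25, 33, 42 and 56, the same figures as in the readability table.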

So here follows a formula that measures the readability of a text on a scale of 1 to 17+ years of schooling. The Strain Index, which I evolved as an alternative to Gunning’s Fog Index, is a syllable-counting formula. Unlike many a readability formula which intimidates the user with a complex equation, the Strain Index is very easy to use. The plain English expert William DuBay called it ‘remarkably simple’.

In its popular form, Strain Index = S3 /10 (S3 is the number of syllables in three sentences). Let us take an example:

‘I just don’t agree with this hoo-ha about short sentences and simple words,’ said PM. ‘If I can write long sentences well, why shouldn’t I?’ Nor does PM agree with the advice on the use of everyday words.

That passage comes from an article titled ‘Shrink Or Sink’ in Sanyal’s Indlish. The sample has 53 syllables. So, Strain Index = 53 / 10 = 5.3 years of schooling; a Standard V student can understand what Sanyal has written.
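The arithmetic is simple enough to do in the head, but a short Python sketch makes the steps explicit. Only the total of 53 syllables comes from the text; the per-sentence split of 22, 13 and 18 is a hand count assumed here for illustration, and the function name is hypothetical.

```python
def strain_index_popular(syllables_in_three_sentences: int) -> float:
    """Popular form of the Strain Index: S3 / 10."""
    return syllables_in_three_sentences / 10


# Hand-counted syllables for the three sentences of the sample (assumed split)
sample = [22, 13, 18]
s3 = sum(sample)  # 53 syllables in all

print(strain_index_popular(s3))  # 5.3
```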

But to get a better estimate of the readability of a text, one must test more three-sentence samples or choose a long sample. In its non-popular form, Strain Index = S30 / 100 (S30 is the number of syllables in 30 sentences). This is the same as taking 10 three-sentence samples and calculating the average.

It is possible, though not necessary, to apply the formula to a full text consisting of ‘n’ sentences. In this case, the general form is Strain Index = 0.3 x (Sn / n), in which Sn is the number of syllables in ‘n’ sentences. But always remember that any readability formula should be applied only to well-written texts.
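The three forms of the formula are the same calculation at different sample sizes, as this sketch of the general form suggests. The function name strain_index and its list-of-counts input are assumptions made for illustration; the syllable counts themselves still have to be supplied by the user.

```python
from typing import Sequence


def strain_index(syllables_per_sentence: Sequence[int]) -> float:
    """General form of the Strain Index: 0.3 x (Sn / n)."""
    n = len(syllables_per_sentence)
    if n == 0:
        raise ValueError("need at least one sentence")
    return 0.3 * sum(syllables_per_sentence) / n


# Three sentences with 53 syllables: same as S3 / 10
print(round(strain_index([22, 13, 18]), 2))

# Thirty sentences of 20 syllables each: same as S30 / 100
print(round(strain_index([20] * 30), 2))
```

With three sentences, 0.3 x (S3 / 3) reduces to S3 / 10; with thirty, 0.3 x (S30 / 30) reduces to S30 / 100.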

Measuring Readability

March 30, 2012

By Nirmaldasan

— This article appeared in the July-September 2011 issue of Vidura, a quarterly journal of the Press Institute of India:

Linguistic skill is a necessary condition for the creation of readable texts. Helen Keller’s The Story Of My Life, Mark Twain’s The Adventures Of Huckleberry Finn and Sir Arthur Conan Doyle’s Tales Of Unease are but three examples that display felicity of language. Like these literary classics, there are a number of non-literary and yet very readable texts such as magazine features, news reports and product manuals.

However, a well-written text may not be readable for all people. William DuBay’s definition of readability as ‘the ease of reading created by a literary style that fits the reading level of the audience’ underscores the need for matching text with the appropriate audience. A scoring system of 1 to 17+ consisting of the years of schooling is perhaps the most effective, though not perfect, way of grading texts. If we walk into a Standard VII classroom and administer a reading test, the results may show that not all students read at level 7. Some students may be below average with a reading level of 5 or even lower; and some may be above average with a reading level of 9 or even higher.

Research has shown that the average reader in the U.S. and the U.K. has eight years of schooling. Since newspapers and magazines are usually meant for a general audience, it is ideal if the reports and features are tailored for readers with eight years of schooling. As this may not be possible given the type of serious content, writers should at least aim for a score of 10 or less.

Though there are a number of factors that make a story readable, most readability formulae depend on two variables: vocabulary and sentence length. Robert Gunning’s Fog Index may be calculated in a few easy steps:

  1. Average words per sentence (AWS)
  2. Percentage of polysyllables (P), excluding capitalized words, easy compound words and disyllabic verbs made trisyllabic by adding ‘-es’ or ‘-ed’.
  3. FI = 0.4 x (AWS + P)

Here is an example from Gunning: “Typing errors are easy to make in transposing code numbers of appropriations. We suggest each Division Planning Office set up a file of yellow tickets showing all authorized unit and item numbers. Then each can make a daily check of construction charges before sending time distribution sheets to the Accounting Department.” The passage has three sentences, 51 words and 5 polysyllables. Therefore, AWS = 51/3 = 17; P = (5/51) x 100 = 9.8; and Fog Index = 0.4 x (17 + 9.8) = 10.72, or 10 years of schooling once the decimal part is dropped.
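The three steps can be sketched in a few lines of Python. The counts are supplied by hand, because deciding which polysyllables to exclude (capitalized words, easy compounds, verbs lengthened by ‘-es’ or ‘-ed’) calls for human judgment; the function name fog_index and its parameters are assumptions made for illustration.

```python
def fog_index(words: int, sentences: int, polysyllables: int) -> float:
    """Fog Index = 0.4 x (AWS + P), from hand-counted inputs."""
    aws = words / sentences            # average words per sentence
    p = 100 * polysyllables / words    # percentage of polysyllables
    return 0.4 * (aws + p)


# Gunning's sample: 3 sentences, 51 words, 5 counted polysyllables
fi = fog_index(words=51, sentences=3, polysyllables=5)
print(round(fi, 2))  # about 10.72
print(int(fi))       # 10 when the decimal part is dropped
```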

One must remember that a formula is a shortcut to assess the readability of a text. It should be applied only to well-written texts. If the text is poorly crafted, it must be revised or rewritten according to the time-tested principles of effective writing. Then the text may be tested with the Fog Index.

Here are some tips for clear writing:

  1. Keep the sentence length under 30 syllables
  2. Prefer the active voice
  3. Use a conversational style
  4. Limit the number of clauses
  5. Avoid sentence fragments
  6. Use words that are in everyday use
  7. Omit needless words
  8. Write with nouns and verbs
  9. Use adjectives of kind, not of degree
  10. Avoid too many negatives

Suggested Reading

Plain Language In Plain English: Ed. Cheryl Stephens, Plain Language Wizardry, Vancouver, 2010.

Oxford Guide To Plain English: Martin Cutts, Oxford University Press, New York, 3rd edition, 2009.

The Gettysburg Approach To Writing & Speaking Like a Professional: Philip A. Yaffe, INDI Publishing Group, Phoenix (Arizona), 2009.

Smart Language (Readers, Readability, and the Grading of Text): William DuBay, Impact Information, Costa Mesa (California), 2007.

Unlocking Language (The Classic Readability Studies): William DuBay, Impact Information, Costa Mesa (California), 2007.

Good Style (Writing For Science And Technology): John Kirkman, Spon Press, London, 2nd Indian reprint, 2007.

Indlish (The Book for Every English-Speaking Indian): Jyoti Sanyal, Viva Books, New Delhi, reprint, 2007.

Assessing Reading: J. Charles Alderson, Cambridge University Press, New York, 5th printing, 2005.

The Elements Of Style: William Strunk Jr. and E.B. White, Longman Publishers, New York, 4th edition, 2000.

How To Take The Fog Out Of Writing: Robert Gunning, Taraporevala Publishing Industries, Bombay, 1st Indian reprint, 1979.

The Complete Plain Words: Sir Ernest Gowers, The English Language Book Society and Penguin Books, reprint, 1969.

Grading The Technical Text

February 20, 2012

By Nirmaldasan


There is no reason why the technical text, whether it is a user manual or a project report, should have long sentences. But the technical text may have long words if necessary, as technical terms expressing complex ideas tend to be long. If this proposition be granted, then here is a new formula in two variables for grading the technical text.

The length of a sentence helps determine syntactic difficulty. To measure semantic complexity, here is a new variable: polysyllabic excess, the number of syllables a word has beyond three. The word ‘technicality’ has five syllables and so its polysyllabic excess is two. Obviously, the polysyllabic excess of monosyllables, disyllables and trisyllables is zero.

Let W4 be the number of words and pX4 the polysyllabic excess in a sample of four sentences. The Grade Level (GL), which indicates the number of years of schooling to understand a given technical text, is measured by this formula: GL = (W4/10) + pX4.

Let us apply the formula on the following sample: “Technical writing is the art of communicating technical knowledge to a specified audience. The topic may be as simple as a recipe or as complex as an integral equation. Some of the common technical documents are business letters and user manuals. The nature of the subject and audience determines the style and structure in which technical content is packaged.”

The number of words W4 = 59. Though there are a number of polysyllables, the only word with polysyllabic excess is the pentasyllabic ‘communicating’. So pX4 = 2. Therefore, GL = (59/10) + 2 = 7.9 years of schooling. Since the average reader has about eight years of schooling, the sample that comes from my other article titled ‘The Technical Text’ targets an average audience.
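The same calculation can be sketched in Python. The word count and the syllable counts are entered by hand; the function names and the idea of passing a list of syllable counts are assumptions made for illustration. Words of three syllables or fewer add nothing to the excess, so only the longer words need to be listed.

```python
from typing import Sequence


def polysyllabic_excess(syllables: int) -> int:
    """Syllables beyond three in a single word (zero for shorter words)."""
    return max(0, syllables - 3)


def grade_level(word_count: int, syllable_counts: Sequence[int]) -> float:
    """GL = (W4 / 10) + pX4 for a sample of four sentences."""
    px4 = sum(polysyllabic_excess(s) for s in syllable_counts)
    return word_count / 10 + px4


# The sample has 59 words; 'communicating' (5 syllables) is the only
# word with more than three syllables.
print(round(grade_level(59, [5]), 1))  # 7.9
```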

Several factors make a technical text readable. The length of a sentence and polysyllabic excess are just two of them. And this new formula, I hope, will take technical writers closer to their audience.