February 10, 2021

By Nirmaldasan

By the time a child has completed three years of age, it would have, with or without parental assistance, picked up quite a number of words from the world around it. From then on, even as the child continues randomly to increase its vocabulary, it must become familiar with the top-ranked 8000 words from any available corpus. These are by far the most frequent words in everyday use.

My prescription for enriching the child’s vocabulary ideally over a period of five years is not arbitrary. It has the support of the logarithmic scale. I found a nice equation: AC (age completed) = LWR (logarithm of word rank in a corpus). Here is a table for quick reference:

If Prof. Kev Nair looks at my ideal prescription, he is sure to disagree. In his excellent book The Complete Fluency Words, he writes: “You can’t achieve a complete mastery of a word at one fell swoop — with one single and sudden learning action. Vocabulary mastery is a gradual process, and a never-ending one at that.”

With a smile, I have to agree with him. When I was eight years old, my vocabulary was a far cry from my ideal prescription. Even now at the middling and meddling age of 55, I cannot claim a vocabulary of the 7373 words that Prof. Nair prescribes in his landmark work.

However, I am reluctant to abandon the logarithmic scale. Perhaps it may not be unrealistic for seven-year-old children to become familiar with the top-ranked 3000 words in any corpus. The remaining 5000 words may be picked up by children in a never-ending gradual process. I would like to hear parents and children say, “That’s a joyful compromise!”

March 11, 2020

nirmaldasan@hotmail.com

Here’s a rule of thumb: the grade level of a text is approximately half the sentence length. Usually, a sentence is a mix of short and long words and so the thumb-rule tends to yield a higher grade level as it ignores the length of words.

To solve this problem, I suggest a new readability variable based on the letter count of words. Any word with three or more letters may be called a polyliteral word. Half the polyliteral sentence length may be a better estimate of text difficulty. Interestingly, even the shortest polyliteral word may be monosyllabic (sun) or disyllabic (ego) or polysyllabic (UNO).

Let’s move on to the Polyliteral Readability Index (PRI). Let P5 be the number of polyliteral words in five sentences. Then, PRI = P5 / 10. This formula grades texts on a scale of 1 to 17+ years of schooling.

To demonstrate the PRI, let’s take the first five sentences of this article. Here, P5 = 63. Therefore, PRI = 63 / 10 = 6.3 years of schooling.
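The count can be reproduced with a short script. This is only a sketch: the tokenizer (letters, with an internal apostrophe or hyphen kept inside one word) and the function names `polyliteral_count` and `pri` are assumptions made for illustration, not part of the formula itself.

```python
import re

# Assumed tokenizer: runs of letters; an internal apostrophe or hyphen
# keeps the parts together as one word (e.g. "thumb-rule", "Here's").
WORD = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")

def polyliteral_count(text, min_letters=3):
    """Count words that contain at least `min_letters` letters."""
    count = 0
    for token in WORD.findall(text):
        letters = sum(ch.isalpha() for ch in token)
        if letters >= min_letters:
            count += 1
    return count

def pri(five_sentences):
    """Polyliteral Readability Index: P5 / 10."""
    return polyliteral_count(five_sentences) / 10

# The first five sentences of this article.
sample = (
    "Here's a rule of thumb: the grade level of a text is approximately "
    "half the sentence length. Usually, a sentence is a mix of short and "
    "long words and so the thumb-rule tends to yield a higher grade level "
    "as it ignores the length of words. To solve this problem, I suggest a "
    "new readability variable based on the letter count of words. Any word "
    "with three or more letters may be called a polyliteral word. Half the "
    "polyliteral sentence length may be a better estimate of text difficulty."
)

print(polyliteral_count(sample), pri(sample))  # 63 6.3
```

A different tokenizer (for instance, one that splits hyphenated words) would shift the count slightly, so the choice should be kept fixed when comparing texts.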

I randomly tried PRI on a few other texts. I am quite satisfied with its performance. However, I suggest that you test it yourself on graded texts, instead of taking my word for it. Thank you.

March 19, 2019

By Nirmaldasan

Of the two factors in the New Dale-Chall Readability Index of 1995, the number of complete sentences (S) in a sample of 100 words is arguably the simplest of all variables. It accounts for the syntactic difficulty of any text. I used it as the sole variable in the ‘Simplicity Score of Business Writing’ (October 2014).

A readability formula also needs a factor to measure semantic complexity. This may be ASW (average syllables per word) or ALW (average letters per word) or percentage of unfamiliar or polysyllabic words. Without being specific (for the time being), I would like to say that semantic complexity is measured by the percentage of difficult words (D).

To formulate a Generalised Readability Index (GRI), we also need a readability constant (r). If by some means we know the expected percentage of difficult words (EPD), then r = 50 – EPD. If EPD is 0, then r = 50; and if EPD is 50, then r = 0.

Then, GRI = (D + r) / S

This formula measures the grade level of texts on a scale of 1 to 17+ years of schooling.
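As a minimal sketch, the formula can be written as a small function. The names `readability_constant` and `gri` are hypothetical, and the input values in the usage line are made up purely to show the arithmetic.

```python
def readability_constant(epd):
    """r = 50 - EPD, where EPD is the expected percentage of difficult words."""
    return 50 - epd

def gri(d, s, epd):
    """Generalised Readability Index: (D + r) / S.

    d   -- percentage of difficult words in the sample
    s   -- number of complete sentences per 100 words
    epd -- expected percentage of difficult words
    """
    if s <= 0:
        raise ValueError("the sample must contain at least one sentence")
    return (d + readability_constant(epd)) / s

# Illustrative values only: 30% difficult words, 5 sentences per 100 words, EPD = 25.
print(gri(d=30, s=5, epd=25))  # (30 + 25) / 5 = 11.0
```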

I have not offered any proof. But, as they say, the proof of the pudding is in the eating. A generalized formula is useless unless we know exactly what D and r are. We now need to become specific. I’ll demonstrate the utility of GRI in deriving new readability formulas.

Example 1: Let the number of uncommon words U be a measure of semantic complexity. The expected percentage of difficult (uncommon) words EPD = 50 since the 100 commonest words account for 50% of any text. For clarification, do take a look at my article ‘The Lemma Readability Index’ (January 2016). So, r = 50 – EPD = 50 – 50 = 0. Therefore, in a sample of 100 words, the Index = (U + 0) / S = U / S. But this is simply the average number of uncommon words in a sentence. Thus the formula is just another form of the Lemma Readability Index.

Example 2: The average syllable has three characters (one vowel letter and two consonants). A disyllabic word may have six characters; and a polysyllabic word, more than six characters. So any word with more than six characters may be called a long word. Let the percentage of long words L(>6) be a measure of semantic complexity. The EPD may be calculated from the distribution of word lengths presented in Peter Norvig’s article titled ‘English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU’, which is available at http://norvig.com/mayzner.html

Interestingly, words of six or fewer characters account for 75% of all words. So EPD = 100 – 75 = 25. Now, r = 50 – EPD = 50 – 25 = 25. Therefore, in a sample of 100 words, the Index = (L(>6) + 25) / S.

Example 3: The average word has five characters (two vowel letters and three consonants). Any word with more than five characters may be called a long word. Let the percentage of long words L(>5) be a measure of semantic complexity. Consulting Peter Norvig’s distribution of word lengths, we notice that words of five or fewer characters account for 67% of all words. So EPD = 100 – 67 = 33. Now, r = 50 – EPD = 50 – 33 = 17. Therefore, in a sample of 100 words, the Index = (L(>5) + 17) / S.

Application: Let us test the three new indices on the following text:

“The first batch of students of the Certificate in Online Journalism programme announced that they are online at the viva voce examination on Saturday (29 November 2014). They have created for themselves a website, a blog and a twitter account.”

Words = 40 and Sentences = 2

Uncommon words = 17 (batch, students, Certificate, Online, Journalism, programme, announced, online, viva, voce, examination, created, themselves, website, blog, twitter, account)

Words of more than six characters = 13 (students, Certificate, Journalism, programme, announced, examination, Saturday, November, created, themselves, website, twitter, account)

Words of more than five characters = 15 (students, Certificate, Online, Journalism, programme, announced, online, examination, Saturday, November, created, themselves, website, twitter, account)

From the above data, we get:

U% = (17/40) * 100 = 42.5

S% = (2/40) * 100 = 5

U-Index = (U / S) = 42.5 / 5 = 8.5 years of schooling.

L(>6)% = (13/40) * 100 = 32.5

L(>6)-Index = (L(>6) + 25) / S = (32.5 + 25) / 5 = 11.5 years of schooling.

L(>5)% = (15/40) * 100 = 37.5

L(>5)-Index = (L(>5) + 17) / S = (37.5 + 17) / 5 = 10.9 years of schooling.

Note: It is better to take a sample of 100 words and thus avoid the calculation of percentages.
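The two long-word indices can be checked mechanically. The sketch below assumes a simple tokenizer that treats every run of letters or digits as a word (which gives 40 words for the sample) and takes the sentence count as an input; the function name `long_word_index` is hypothetical. The U-Index is left out because it needs a list of common words.

```python
import re

def long_word_index(text, sentences, threshold, r):
    """(L% + r) / S%, where a long word has more than `threshold` characters."""
    words = re.findall(r"[A-Za-z0-9]+", text)  # assumed tokenizer
    long_words = [w for w in words if len(w) > threshold]
    pct_long = 100 * len(long_words) / len(words)
    pct_sentences = 100 * sentences / len(words)
    return (pct_long + r) / pct_sentences

sample = (
    "The first batch of students of the Certificate in Online Journalism "
    "programme announced that they are online at the viva voce examination "
    "on Saturday (29 November 2014). They have created for themselves a "
    "website, a blog and a twitter account."
)

print(round(long_word_index(sample, sentences=2, threshold=6, r=25), 2))  # 11.5
print(round(long_word_index(sample, sentences=2, threshold=5, r=17), 2))  # 10.9
```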

## Kiran Thakur’s Rule Of Thumb

February 11, 2019

By Nirmaldasan

Newspapers in India and abroad are not against publishing reader-friendly reports. They do understand the principles of good communication: accuracy and clarity would boost circulation; and brevity would gain space for more news or advertising. Had they time enough, their leads would not exceed plain English expert Jyoti Sanyal’s maximum sentence length of 25 words. Had they time enough, their leads would be free of unnecessary words thanks to Thakur’s rule of thumb.

Plenty of ineffective leads and unfamiliar words may be found in any newspaper. But it is better for reporters to sharpen their writing skills than to blame the deadline for their incompetence. Prof. Dr. Kiran Thakur’s Newspaper English, published by Vishwakarma Publications in January 2019, addresses this problem. In a UGC-funded study, the author sampled 20 ineffective leads, rewrote them and subjected the original and rewritten versions to readability tests. The rewritten versions performed better than the original versions in terms of grade score. He also placed both the versions before 620 Master’s students across various cities in India; the respondents were not told which was which. A majority of them preferred the rewritten versions.

The author applied what has now come to be known as Thakur’s rule of thumb to the original leads and rewrote them in obedience to the principles of plain English. This reviewer examined the rewritten versions and found that Thakur had fulfilled Sanyal’s sentence length guideline in all but one: “Nine jawans of the anti-Maoist Special Operation Group (SOG) were killed and eight others seriously injured in a landmine blast triggered by Maoists in Koraput district of Orissa on Sunday morning.” Since I had enough time on my hands, I rewrote Thakur’s 31-word lead into a 24-word lead: “Nine jawans of the Special Operation Group (SOG) were killed in a landmine blast triggered by Maoists in Orissa’s Koraput district on Sunday morning. Eight others were seriously injured in the blast.” Sanyal would be pleased had he been alive; and so would Thakur because I have taken his rule of thumb seriously.

Thakur also picked up 40 words from newspapers and presented 20 each in two questionnaires. He writes: “The study showed that the journalists often used words that were not easy to understand for the common readers. These included words from foreign cultures. They often used jargon, technical terms, and words common readers do not use.” In three tables, the author shows how to avoid roundabout phrases, longer words and verbosity.

In Newspaper English we hear the voices of plain English experts (especially in the fifth chapter titled ‘Clear, plain and simple language’). The book has a foreword by the Research Director of Plain Language Commission, Martin Cutts. He writes: “Using numerous examples of stories from India’s leading newspapers, the book shows how their readability can be increased. It’s a most valuable source of ideas and research and I warmly commend it to you.”

The book is priced at INR 225 (US $9.00). For copies, email the author: drkiranthakur@gmail.com

## A Communication Formula

November 5, 2018

By Nirmaldasan

Communication is a human skill and therefore cannot be reduced to a single formula. It would even be difficult to list all the communication elements. The best that we can do is to choose a few elements and put them together into a formula such as Burkey Belser’s Communication Index. Many years ago, I presented a paper about this index.

Being a Communication Consultant, I would now like to present my own formula: C = A-F². According to the formula, Communication (C) consists of the following elements: Accuracy (A), Brevity (B), Clarity (C), Diplomacy (D), Ecstasy (E), Frequency (F) and Flow (F). These elements have been put together as C = A-F² to resemble the circle’s formula and, more importantly, Albert Einstein’s E = mc².

Before we look at each of these elements, here are a few points about C which consists of the four basic skills LSRW (listening, speaking, reading and writing). Listening and reading are receptive skills; speaking and writing are productive skills. It is good to remember that C occurs in specific contexts in which the aforementioned seven elements come into play.

Accuracy

This is a key element in communication. Especially in technical texts, even a slight inaccuracy can result in a butterfly effect. The 5W&1H, namely, who-what-when-where-why-how must be accurately described. In journalistic texts, approximations may be fine but still they must be close to accuracy.

Brevity

One good argument for brevity is: the fewer the words, the fewer the chances of miscommunication. Though speaking may consume more words, one must be terse in writing. Martin Cutts says that the average sentence has 15-20 words. And I would recommend an average of 25-33 syllables. Philip Yaffe neatly summed it up: long as necessary, short as possible.

Clarity

Sometimes brevity may have to be compromised for the sake of clarity. A short word like fane may not be as clear as its longer synonym temple. An extra sentence to make things clear is always welcome. However, it may not always be wise to clearly call a spade a spade. This takes us to the next element diplomacy.

Diplomacy

A great communicator knows that the smooth is to be preferred to the blunt. There’s the story of the astrologer who undiplomatically told the king that his relatives would die sooner than he would. Diplomatically, he could just have said that the king would live longer than his relatives.

Ecstasy

I have not read Jean Baudrillard’s The Ecstasy of Communication. But I know for sure that communication cannot be effective without this element. It is not only when we are sharing a joke or a humorous anecdote that we must be ecstatic; the words that spring from our lips must carry the bubbles of colorful delight. If there is no ecstasy, then the reader or listener derives no pleasure from the communication.

Frequency

Communicators cannot just communicate something and move on. They must reinforce the ideas again and again. In advertising, there used to be a theory called reach at 3+ which means that an advertisement must be presented at least thrice before it impresses itself in the minds of the prospective consumers.

Flow

This element is my favorite and I would go to the extent of saying that it overrides all the other elements. The rhythm, the flow is everything. In school, when the mathematics teacher asked me, “What is sin 60°?” I only heard it as ‘something 60’ as I was chatting with my classmate in the last row. But I arose to say root three by two! The teacher was surprised of course! I didn’t even know that was the right answer. I would have said ‘root three by two’ to every trigonometric question. But many years later I realized why I remembered only ‘root three by two’ and I even spoke about it in a talk titled ‘The Mathematics of Experience’ modeled after Albert Einstein’s ‘Geometry and Experience’. The answer I gave to sin 60° was not a number but simple English. Say it aloud: ‘root three by two’, ‘root three by two’, ‘tum ta ta tum’, Britannia Britannia Marie!

That in short is C = A-F². Now why did Einstein write E = mc² when he should have written E = c²m? The constant should come first and only then the variable, as in the circle’s formula. Einstein, like me, wanted the rhythm, the flow! Quod Erat Demonstrandum.

## Grading The Narrative Text: The RANT Index

January 10, 2018

By Nirmaldasan

The narrative text has a story to tell. And the story may be real or imaginary. Children’s Literature is a storehouse of narrative texts. In this article, I present a new listenability formula called Read-Aloud Narrative Text (RANT) Index. The higher the ranting, the greater the listening difficulty. To confess, I wasn’t really thinking of the narrative text when I created this formula. I realized that the formula wouldn’t work well with other types of text such as the persuasive and the technical texts. Suddenly it struck me that RANT would work quite well for grading the narrative text.

Let us look at three features of a narrative text that increase listenability: 1. Short sentences, 2. Direct speeches and 3. Proper nouns and proper adjectives. There may be plenty of other features such as rhythm and short words, but I have chosen only three features because I found that they could be measured simply by counting one variable: the number of capitalized words (C).

It is a convention that every sentence and every direct speech should begin with a capital letter. Needless to say, proper nouns and proper adjectives are always capitalized. So C is a single variable that combines the strength of at least three variables. The pronoun I is always capitalized. Also capitalized are personifications.

RANT Index = 50 / C%

The scoring system is a scale of 1 to 8+ years of schooling needed for readers to comprehend a text when it is read aloud to them. Any text that scores more than 8 will be assigned a grade of 8+.

Let us apply the formula to two samples, counting every word that begins with a capital letter.

Sample 1: From Mark Twain’s Huckleberry Finn

You don’t know about me without you have read a book by the name of ‘The Adventures of Tom Sawyer’; but that ain’t no matter. That book was made by Mr. Mark Twain, and he told the truth, mainly. There was things which he stretched, but mainly he told the truth. That is nothing. I never seen anybody but lied one time or another, without it was Aunt Polly, or the widow, or maybe Mary. Aunt Polly — Tom’s Aunt Polly, she is — and Mary, and the Widow Douglas is all told about in that book, which is mostly a true book, with some stretchers, as I said before.

Words = 110; C = 24; C% = 21.81. Therefore, RANT Index = 50 / 21.81 = 2.29 years of schooling.

Sample 2: From Dr. Spencer Johnson’s Who Moved My Cheese?

Later that same day, Hem and Haw arrived at Cheese Station C. They had not been paying attention to the small changes that had been taking place each day, so they took it for granted their Cheese would be there.

They were unprepared for what they found.

“What! No Cheese?” Hem yelled. He continued yelling, “No Cheese? No Cheese?” as though if he had shouted loud enough someone would put it back.

“Who moved my Cheese?” he hollered.

Finally, he put his hands on his hips, his face turned red, and he screamed at the top of his voice, “It’s not fair!”

Words = 102; C = 22; C% = 21.56. Therefore, RANT Index = 50 / 21.56 = 2.31 years of schooling.

The general form of RANT Index = W / (2C), where W is the number of words and C is the number of capitalized words in any sample. This form avoids the calculation of percentage. Another way to avoid it would be to choose a sample of 100 words and just count C.
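The general form lends itself to a few lines of code. This is a sketch under stated assumptions: a word is any run of letters (with an internal apostrophe allowed), a capitalized word is one whose first letter is upper case, and the names `rant_from_counts` and `rant_index` are hypothetical. Word totals from such a tokenizer may differ slightly from a hand count.

```python
import re

def rant_from_counts(words, capitalized):
    """General form of the RANT Index: W / (2C)."""
    if capitalized == 0:
        raise ValueError("no capitalized words in the sample")
    return words / (2 * capitalized)

def rant_index(text):
    """Tokenize the text, count capitalized words and apply W / (2C)."""
    tokens = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)  # assumed tokenizer
    capitalized = [t for t in tokens if t[0].isupper()]
    return rant_from_counts(len(tokens), len(capitalized))

# Counts reported for the Huckleberry Finn sample: W = 110, C = 24.
print(round(rant_from_counts(110, 24), 2))  # 2.29

# One sentence from that sample: 14 words, 4 of them capitalized.
print(rant_index("That book was made by Mr. Mark Twain, and he told the truth, mainly."))  # 1.75
```

Scores above 8 would be reported as 8+ under the scale described earlier; the sketch returns the raw value and leaves that capping to the caller.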

I think there is a clear case (pun intended) for capitalized words. Robert Gunning, creator of the Fog Index, excluded capitalized words from the count of hard words (polysyllables). So capitalized words have to be included in a count of easy words.

Let me not rant anymore.

December 9, 2017

By Nirmaldasan

James N. Farr, James J. Jenkins, and Donald G. Paterson suggested a New Reading Ease Index in their article of 1951 titled ‘Simplification of Flesch Reading Ease Formula’. They replaced the syllabic count in the Flesch formula with a monosyllabic count. This irked Rudolf Flesch himself and the readability expert George Klare. The creators of the new formula responded to the criticism and produced fresh data to show that both the formulae yielded ‘substantially equivalent results’.

Since there are fewer monosyllables than syllables to count in any passage, the Farr-Jenkins-Paterson (FJP) formula is a fine simplification, enjoying a high correlation of 0.93 with the Flesch formula.

This is how it works. Take a sample of 100 words from the passage to be tested for readability. Count the number of monosyllabic words (M). Also calculate the average words per sentence (AWS). Substitute the values in the formula:

FJP Reading Ease Index = 1.599*M – 1.015*AWS – 31.517

The formula yields a score which may be converted to Grade Levels by looking up a conversion table – the same that is used for the Flesch formula. Since the scoring system is the same, it becomes easy to compare the old and the new formulae. The authors tested the formula and found ‘perfect agreement for 237 of the 360 paragraphs’ with the Flesch formula. They say: “There is a disagreement of only one step for 119 paragraphs,” and add: “In only four instances is there a disagreement of two steps (in one instance the old index was ‘Fairly Easy’ and the new was ‘Fairly Difficult’, and in the other three instances the old index was ‘Standard’ and the new index was ‘Difficult’).”

This formula, in spite of all its decimal points, is not as intimidating as the Flesch formula. However, the use of the conversion table along with the formula is certainly a trouble that needs to be eliminated with little expense to accuracy.

Readability critics may say counting monosyllables is ‘baby talk’ or ‘primer style’. What will they say about counting non-monosyllables? Surely, they have to agree that this is neither ‘baby talk’ nor ‘primer style’. Then show them this exact equation: M (monosyllables) + N (non-monosyllables) = W (words). That may silence them.

But for those who wish to use the FJP formula without the conversion table, here is my simplification called Direct FJP Grading = 0.2*AWS + 0.3*N – 4.

AWS is the average words per sentence and N is the number of non-monosyllabic words in a passage of 100 words.
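Both formulas are easy to compute side by side. The sketch below uses hypothetical function names and a made-up 100-word sample (70 monosyllabic words, 20 words per sentence) only to show the arithmetic; it says nothing about how closely the two scores agree on real texts.

```python
def fjp_reading_ease(monosyllables, aws):
    """FJP Reading Ease Index for a 100-word sample."""
    return 1.599 * monosyllables - 1.015 * aws - 31.517

def direct_fjp_grade(non_monosyllables, aws):
    """Direct FJP Grading: 0.2*AWS + 0.3*N - 4, for a 100-word sample."""
    return 0.2 * aws + 0.3 * non_monosyllables - 4

# Hypothetical sample: M = 70, so N = 100 - 70 = 30; AWS = 20.
m, aws = 70, 20
n = 100 - m  # M + N = W, with W = 100

print(round(fjp_reading_ease(m, aws), 3))  # 60.113 (reading ease score)
print(round(direct_fjp_grade(n, aws), 1))  # 9.0 (years of schooling)
```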

## Reviewing The Strain Index

November 15, 2017

By Nirmaldasan

In 2005 I created the strain index, a readability formula that grades texts on a scale of 1 to 17+ years of schooling. I first wrote a short article about it and later, on the readability expert William DuBay’s advice, tested the formula on graded passages. In 2007, I received an M.Phil. from Madurai Kamaraj University for my research (under the guidance of Dr. Nirmal Selvamony) titled ‘A Quantitative Analysis of Media Language’, in which I had demonstrated the validity and the application of the strain index. Subsequently, I created this weblog Readability Monitor to promote the formula.

Ten years later, in October 2017, Lambert Academic Publishing published my dissertation. So it is time to review the strain index. I do not know how many people use the formula. I have done my best to promote the strain index not only in my writings but also in my classes. I wrote ‘The Strain Index: A New Readability Formula’ for Journalism Online, and The Hoot accepted my humble request to reproduce the article on its website. I later wrote ‘Longer The Sentence, Greater The Strain’ for Vidura, a journal of the Press Institute of India. These and other articles about the strain index are all available in this weblog Readability Monitor.

So how do I persuade people to buy my book? The blurb says: “A Quantitative Analysis Of Media Language offers an alternative readability formula called Strain Index to the most popular Fog Index of Robert Gunning. Both the formulas were compared by testing them on graded English textbooks. The Strain Index enjoys a very high correlation of 0.97 with the Fog Index. The advantage of the Strain Index is that it uses only one variable instead of two employed by the Fog Index. The readability expert William DuBay called the Strain Index remarkably simple.”

For those who just want to know what the formula is and how to use it, there is obviously no need to buy. But those who are into readability research – big names like Stylewriter and Lexile – may have the insatiable curiosity to find out what I have done with my formula and what the formula does. University libraries may also find my book a welcome addition. I would especially request scholars who have substantial research funds to buy this little book of mine and make me richer by a few Euros.

October 31, 2016

By Nirmaldasan

Edgar Dale and Joseph O’Rourke in ‘Living Word Vocabulary’ (LWV) graded thousands of words using what I would like to call the Graded Survey Method. According to this method, the grade of a word is the lowest grade in which at least 67% of the students found it familiar. In the ‘Plain English Lexicon’ Martin Cutts writes about the LWV: “It covered some 44,000 word meanings and involved 320,000 students. For each word, roughly 200 students were tested using a 3-choice multiple-choice test.”

The work is useful and impressive. But I was intimidated by the amount of labour involved. “There must be a shortcut,” I thought, and found one too.

The grading of words involves two simple steps: 1. Identifying the given word as familiar or unfamiliar; and 2. Counting syllables of a familiar word or counting letters of an unfamiliar word.

The first step can be easily accomplished by using any one of the following methods:

• Group Method: Present the word to a group of five persons. The word is considered familiar if four out of five think so. This method may also be called the 80% method.
• Martin Cutts Method: Look at the frequency of the word in the British National Corpus. “To give a very rough guide, I judge that words scoring more than about 1,200 are fairly common,” says Martin Cutts in the ‘Plain English Lexicon’. If the word does not occur in the corpus or if its frequency is less than 1,200, then the word is considered unfamiliar.
• List Method: A word is considered familiar if it occurs in any list of familiar words. We may use Edgar Dale’s List of 3000 familiar words or Kev Nair’s List of Maximum General Utility Words (2788).
• Media Method: A word is considered familiar if it is frequently heard on radio and television or frequently found in newspapers and magazines.
• Subjective Method:  If the word is familiar to me, then it is assumed that the word is familiar to others too. A better version of this method is that if I think the word is familiar to all, then it must be so.

The second step takes no time and little effort. Here we go!

• The Grade Level of Familiar Word (GLFW) = S (number of syllables of the word)
• The Grade Level of Unfamiliar Word (GLUW) = L (number of letters of the word)

NOTE: The Grade Level is the number of years of schooling required to understand a text. Usually, the scale is 1 to 17+ years of schooling.
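The two steps can be sketched as a single function. Everything named here is an assumption for illustration: `grade_word` is a hypothetical helper, the familiar-word set stands in for whichever method from the first step is used, and the syllable count is supplied by the caller rather than computed.

```python
def grade_word(word, familiar_words, syllables):
    """Grade a word: syllables if familiar (GLFW), letters if unfamiliar (GLUW)."""
    if word.lower() in familiar_words:
        return syllables                       # GLFW = S
    return sum(ch.isalpha() for ch in word)    # GLUW = L

# Hypothetical stand-in for a real list of familiar words.
familiar = {"temple", "school", "water"}

print(grade_word("temple", familiar, syllables=2))  # 2
print(grade_word("fane", familiar, syllables=1))    # 4
```

Passing the syllable count in keeps the sketch free of any syllable-counting heuristic, which would be a separate source of error.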

January 17, 2016

By Nirmaldasan

The Dale-Chall readability formula uses a list of 3000 familiar words. This formula has a very high correlation with text difficulty. However, readability formulae that do not use a list such as Robert Gunning’s Fog Index are more popular as they are easy to apply. But there is no reason to discard the list as it tests each word of a text. Let us look at a shorter list of 100 commonest words, which typically covers 50% of the over two billion words in the Oxford English Corpus. This list in rank order is found in an article titled ‘The OEC Facts About The Language’: http://www.oxforddictionaries.com/words/the-oec-facts-about-the-language

The list uses the idea of lemmas, ‘a lemma being the base form of a word’.  An alphabetical arrangement of the words would help us use the list for measuring readability.

Commonest Lemma List

a  about  after  all  also  an  and  any  as  at  (10 lemmas)

back  be  because  but  by  (5 lemmas)

can  come  could  (3 lemmas)

day  do  (2 lemmas)

even  (1 lemma)

first  for  from  (3 lemmas)

get  give  go  good  (4 lemmas)

have  he  her  him  his  how  (6 lemmas)

I  if  in  into  it  its  (6 lemmas)

just  (1 lemma)

know  (1 lemma)

like  look  (2 lemmas)

make  me  most  my  (4 lemmas)

new  no  not  now  (4 lemmas)

of  on  one  only  or  other  our  out  over  (9 lemmas)

people (1 lemma)

say  see  she  so  some  (5 lemmas)

take  than  that  the  their  them  then  there  these  they  think  this  time  to  two  (15 lemmas)

up  us  use (3 lemmas)

want  way  we  well  what  when  which  who  will  with  work  would  (12 lemmas)

year  you  your  (3 lemmas)

New Formula

The Lemma Readability Index (LRI) measures texts on a scale of 1 to 17 years of schooling. The LRI is the number of words per sentence not in the Commonest Lemma List. Take a sample of n sentences from a text. Count the Words Not in List (WNL). Then, LRI = WNL/n.

Counting Guidelines

1. Do not count proper names (names of people, places, days, months, organisations … )
2. Do not count numerals, symbols, abbreviations, acronyms
3. Do not count lemmas that are in the list
4. Do not count words that are grammatically associated with the lemmas in the list. Some examples:
   • Since be is in the list, do not count being, am, are, is, was, were
   • Since take is in the list, do not count taken, taker, takers, takes, taking, took
   • Since new is in the list, do not count newer, newest, newly, news, newsy
   • Since time is in the list, do not count timed, timely, timer, times, time’s, timing
5. Do not count compound words, if each part is in the list. Some examples:
   • Since some and how are in the list, do not count somehow
   • Since any and way are in the list, do not count anyway
   • Since an and other are in the list, do not count another
   • Since good and will are in the list, do not count goodwill
6. Count compound words as many times as they appear even if one part is not in the list. Some examples:
   • Since how is in the list but ever is not, count however
   • Since will is in the list but free is not, count freewill
7. Count every single word (even repetitions) which is neither in the list nor grammatically associated with the lemmas in the list

These guidelines solve most of the counting problems. But one is likely to come across a number of deceptive words. For instance, a and do are in the list, therefore ado begs not to be counted (fifth guideline). However, ado has to be counted because it is not a compound word. Again, take the words better and more. Though neither is in the list, one is tempted to exclude them from the count for semantic reasons. Better is related to good, and more is related to some and most. Resist the temptation and count every deceptive word. Remember that if we do not count more, then we cannot count moreover either. Let’s not quibble.

Application

Let us apply the formula on the following paragraph:

“The first batch of students of the Certificate in Online Journalism programme announced that they are online at the viva voce examination on Saturday (29 November 2014). They have created for themselves a website, a blog and a twitter account.”

Let us follow the counting guidelines.

1. Proper names are not counted (Saturday, November)
2. Numerals are not counted (29, 2014)
3. Lemmas in the list are not counted (The, first, of, in, that, they, at, on, have, for, a, and)
4. Words grammatically associated with the lemmas are not counted (are)
5. Compound words, if each part is in the list, are not counted (none in this sample)
6. Compound words, even if one part is not in the list, are counted (Online, online, themselves, website)
7. Every single word which is neither in the list nor grammatically associated with the lemmas in the list is counted (batch, students, Certificate, Journalism, programme, announced, viva, voce, examination, created, blog, twitter, account)

In the order of appearance, here is the list of words not in the list: batch, students, Certificate, Online, Journalism, programme, announced, online, viva, voce, examination, created, themselves, website, blog, twitter, account. WNL = 17.

Since the number of sentences in the sample is 2, LRI = WNL/n = 17/2 = 8.5 years of schooling.
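The count above can be reproduced with a rough sketch. It does not implement the guidelines automatically: proper names and grammatical variants are supplied by hand through the hypothetical parameters `proper_names` and `variants`, compound words are not analysed, and numerals are dropped by the assumed letters-only tokenizer. The lemma set transcribes the OEC top 100 and should be checked against the source article.

```python
import re

# Lower-cased OEC top-100 lemmas (assumption: verify against the OEC article).
COMMON_LEMMAS = set("""
a about after all also an and any as at back be because but by can come could
day do even first for from get give go good have he her him his how i if in
into it its just know like look make me most my new no not now of on one only
or other our out over people say see she so some take than that the their them
then there these they think this time to two up us use want way we well what
when which who will with work would year you your
""".split())

def lemma_readability_index(text, sentences, variants=(), proper_names=()):
    """Return (LRI, words not in list) for a sample of `sentences` sentences."""
    skip = set(COMMON_LEMMAS)
    skip.update(w.lower() for w in variants)      # e.g. "are" for the lemma "be"
    skip.update(w.lower() for w in proper_names)  # e.g. days and months
    words = re.findall(r"[A-Za-z]+", text)        # numerals are not matched
    wnl = [w for w in words if w.lower() not in skip]
    return len(wnl) / sentences, wnl

sample = (
    "The first batch of students of the Certificate in Online Journalism "
    "programme announced that they are online at the viva voce examination "
    "on Saturday (29 November 2014). They have created for themselves a "
    "website, a blog and a twitter account."
)

lri, wnl = lemma_readability_index(
    sample, sentences=2, variants=["are"], proper_names=["Saturday", "November"]
)
print(len(wnl), lri)  # 17 8.5
```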

Comparison

Let us compare the LRI with the Fog Index (FI).

Average Words per Sentence (AWS) = 40/2 = 20

Percentage of hard words (P) = (1/40)*100 = 2.5 [Not all polysyllables are hard. In this example, Certificate and Journalism are not counted as hard because they are part of the name of a programme. The only hard word is examination]

FI = 0.4*(AWS+P) = 0.4*(20+2.5) = 0.4*22.5 = 9 years of schooling.
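The same arithmetic as a minimal sketch; the function name `fog_index` is hypothetical, and the hard-word count is passed in because deciding which polysyllables are hard is a judgment made by the person counting.

```python
def fog_index(words, sentences, hard_words):
    """Fog Index: 0.4 * (average words per sentence + percentage of hard words)."""
    aws = words / sentences
    p = 100 * hard_words / words
    return 0.4 * (aws + p)

# The sample paragraph: 40 words, 2 sentences, 1 hard word (examination).
print(round(fog_index(words=40, sentences=2, hard_words=1), 2))  # 9.0
```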

The LRI compares very well with the FI. One needs to test the validity of the LRI on at least 100 samples. Please go ahead and put the LRI to the test. Thank you.
