Three samples are usually considered to be enough for testing the readability of a text. Whether the chosen sample is representative depends on the sample size. Interestingly, different formulae require different sample sizes.
The Winnetka formula, created by Mabel Vogel and Carleton Washburne in 1928, requires a sample of 1000 words. The Lorge Readability Index (1944), Flesch Reading Ease (1948), Gunning’s Fog Index (1952) and Fry Graph (1977) require a sample of about 100 words. Harry McLaughlin’s SMOG (1969) uses a 30-sentence sample. The FORCAST formula of FORd, CAylor and STicht (1973) needs a sample of 150 words. The new Dale-Chall formula (1995) needs an exact 100-word sample.
Since a formula is only a statistical tool that estimates the approximate grade level of a representative sample, it follows that a shorter sample is more likely to yield an inaccurate result. Testing the full text or choosing larger samples should give more reliable results. However, according to Jeanne S. Chall and Edgar Dale (Appendix A of Readability Revisited: The New Dale-Chall Readability Formula), Harris-Sharples found that a ‘minimum of eight samples produced readability scores similar to the largest samples’.
But what is the optimal sample size? A 1000-word sample is easier to test than a full text; a 100-word sample cuts that task to a tenth; and a 10-word sample would be tempting indeed for those who want quick results. Let us try, though on a small scale, to find the optimal sample size for calculating semantic complexity, which is usually estimated from the length of the words in a text.
I picked up Roald Dahl’s Matilda and calculated the average number of syllables per word (ASW) on a sample of the first 1000 words of Chapter I (The Reader Of Books). There were 1376 syllables and, therefore, ASW = 1.376. I then divided the 1000 words into 10 samples, each of 100 words, and found that the ASW varied from 1.3 to 1.46. Next, I divided the first 100 words (ASW = 1.42) into 10 samples, each of 10 words, and found that the ASW varied from 1 to 1.6.
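For readers who want to repeat this kind of test on a text of their own, here is a minimal Python sketch of the procedure described above: split the words into consecutive samples of a fixed size and compare the ASW of each. The function names (count_syllables, asw, asw_by_sample, asw_range) are my own, and the syllable counter is only a rough vowel-group heuristic assumed for illustration. It will not always agree with a careful hand count, so it should not be expected to reproduce the figures above exactly.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption, not a dictionary lookup):
    count groups of consecutive vowels and drop a silent final 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def asw(words):
    """Average number of syllables per word for a non-empty list of words."""
    return sum(count_syllables(w) for w in words) / len(words)


def asw_by_sample(words, sample_size):
    """ASW of each consecutive, non-overlapping sample of sample_size words.
    A leftover partial sample at the end is ignored."""
    full = len(words) - len(words) % sample_size
    return [asw(words[i:i + sample_size]) for i in range(0, full, sample_size)]


def asw_range(words, sample_size):
    """Lowest and highest ASW found among the samples."""
    values = asw_by_sample(words, sample_size)
    return min(values), max(values)


if __name__ == "__main__":
    text = "The reader of books likes to sit in a quiet little room"
    words = re.findall(r"[A-Za-z']+", text)
    print(asw(words))
    print(asw_range(words, 4))
```

To apply it to a real passage, replace the toy sentence with the first 1000 words of the text and call asw_range(words, 100) and asw_range(words[:100], 10) to compare the spread at each sample size.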
In this small test, samples of 10 words varied too much (the ASW ranged over 0.6, from 1 to 1.6), whereas samples of 100 words were far more stable (a range of 0.16, from 1.3 to 1.46). Perhaps this explains why many readability formulae use 100-word samples. On this evidence, calculating semantic complexity from a 10-word sample cannot be recommended.