2005-09-23

Trigrams

Trigrams are sequences of three letters in running text, counting space as a letter. I've had this table around for a while, but never publicized it on my web site index. Now I have.

The table is derived from the Brown Corpus, a million words of American English prose, and contains every trigram that appears at least once in every 10,000 words in that corpus (not once in every 10,000 times, as the previous version had it).
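For the curious, here is a rough sketch in Python of how such a table could be built. It is not the script that produced the table. The function names, the case folding, the rule that any run of non-letters becomes a single space, and the whitespace-based word count are all assumptions made for illustration.

```python
import re
from collections import Counter


def trigram_counts(text):
    """Count three-character sequences, treating space as a letter.

    Assumption: case is folded and every run of non-letters
    (punctuation, digits, whitespace) collapses to one space.
    """
    normalized = re.sub(r"[^a-z]+", " ", text.lower()).strip()
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def frequent_trigrams(text, per_words=10_000):
    """Keep trigrams that occur at least once per `per_words` words.

    Assumption: words are whitespace-separated tokens.
    """
    word_count = len(text.split())
    threshold = word_count / per_words
    counts = trigram_counts(text)
    return {tri: n for tri, n in counts.items() if n >= threshold}
```

With a million-word corpus and the default of 10,000, the threshold works out to 100 occurrences. Under this particular normalization, "M.D." turns into "m d", which is one way single letters can end up with a space on either side.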

3 comments:

Anonymous said...

You surely meant "at least 100 times"?!

Anonymous said...

How are the trigrams #M# and #D# justified?

John Cowan said...

Anonymous #2, probably there were some occurrences of "M.D." in the corpus.