Readers of these pages (all three of you) know of my fondness for Google's ngrams, with which I try to frame each blogging cycle. This article is an appreciation of ngrams, and features the work of a researcher who has dived much more deeply into them.
Engineer Chris Harrison has already taken N-Gram data to that deeper analytical level with the stunning visualization shown here.
Harrison wanted to compare two sets of 3-grams. He started each 3-gram with a different word: “He” and “She.” He then identified the top 120 3-grams for each word. The frequencies of the second word in the 3-gram were combined, sorted, and displayed in decreasing order of frequency-of-use. He repeated the process for ranking the third (and final) word in the 3-gram. …
According to Harrison, “the commonalities are as interesting as the differences.” Among the top 120 3-grams, “He” and “She” have many second words in common but diverge on some intriguing ones. For example, only “He” connects to “argues,” while only “She” connects to “love.”
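For the curious, the ranking described above can be sketched in a few lines of Python. This is only my reading of the description, not Harrison's actual code: the function names are hypothetical, the input is assumed to be a dictionary mapping 3-gram tuples to counts, and the toy counts are invented purely for illustration.

```python
from collections import Counter

def top_trigrams(counts, first_word, n=120):
    """Return the n most frequent (3-gram, count) pairs starting with first_word."""
    matching = [(gram, c) for gram, c in counts.items() if gram[0] == first_word]
    matching.sort(key=lambda item: item[1], reverse=True)
    return matching[:n]

def position_frequencies(trigrams, position):
    """Sum the counts of the word found at `position` (1 = second, 2 = third)."""
    freq = Counter()
    for gram, c in trigrams:
        freq[gram[position]] += c
    return freq

def compare(counts, word_a="He", word_b="She", position=1, n=120):
    """Rank words at `position` by combined frequency and split shared from unique."""
    freq_a = position_frequencies(top_trigrams(counts, word_a, n), position)
    freq_b = position_frequencies(top_trigrams(counts, word_b, n), position)
    combined = freq_a + freq_b  # combine both sets of frequencies
    ranked = [w for w, _ in combined.most_common()]  # decreasing frequency
    shared = [w for w in ranked if w in freq_a and w in freq_b]
    only_a = [w for w in ranked if w in freq_a and w not in freq_b]
    only_b = [w for w in ranked if w in freq_b and w not in freq_a]
    return ranked, shared, only_a, only_b

# Invented toy counts, NOT real n-gram data.
toy_counts = {
    ("He", "was", "a"): 40,
    ("He", "argues", "that"): 20,
    ("He", "was", "not"): 10,
    ("She", "was", "a"): 30,
    ("She", "love", "him"): 5,
}

ranked, shared, only_he, only_she = compare(toy_counts, position=1)
print(ranked, shared, only_he, only_she)
```

Calling `compare(toy_counts, position=2)` repeats the same ranking for the third word. The shared and unique lists are what make the commonalities and differences between the two starting words easy to see.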
Wow! Word association made visible! Great stuff!