He said, she said, ngrammatically.

Readers of these pages (all three of you) know of my fondness for Google's ngrams, with which I try to frame each blogging cycle. This article is an appreciation of ngrams, and features the work of a researcher who has dived in much more deeply.

Engineer Chris Harrison has already taken N-Gram data to that deeper analytical level with the stunning visualization shown here.

Harrison wanted to compare two sets of 3-grams. He started each 3-gram with a different word: “He” and “She.” He then identified the top 120 3-grams for each word. The frequencies of the second word in the 3-gram were combined, sorted, and displayed in decreasing order of frequency-of-use. He repeated the process for ranking the third (and final) word in the 3-gram. …

According to Harrison, “the commonalities are as interesting as the differences.” Among the top 120 3-grams, “He” and “She” have many second words in common but diverge on some intriguing ones. For example, only “He” connects to “argues,” while only “She” connects to “love.”
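For the curious, here is a rough Python sketch of the ranking steps described above. It is not Harrison's code. The function names, the dictionary layout (a 3-gram tuple mapped to a count), and the toy numbers are all my own assumptions, made up purely for illustration.

```python
from collections import Counter

def top_trigrams(counts, first_word, n=120):
    """Return the n most frequent 3-grams whose first word is first_word."""
    matches = {gram: c for gram, c in counts.items() if gram[0] == first_word}
    return Counter(matches).most_common(n)

def position_frequencies(trigrams, position):
    """Sum 3-gram counts by the word at position (1 = second word, 2 = third)."""
    freq = Counter()
    for gram, count in trigrams:
        freq[gram[position]] += count
    return freq

def combined_ranking(counts, position, n=120):
    """Rank words at position by combined He + She frequency, highest first.

    Each result row is (word, count_after_he, count_after_she).
    """
    he = position_frequencies(top_trigrams(counts, "He", n), position)
    she = position_frequencies(top_trigrams(counts, "She", n), position)
    combined = he + she
    return [(word, he[word], she[word]) for word, _ in combined.most_common()]

# Toy counts, invented for illustration only (not real ngram data).
toy_counts = {
    ("He", "was", "a"): 50,
    ("He", "was", "not"): 40,
    ("He", "argues", "that"): 20,
    ("She", "was", "a"): 30,
    ("She", "had", "been"): 25,
}

second_words = combined_ranking(toy_counts, position=1)
third_words = combined_ranking(toy_counts, position=2)
```

A row with a zero in one column is a word that connects to only one of the two pronouns, which is the kind of divergence the visualization makes visible. Calling `combined_ranking` with `position=2` repeats the process for the third word.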


Wow! Word association made visible! Great stuff!

