Rather than diving directly into a technology discussion, Glenn first offered some motivation for why text mining might be interesting. He discussed how important it is for companies to pay attention to what is being said about them out in the world. As a corollary, he stressed the importance of companies engaging their customers rather than just broadcasting at them. And, yes, he did reference The Cluetrain Manifesto.
In both of these cases, a technology like text mining can help find important connections. Companies like Factiva can help monitor newspapers and other media for references to your company. One can do the same in the blogosphere, and in much of the online world, with various blog and web search engines. Services like these offer various subscription models, so one can get periodic reports on what's happening. And the sooner a company knows that people are talking, the sooner it can decide if and how to respond.
So, what's coming in advanced text mining and visualization?
- Enhanced ability to understand how words and terms are related to one another: does one company name appear frequently near another company name? Or is there a phrase that appears near a company name? Text miners have massive archives of proper nouns they can monitor, as well as place and region names. They are beginning to understand how people are related to a chunk of text (about, author, quoted, interviewed, etc.) and can apply the same kinds of questions.
- Is a given combination of terms occurring more or less frequently in a given time period? For example, what does the occurrence of Apple + Motorola + iTunes look like in relation to the discussion of the ROKR phone? What about commentary on the success or failure of the venture?
- Is a particular class of topics gaining / losing currency over time?
- Re-engineering search results. Rather than simply returning a list of results, apply text mining to the results to pull out key concepts, companies, and names, helping users focus their energies and giving a wider context for what the results contain. I got the feeling that this was similar to what Technorati is doing by adding Flickr, Furl and del.icio.us matches to its search results. (I understand that search and mining are very different activities.)
- One of the interesting aspects of visualization is using the time data to show frequency of occurrence over time, whether for a single term or for term combinations.
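
To make the term-combination and frequency-over-time ideas a little more concrete, here is a minimal Python sketch. It is not anything Glenn showed. The function name `cooccurrence_by_month`, the sample documents and their dates are all made up for illustration, and the matching is a crude case-insensitive substring check rather than the proper-noun recognition a real text miner would use.

```python
from collections import Counter
from datetime import date


def cooccurrence_by_month(documents, terms):
    """Count, per (year, month), the documents that mention every term.

    `documents` is an iterable of (publication_date, text) pairs.
    Matching is a naive case-insensitive substring test (an assumption
    for this sketch), so "apple" would also match "pineapple".
    """
    wanted = [term.lower() for term in terms]
    counts = Counter()
    for published, text in documents:
        lowered = text.lower()
        if all(term in lowered for term in wanted):
            counts[(published.year, published.month)] += 1
    # Sort by (year, month) so the result reads as a timeline.
    return dict(sorted(counts.items()))


# Hypothetical sample documents, invented purely for illustration.
sample_docs = [
    (date(2005, 9, 7), "Apple and Motorola unveil the ROKR phone with iTunes."),
    (date(2005, 9, 20), "Reviewers say the Motorola ROKR limits iTunes; Apple stays quiet."),
    (date(2005, 10, 3), "Motorola comments on ROKR sales."),
    (date(2005, 10, 15), "Apple, Motorola and iTunes: was the ROKR a misstep?"),
]

monthly = cooccurrence_by_month(sample_docs, ["Apple", "Motorola", "iTunes"])

# A text "chart": one '#' per matching document in each month.
for (year, month), n in monthly.items():
    print(f"{year}-{month:02d} {'#' * n} ({n})")
```

Swapping the month key for a week or day, or feeding the counts to a plotting library, would give the kind of time-based view described in the last bullet.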