
A web-based tool that, when pointed at a given discussion thread (or comments section, or forum post, or what have you), will parse it into individual comments and then replace the content of each comment with the single most “interesting” word in that comment.
The result: a one-word-per-comment reduction of the original thread.
Essentially an extremely restricted word cloud mechanism applied on a per-comment basis.
The metric for “interestingness” is open to interpretation:
- Simple solution: just pick the word in each comment with the lowest frequency in some baseline English-language frequency table (for English-language discussions, natch; the concept is fundamentally internationalizable).
- Local context: use a modified frequency table that takes into account the local distribution of words in the thread. Otherwise, a word that is uncommon in general but is a theme of this specific conversation would end up representing every comment that mentions it.
- Avoid repeats: if a word has already been used as the “interesting” representative of an earlier comment in the thread, weight the interestingness value for that word somewhat negatively. This favors other words that might be less interesting in an absolute sense but are more interesting in context for being novel to the autosummary-in-progress.
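
The three strategies above could be combined into one scoring function. What follows is only a minimal Python sketch under stated assumptions: the `BASELINE_FREQ` table and its numbers are made up for illustration, and `summarize_thread`, `local_weight`, `repeat_penalty`, and `DEFAULT_FREQ` are hypothetical names, not part of any existing tool. It also assumes the thread has already been parsed into a list of comment strings. Rarity is taken as the negative log of the baseline frequency, the local-context adjustment subtracts a multiple of the log of how many comments in the thread contain the word, and each earlier use of a word as a representative subtracts a fixed penalty.

```python
import math
import re
from collections import Counter

# Hypothetical baseline: relative word frequencies, made up for illustration.
# A real tool would load a full English frequency table instead.
BASELINE_FREQ = {
    "the": 5e-2, "of": 3e-2, "and": 3e-2, "is": 1e-2, "i": 1e-2,
    "on": 7e-3, "are": 5e-3, "my": 3e-3, "think": 1e-3,
    "part": 5e-4, "both": 5e-4, "build": 2e-4, "fine": 2e-4,
    "slow": 1e-4, "laptop": 1e-5, "compiler": 5e-6, "linker": 1e-6,
}
DEFAULT_FREQ = 1e-7  # assumed frequency for words missing from the table


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def summarize_thread(comments, local_weight=0.0, repeat_penalty=0.0):
    """Return one 'most interesting' word per comment ('' for empty comments)."""
    tokenized = [tokenize(c) for c in comments]
    # Local context: number of comments in this thread that contain each word.
    thread_counts = Counter(w for words in tokenized for w in set(words))
    used = Counter()  # how often each word has already been picked
    summary = []
    for words in tokenized:
        if not words:
            summary.append("")
            continue

        def score(word):
            # Simple solution: rarer in the baseline means more interesting.
            rarity = -math.log(BASELINE_FREQ.get(word, DEFAULT_FREQ))
            # Local context: discount words that are common in this thread.
            local = local_weight * math.log(thread_counts[word])
            # Avoid repeats: discount words already used as representatives.
            return rarity - local - repeat_penalty * used[word]

        # sorted() makes ties resolve alphabetically, so output is deterministic.
        best = max(sorted(set(words)), key=score)
        used[best] += 1
        summary.append(best)
    return summary


thread = [
    "The compiler is the slow part of the build",
    "I think the compiler is fine, the linker is slow",
    "The compiler and the linker are both slow on my laptop",
]
print(summarize_thread(thread))                      # ['compiler', 'linker', 'linker']
print(summarize_thread(thread, repeat_penalty=3.0))  # ['compiler', 'linker', 'laptop']
print(summarize_thread(thread, local_weight=4.0))    # ['build', 'linker', 'laptop']
```

With both weights at zero this is the simple solution, and “linker” represents two comments in a row. Turning on either adjustment pushes the third comment to “laptop”, and a strong enough local weight also demotes “compiler”, the thread's running theme, in favor of “build”. The specific weights here were picked only to make the toy example show each effect.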