The Regressive Imagery Dictionary (RID) is a coding scheme for text analysis that is designed to measure “primordial” and conceptual content. Primordial thought is the kind of free-form, associative thinking involved in fantasy and dreams. Like Freud's id, I guess. Conceptual (or secondary) thought is logical, reality-based and focused on problem solving.
RID contains about 3,000 words grouped into categories that are themselves classified as primary, secondary, and emotional. A piece of text is classified by what percentage of its words fall into each category.
I'm skeptical, but it seemed like it might be fun. And some people do think it's accurate and useful:
Detailed evidence concerning the reliability and validity of the Regressive Imagery Dictionary is reported elsewhere (Martindale, 1975, 1990). Evidence for the construct validity of primordial vs. conceptual content comes from studies where the measure has behaved as theoretically predicted: Significantly more primordial content has been found in the poetry of poets who exhibit signs of psychopathology than in that of poets who exhibit no such signs (Martindale, 1975). There is also more primordial content in the fantasy stories of creative as opposed to uncreative subjects (Martindale & Dailey, 1996), in psychoanalytic sessions marked by therapeutic "work" as opposed to those marked by resistance and defensiveness (Reynes, Martindale & Dahl, 1984), and in sentences containing verbal tics as opposed to asymptomatic sentences (Martindale, 1977). A cross-cultural study of folktales from forty-five preliterate societies revealed, as predicted from the "primitive mentality" hypothesis of Lévy-Bruhl (1910) and Werner (1948), that amount of primary process content in folktales is negatively related to the degree of sociocultural complexity of the societies that produced them (Martindale, 1976). Martindale and Fischer (1977) found that psilocybin (a drug that has about the same effect as LSD) increases the amount of primordial content in written stories. Marijuana has a similar effect (West et al., 1983). Research has also revealed more primordial content in verbal productions of younger children as compared with older children (West, Martindale, & Sutton-Smith, 1985) and of schizophrenic subjects as compared with control subjects (West & Martindale, 1988). It shows the pattern expected for historical trends in primordial content in Martindale's (1990) theory of literary evolution.
Thus, the Regressive Imagery Dictionary does seem to yield a valid index of primordial or dedifferentiated thought in a variety of contexts in which the measure varies as is theoretically expected.
Erik Frey has code for analyzing LiveJournal posts with RID. It even compares your scores to the averages of everyone who uses his code. But it wasn't in a form that was easy for me to apply to arbitrary text, so I wrote my own RID code.
rid.py is my Python implementation. It reads text from stdin and prints its analysis to stdout. It uses the English dictionary from Kovach Computing Services. If you want to download or create a dictionary for other languages, it's pretty easy to modify the code to use any dictionary you choose (that's in the proper format).
About the only interesting thing in the code is that it uses a discrimination tree to look up word categories. This was 100x faster than doing linear search with regular expressions.
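For the curious, here is a rough sketch of the idea, not the actual code in rid.py. It assumes dictionary patterns are either whole words or prefixes ending in `*`, and it uses a handful of made-up patterns (`TOY_PATTERNS`) rather than anything from the real dictionary. The names `Node`, `build_tree`, `lookup` and `score` are all invented for this example. Percentages are taken over matched words, which is consistent with the runs in this post (for example, Gore's 796 secondary matches out of 1258 total matches give 63.275%).

```python
import re
from collections import Counter

# Toy patterns, invented for illustration; NOT taken from the real RID dictionary.
# A trailing "*" is assumed to mean "any word starting with this prefix".
TOY_PATTERNS = {
    "dream*": "PRIMARY",
    "sea": "PRIMARY",
    "love*": "EMOTIONS",
    "know*": "SECONDARY",
    "because": "SECONDARY",
}


class Node:
    """One character position in the tree."""

    def __init__(self):
        self.children = {}   # next character -> Node
        self.exact = None    # category if a whole-word pattern ends here
        self.prefix = None   # category if a "prefix*" pattern ends here


def build_tree(patterns):
    """Insert every pattern into a character tree, one node per letter."""
    root = Node()
    for pattern, category in patterns.items():
        is_prefix = pattern.endswith("*")
        node = root
        for ch in pattern.rstrip("*"):
            node = node.children.setdefault(ch, Node())
        if is_prefix:
            node.prefix = category
        else:
            node.exact = category
    return root


def lookup(root, word):
    """Return the category for word, or None if no pattern matches.

    Walks the tree one character at a time, so the cost depends on the
    length of the word rather than on the number of patterns.
    An exact match beats a prefix match; longer prefixes beat shorter ones.
    """
    node = root
    best = node.prefix
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return best
        if node.prefix is not None:
            best = node.prefix
    return node.exact if node.exact is not None else best


def score(text, root):
    """Count matches per category and express them as % of matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        category = lookup(root, word)
        if category is not None:
            counts[category] += 1
    matched = sum(counts.values())
    percentages = {c: 100.0 * n / matched for c, n in counts.items()} if matched else {}
    return counts, percentages, len(words)


tree = build_tree(TOY_PATTERNS)
counts, percentages, total = score("I know the sea because I dream of love", tree)
for category, n in counts.most_common():
    print(category, n)
for category, pct in percentages.items():
    print("%s : %f %%" % (category, pct))
print(total, "words total")
```

The point of the tree is that each lookup touches at most one node per letter of the word, instead of trying every pattern in turn as a linear scan over regular expressions would.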
And now for some armchair psychoanalysis. First up, the second debate between George Bush and Al Gore in 2000. Here's a peek into Al Gore's mind:
dhcp103:~/src/RID wiseman$ ./rid.py < gore.txt
SECONDARY:ABSTRACTION 264
SECONDARY:TEMPORAL REFERENCES 154
SECONDARY:SOCIAL BEHAVIOR 125
PRIMARY:REGRESSIVE COGNITION:CONCRETENESS 120
SECONDARY:INSTRUMENTAL BEHAVIOR 107
EMOTIONS:AGGRESSION 89
SECONDARY:MORAL IMPERATIVE 81
EMOTIONS:AFFECTION 46
SECONDARY:RESTRAINT 39
SECONDARY:ORDER 26
PRIMARY:SENSATION:VISION 26
EMOTIONS:GLORY 22
PRIMARY:SENSATION:COLD 20
PRIMARY:DEFENSIVE SYMBOLIZATION:PASSIVITY 18
PRIMARY:ICARIAN IMAGERY:ASCENT 12
PRIMARY:SENSATION:SOUND 12
PRIMARY:REGRESSIVE COGNITION:NARCISSISM 11
PRIMARY:ICARIAN IMAGERY:HEIGHT 10
PRIMARY:REGRESSIVE COGNITION:BRINK-PASSAGE 8
PRIMARY:NEED:ANALITY 7
EMOTIONS:POSITIVE AFFECT 6
PRIMARY:ICARIAN IMAGERY:WATER 6
EMOTIONS:EXPRESSIVE BEHAVIOR 5
PRIMARY:SENSATION:HARD 5
PRIMARY:REGRESSIVE COGNITION:TIMELESSNESS 5
PRIMARY:DEFENSIVE SYMBOLIZATION:CHAOS 4
EMOTIONS:ANXIETY 4
PRIMARY:SENSATION:GENERAL-SENSATION 4
PRIMARY:ICARIAN IMAGERY:FIRE 4
EMOTIONS:SADNESS 3
PRIMARY:ICARIAN IMAGERY:DEPTH 3
PRIMARY:NEED:ORALITY 3
PRIMARY:DEFENSIVE SYMBOLIZATION:RANDOM MOVEMENT 3
PRIMARY:DEFENSIVE SYMBOLIZATION:DIFFUSION 2
PRIMARY:DEFENSIVE SYMBOLIZATION:VOYAGE 1
PRIMARY:SENSATION:ODOR 1
PRIMARY:SENSATION:TOUCH 1
PRIMARY:REGRESSIVE COGNITION:UNKNOWN 1
PRIMARY : 22.813990 %
EMOTIONS : 13.910970 %
SECONDARY : 63.275040 %
6560 words total
And a window into our current president's brain:
dhcp103:~/src/RID wiseman$ ./rid.py < bush.txt
SECONDARY:ABSTRACTION 319
SECONDARY:INSTRUMENTAL BEHAVIOR 197
SECONDARY:SOCIAL BEHAVIOR 183
PRIMARY:REGRESSIVE COGNITION:CONCRETENESS 170
SECONDARY:MORAL IMPERATIVE 107
SECONDARY:TEMPORAL REFERENCES 93
SECONDARY:RESTRAINT 61
EMOTIONS:AFFECTION 57
EMOTIONS:AGGRESSION 51
SECONDARY:ORDER 29
PRIMARY:SENSATION:VISION 24
PRIMARY:DEFENSIVE SYMBOLIZATION:PASSIVITY 23
PRIMARY:SENSATION:COLD 20
PRIMARY:ICARIAN IMAGERY:WATER 13
EMOTIONS:POSITIVE AFFECT 11
PRIMARY:REGRESSIVE COGNITION:NARCISSISM 10
PRIMARY:SENSATION:HARD 10
EMOTIONS:GLORY 10
PRIMARY:ICARIAN IMAGERY:HEIGHT 8
EMOTIONS:ANXIETY 7
PRIMARY:ICARIAN IMAGERY:FIRE 6
PRIMARY:ICARIAN IMAGERY:DESCENT 5
PRIMARY:NEED:SEX 4
PRIMARY:REGRESSIVE COGNITION:BRINK-PASSAGE 4
PRIMARY:SENSATION:SOUND 4
PRIMARY:ICARIAN IMAGERY:DEPTH 3
PRIMARY:SENSATION:GENERAL-SENSATION 3
PRIMARY:DEFENSIVE SYMBOLIZATION:VOYAGE 3
PRIMARY:ICARIAN IMAGERY:ASCENT 3
PRIMARY:DEFENSIVE SYMBOLIZATION:DIFFUSION 2
EMOTIONS:SADNESS 2
PRIMARY:NEED:ORALITY 2
PRIMARY:DEFENSIVE SYMBOLIZATION:RANDOM MOVEMENT 2
PRIMARY:DEFENSIVE SYMBOLIZATION:CHAOS 1
EMOTIONS:EXPRESSIVE BEHAVIOR 1
PRIMARY:REGRESSIVE COGNITION:CONSCIOUSNESS ALTERATION 1
PRIMARY:REGRESSIVE COGNITION:UNKNOWN 1
PRIMARY : 22.206897 %
EMOTIONS : 9.586207 %
SECONDARY : 68.206897 %
7996 words total
Look at that: Bush was 5 percentage points more reality-based than Gore. Gore, on the other hand, was about 45% more emotional (13.9% vs. 9.6%). And only Bush registers any need for sex at all: 4 words to Gore's zero.
Now let's compare newsgroups. First we'll analyze the last 1000 posts from comp.lang.lisp:
dhcp103:~/src/RID wiseman$ ./get_posts.py comp.lang.lisp 1000 | ./rid.py
SECONDARY:ABSTRACTION 8078
SECONDARY:INSTRUMENTAL BEHAVIOR 5493
PRIMARY:REGRESSIVE COGNITION:CONCRETENESS 3941
SECONDARY:SOCIAL BEHAVIOR 3680
SECONDARY:TEMPORAL REFERENCES 2812
SECONDARY:ORDER 2402
PRIMARY:SENSATION:VISION 1231
SECONDARY:RESTRAINT 1134
EMOTIONS:AFFECTION 925
EMOTIONS:AGGRESSION 911
SECONDARY:MORAL IMPERATIVE 630
PRIMARY:REGRESSIVE COGNITION:BRINK-PASSAGE 594
PRIMARY:REGRESSIVE COGNITION:NARCISSISM 559
PRIMARY:DEFENSIVE SYMBOLIZATION:PASSIVITY 415
PRIMARY:ICARIAN IMAGERY:HEIGHT 363
PRIMARY:ICARIAN IMAGERY:ASCENT 342
EMOTIONS:GLORY 260
PRIMARY:NEED:ORALITY 240
PRIMARY:ICARIAN IMAGERY:DEPTH 224
EMOTIONS:EXPRESSIVE BEHAVIOR 208
PRIMARY:ICARIAN IMAGERY:FIRE 195
PRIMARY:DEFENSIVE SYMBOLIZATION:CHAOS 193
PRIMARY:SENSATION:SOUND 186
PRIMARY:SENSATION:GENERAL-SENSATION 186
PRIMARY:ICARIAN IMAGERY:WATER 175
EMOTIONS:POSITIVE AFFECT 173
EMOTIONS:ANXIETY 170
PRIMARY:SENSATION:TASTE 165
PRIMARY:SENSATION:HARD 162
PRIMARY:DEFENSIVE SYMBOLIZATION:DIFFUSION 126
PRIMARY:REGRESSIVE COGNITION:UNKNOWN 119
EMOTIONS:SADNESS 117
PRIMARY:DEFENSIVE SYMBOLIZATION:RANDOM MOVEMENT 111
PRIMARY:DEFENSIVE SYMBOLIZATION:VOYAGE 101
PRIMARY:ICARIAN IMAGERY:DESCENT 81
PRIMARY:SENSATION:SOFT 73
PRIMARY:SENSATION:COLD 71
PRIMARY:NEED:ANALITY 55
PRIMARY:SENSATION:TOUCH 54
PRIMARY:REGRESSIVE COGNITION:TIMELESSNESS 50
PRIMARY:REGRESSIVE COGNITION:CONSCIOUSNESS ALTERATION 36
PRIMARY:NEED:SEX 21
PRIMARY:SENSATION:ODOR 15
PRIMARY : 27.197454 %
EMOTIONS : 7.454756 %
SECONDARY : 65.347790 %
221521 words total
And then the last 1000 posts in comp.lang.ruby:
dhcp103:~/src/RID wiseman$ ./get_posts.py comp.lang.ruby 1000 | ./rid.py
SECONDARY:ABSTRACTION 5112
SECONDARY:INSTRUMENTAL BEHAVIOR 4072
PRIMARY:SENSATION:VISION 3667
SECONDARY:SOCIAL BEHAVIOR 3272
PRIMARY:REGRESSIVE COGNITION:CONCRETENESS 3166
SECONDARY:TEMPORAL REFERENCES 2613
SECONDARY:ORDER 2021
SECONDARY:RESTRAINT 1136
PRIMARY:REGRESSIVE COGNITION:BRINK-PASSAGE 834
EMOTIONS:AFFECTION 767
SECONDARY:MORAL IMPERATIVE 668
PRIMARY:ICARIAN IMAGERY:HEIGHT 443
EMOTIONS:AGGRESSION 415
PRIMARY:DEFENSIVE SYMBOLIZATION:PASSIVITY 304
PRIMARY:REGRESSIVE COGNITION:NARCISSISM 266
PRIMARY:REGRESSIVE COGNITION:UNKNOWN 264
PRIMARY:ICARIAN IMAGERY:DEPTH 257
PRIMARY:SENSATION:COLD 210
PRIMARY:ICARIAN IMAGERY:FIRE 195
PRIMARY:NEED:ORALITY 192
PRIMARY:SENSATION:GENERAL-SENSATION 158
EMOTIONS:POSITIVE AFFECT 153
EMOTIONS:GLORY 149
PRIMARY:SENSATION:SOUND 146
PRIMARY:ICARIAN IMAGERY:DESCENT 135
PRIMARY:SENSATION:TASTE 131
PRIMARY:DEFENSIVE SYMBOLIZATION:CHAOS 117
PRIMARY:REGRESSIVE COGNITION:CONSCIOUSNESS ALTERATION 97
EMOTIONS:ANXIETY 90
EMOTIONS:SADNESS 87
PRIMARY:SENSATION:HARD 81
PRIMARY:NEED:ANALITY 76
PRIMARY:REGRESSIVE COGNITION:TIMELESSNESS 70
PRIMARY:ICARIAN IMAGERY:ASCENT 69
PRIMARY:DEFENSIVE SYMBOLIZATION:RANDOM MOVEMENT 68
PRIMARY:ICARIAN IMAGERY:WATER 64
PRIMARY:SENSATION:SOFT 63
EMOTIONS:EXPRESSIVE BEHAVIOR 59
PRIMARY:DEFENSIVE SYMBOLIZATION:VOYAGE 56
PRIMARY:SENSATION:TOUCH 49
PRIMARY:DEFENSIVE SYMBOLIZATION:DIFFUSION 23
PRIMARY:NEED:SEX 4
PRIMARY:SENSATION:ODOR 1
PRIMARY : 35.216845 %
EMOTIONS : 5.405405 %
SECONDARY : 59.377750 %
185304 words total
Conclusion: Ruby usenetters are operating on a significantly more primordial level than the Lispers, giving less attention to problem solving! And damn, the Lispers need sex badly—more than 4x as much as Ruby fans!
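Since the two newsgroup samples differ in size, the fair comparison is per word rather than on raw counts. A small sketch of that arithmetic, using only the NEED:SEX counts, word totals and PRIMARY percentages printed in the two runs above (the helper name `per_word_rate` is made up for this example):

```python
# Figures copied from the comp.lang.lisp and comp.lang.ruby runs above.
LISP = {"sex_words": 21, "total_words": 221521, "primary_pct": 27.197454}
RUBY = {"sex_words": 4, "total_words": 185304, "primary_pct": 35.216845}


def per_word_rate(hits, total_words):
    """Fraction of all words in the sample that fell into a category."""
    return hits / total_words


lisp_rate = per_word_rate(LISP["sex_words"], LISP["total_words"])
ruby_rate = per_word_rate(RUBY["sex_words"], RUBY["total_words"])

# How many times more often NEED:SEX words show up per word of Lisp traffic.
sex_ratio = lisp_rate / ruby_rate

# Gap in PRIMARY share, in percentage points of matched words.
primary_gap = RUBY["primary_pct"] - LISP["primary_pct"]

print("NEED:SEX, Lisp vs. Ruby, per word: %.2fx" % sex_ratio)
print("PRIMARY gap, Ruby minus Lisp: %.2f percentage points" % primary_gap)
```

Normalized this way the Lisp figure still comes out a bit over 4x the Ruby one (roughly 4.4x), and the PRIMARY gap is about 8 percentage points.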
The possibilities for RID are endless. In-line Twitter filters. A Yahoo Pipes module. Tiny little 3-category pie charts on every email. An Emacs package that disables source code commits when too much Icarian fire imagery is detected. Go nuts, people.
Update: Added a license (MIT) to the code. Fixed another misspelling in the dictionary itself.
Update: Neil Kandalgaonkar has a visualization tool based on RID.
Posted by jjwiseman at May 21, 2007 12:03 AM
Would you add Haskell to your analysis? That would make an interesting comparison to Lisp mindset.
Posted by: dimosd on May 21, 2007 06:35 AM
I like how Gore's narcissism goes to "11".
Posted by: Chris B. on May 21, 2007 07:22 AM
John,
Can you clarify the license for the code a bit; maybe put it under a creative commons license?
Will
Posted by: will Fitzgerald on May 21, 2007 09:08 AM
Thanks for reminding me, Will. My code has an MIT license, but I don't know the status of the actual dictionary and exclusion list that I included with the code; it's freely available for downloading, at least, and maybe it just comes from a Martindale paper.
P.S. The discrimination tree is translated from some Lisp code of yours.
Posted by: John Wiseman on May 21, 2007 10:25 AM
Eek! Please don't put code under Creative Commons Licenses. They aren't designed for code. Use the GPL, LGPL or MIT licenses instead.
If the program uses CC-licensed data that's fine. Data doesn't have to be the same license as the program. We'll ignore the fact that this is a Lisp weblog. ;-)
Posted by: Rob Myers on May 21, 2007 01:31 PM
Any reason you chose Python over Lisp?
just curious...
Justin: It's not like there's much code for it to make much difference what language it's in. But I just find it easier to work in Python for this sort of thing. For example, it took me about 10 minutes apiece to write get_posts.py and get_irc.py, which will work with a stock Python installation on any of OS X, Linux and Windows.
Posted by: John Wiseman on May 24, 2007 01:29 PM