Text Visualization Dec 9, 10:40 PM
The script attempts to identify the part of speech of every word in a given text by looking each word up in the Moby database. The parts of speech are then mapped to colors and placed, in order of appearance, on an 8 1/2×11 PostScript page. White space appears wherever a word in the text cannot be mapped to a part of speech in the database.
The color palette is the 216-color ‘websafe’ palette. The hex code for each color is converted to a sequence of three floating-point numbers (R, G, B) for use with PostScript. These 216 values are saved for reference.
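As a rough illustration of that conversion, here is a minimal sketch in Python. The function names and the ordering of the palette are my assumptions for this example, not necessarily what textvis.py does; the only fixed facts are that the websafe palette combines six levels (00, 33, 66, 99, CC, FF) per channel and that each channel ends up as a float between 0 and 1.

```python
# Hypothetical helpers; names and palette ordering are assumptions.

def websafe_palette():
    """Return the 216 websafe hex codes (six levels per channel)."""
    levels = ["00", "33", "66", "99", "CC", "FF"]
    return [r + g + b for r in levels for g in levels for b in levels]


def hex_to_rgb_floats(hex_code):
    """Convert a hex code such as 'FF9933' to three floats in 0..1."""
    hex_code = hex_code.lstrip("#")
    return tuple(int(hex_code[i:i + 2], 16) / 255 for i in (0, 2, 4))


# The full table of 216 (R, G, B) triples.
palette = [hex_to_rgb_floats(code) for code in websafe_palette()]
```

Each triple in `palette` could then be written out ahead of a PostScript `setrgbcolor` call.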
Moby identifies a word’s part of speech with a single character. For instance, a noun is represented by ‘N’ and an adjective by ‘A’. Once a word’s part of speech is determined, the ASCII value of that character is used as the index into the color palette: ‘A’ = 65, ‘N’ = 78. Many words can act as more than one part of speech, and Moby marks this by stringing the characters together. A word that can serve as both an adjective and a noun is marked ‘AN’. In this case, the script averages the ASCII values of the two characters to get the palette index: (65 + 78) / 2 = 71.5, so ‘AN’ = 71 (remainders are discarded).
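The indexing rule can be sketched in a few lines of Python. The function name is made up for this example, and extending the average to codes longer than two characters is my assumption; the description above only covers the one- and two-character cases.

```python
# Hypothetical helper; not taken from textvis.py.

def palette_index(pos_codes):
    """Map a Moby part-of-speech string to a palette index.

    Averages the ASCII values of the characters and discards the
    remainder. Returns None for an empty string, i.e. a word with
    no database entry, which is left as white space.
    """
    if not pos_codes:
        return None
    return sum(ord(ch) for ch in pos_codes) // len(pos_codes)


noun_index = palette_index("N")     # 78
adj_index = palette_index("A")      # 65
adj_noun_index = palette_index("AN")  # (65 + 78) // 2 = 71
```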
Here’s an example (from Whitman’s Leaves of Grass): Right here
The idea is to explore the possibilities for transcoded text (text turned into image, sound, etc.). I want to see if this kind of transfer can maintain aspects of textuality (through color patterns, etc.), if not overall, then maybe for certain texts or categories of text. Are words as important to a text as EVERYBODY wants you to believe? Maybe. I hope not.
The code:
textvis.py : class definitions
t_vis : script to generate the images
