Ted Underwood, who teaches English and information sciences, wrote a fascinating book called Distant Horizons, in which he took massive volumes of literature and analyzed them by having computers count the words. The name of the book comes from the name of the technique: "distant reading".
It starts with an easy one: seeing if the robot can tell, in a mass of undifferentiated text, what's detective fiction, just by counting words--from Edgar Allan Poe and pre-Poe crime and Victorian shock novels to the present...
A quick glance at the words most predictive of detective fiction reveals the themes we would expect: police, murder, investigation, and crime. If we look a little deeper into the model, there are less obvious details. Architecture and domestic furnishing, for instance, loom large as a source of clues: door, room, window, and desk are all highly predictive words. On the opposite side of the model, words that describe childhood and education (born, grew, taught, children) strongly predict that a volume is not detective fiction…Working with a light touch, a critic who has read a few detective stories can use predictive features to tease out insights (the absence of children in these stories, for instance, is an interesting dog that didn’t bark).
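For anyone wondering what "counting words" to find predictive words might look like in practice, here is a minimal sketch. It is not the model from the book: it swaps in a simple smoothed log-odds score (naive Bayes style), and the four tiny "volumes", the function names, and the smoothing value are all invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]

def word_log_odds(genre_docs, other_docs, smoothing=1.0):
    """Return {word: smoothed log-odds of the word in genre_docs vs other_docs}."""
    genre = Counter(w for d in genre_docs for w in tokenize(d))
    other = Counter(w for d in other_docs for w in tokenize(d))
    vocab = set(genre) | set(other)
    genre_total = sum(genre.values()) + smoothing * len(vocab)
    other_total = sum(other.values()) + smoothing * len(vocab)
    return {
        w: math.log((genre[w] + smoothing) / genre_total)
        - math.log((other[w] + smoothing) / other_total)
        for w in vocab
    }

# Invented toy "volumes"; a real study would use thousands of full books.
detective = [
    "the police found the door locked and the window open",
    "a murder in the room and a clue on the desk",
]
not_detective = [
    "she was born in the village and grew up among children",
    "he taught the children and they grew fond of him",
]

scores = word_log_odds(detective, not_detective)
ranked = sorted(scores, key=scores.get)  # most "not detective" first
print("predicts NOT detective:", ranked[:3])
print("predicts detective:", ranked[-3:])
```

With a corpus this small, filler words like "the" can score as strongly as "police", which is one reason the real thing needs a lot of books.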
The model is also pretty decent at finding science fiction--going all the way back to Shelley, and this is pretty neat:
Science fiction turns out to have a strong stylistic signature, which we might loosely characterize as sublime. Invocations of scale (vast, far, larger, infinite) are very characteristic of the genre, as are large numbers (thousands, millions). Horror, nightmare, and destruction are more prominent than one might have imagined. Self-conscious references to the human tend to accompany creatures against which humanity may be defined, and the pronoun its is common, since we often confront unknown things that lack an easily recognized human gender. At the other end of the scale, a whole range of quotidian details mark a book as probably not science fiction: references to tea or a hat, for instance, and to particular days of the week…this angle of analysis may help us understand why Mary Shelley’s apocalyptic intensity belongs in our history of science fiction, with or without detailed scientific content. The same thing might be said about some of the randomly selected books that the model strongly (and persuasively) misclassifies as science fiction, like Thomas Pynchon’s Crying of Lot 49 and William S. Burroughs’s The Ticket That Exploded. It may not be Pynchon’s explicit concern with entropy but his paranoid fascination with the sheer scale of mass society that this model sees as connected to the tradition of science fiction.
A few things here:
- Pynchon and Burroughs read a lot of science fiction--and appeal to the kind of literary-fiction-reading people who like sci-fi. The similarity in word-choice also probably reveals a similar interest in Big Ideas.
- Christopher Mennel once paid me 200 dollars to give him a title for his sci-fi game. I gave him "Ultramassive and Unexplained"--he was very happy.
- I'm a big fan of number words myself, but I also think of ultraextremism as a stylistic tic Patrick had.
The model also seems to be pretty good at distinguishing pre-war (basically Shelley until pulp) from post-war sci-fi. While it can tell things are sci-fi all the way up until the present, Underwood tried running the pre-war-sci-fi-only predictor on post-war sci-fi and got these results:
Many of the passages where the models disagree most sharply have a psychological or social dimension. The following example from Babel-17 is typical. (I have italicized words that make prewar models particularly skeptical that this is science fiction.)
What “self”? There was no “I.”
She had entered him in some bewildering reversed sexuality. Enclosing her, he was in agony. The light—you make! You make! his crying in terror.
Butcher, she asked, more familiar in patterning words about emotional turbulences than he, what does my mind in yours look like?
Bright, bright moving, he howled, the analytical precision of Babel-17, crude as stone to articulate their melding, making so many patterns, re-forming them.
...There is a lot going on in this passage. Our model may not necessarily notice, for instance, that Delany has distorted English syntax to reflect his characters’ struggle to communicate. The model does notice, however, that the central conflict of the passage is psychological: words like emotional and crying are legible clues, and they do a lot to convince models trained on prewar evidence that this passage is not science fiction. The passages of Le Guin’s Left Hand of Darkness that are hardest for prewar models to recognize as science fiction similarly explore the psychological and social consequences of alien sexuality. And although Robert Heinlein’s politics were different from Delany’s and Le Guin’s, Stranger in a Strange Land fails to fit prewar models of science fiction for the same reason: it pays less attention to the physics of spaceflight than to psychological, social (and specifically sexual) disorientation. Arthur C. Clarke, by contrast, is relatively easy for early-twentieth-century models to recognize as a science fiction writer.
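The idea of finding the words that make a prewar model "skeptical" can be sketched roughly like this, assuming the model boils down to one weight per word (the post doesn't say exactly what kind of model it is). The weights, the sample sentence, and the function name are all invented for illustration; none of them come from the book.

```python
def word_contributions(passage, weights):
    """Sum each known word's weight over its occurrences in the passage."""
    contributions = {}
    for raw in passage.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word in weights:
            contributions[word] = contributions.get(word, 0.0) + weights[word]
    return contributions

# Invented weights standing in for a model trained only on prewar science fiction.
# Positive values push toward "science fiction", negative values push away from it.
prewar_weights = {
    "vast": 1.2, "infinite": 1.0, "millions": 0.9,
    "emotional": -1.1, "crying": -0.8, "tea": -1.3,
}

# Invented sample sentence, not a quotation.
passage = "The vast ship drifted on. She was crying, caught in emotional turbulence."

contrib = word_contributions(passage, prewar_weights)
score = sum(contrib.values())

# Words dragging the score down, most damaging first: these are the ones to italicize.
skeptical = sorted((w for w in contrib if contrib[w] < 0), key=contrib.get)
print(skeptical)
print(round(score, 2))
```

Here "vast" pulls toward science fiction, but "emotional" and "crying" outweigh it, so the overall score ends up negative.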
Not too surprising, but still interesting to know the robot matches what critics have said. A few of the other analyses match what you might think: male writers seem to write male and female characters more differently than female writers do, gender differences seem to narrow over time, less-prominent authors' writing tends to change to become more like more-prominent ones over time (prestigious works either set precedents or fit the changing world better or both).
That last analysis was part of a line of inquiry Underwood did based around literary prestige. Basically he took massive samples of random writing and compared the word usage to works that had been reviewed. These produced some of the most interesting results--he started with poetry, which seems to have similar results to fiction only more exaggerated:
…it turns out that poetic prominence does correlate with a particular kind of writing. Further, that “kind of writing” can be modeled simply by counting words, and it remained rather stable between 1820 and 1919…To make this clearer, I have divided the model’s twenty-six hundred variables into three groups: the top nine hundred words, which markedly increase a poem’s perceived likelihood of being reviewed, are represented in boldface. The bottom nine hundred, which markedly decrease that likelihood, are italicized. All others are typeset normally (this includes words too rare to be included in the model, as well as the middle eight hundred words—which don’t individually have a huge effect). We’ll start with the conclusion of Christina Rossetti’s “Echo” (1865), which the model sees as very likely to be reviewed:
Yet come to me in dreams, that I may live
My very life again tho’ cold in death:
Come back to me in dreams, that I may give
Pulse for pulse, breath for breath:
Speak low, lean low,
As long ago, my love, how long ago.
It likes certain abstractions too, such as dreams and death—although, perversely enough, not live. It likes cold, but not hot; fear, but not joy; bitter, but not sweet. In fact, we may as well admit that this model is happiest when poems are a bit desolate. Brooding, blind, hollow, and harsh are some of its favorite words. It has an allergy to things that are kind or noble. It doesn’t even like homes. We can see why if we look at the volumes at the very bottom of its list—the ones it is rightly confident will never be reviewed. Many of these have some inspirational or hortatory purpose; they are about equally divided between religious and political topics but share a reliance on positive emblems of collective emotion. In Memorial or Decoration Day (1891), for instance, George Loomis invokes “those who battled for these homes of ours, / And precious blood on Freedom’s altar shed.”
...the obscure poets in our random sample lean toward abstraction and positive sentiment, whereas reviewed poets emphasize physical description—especially of colors, nature, and the human body. On the other side of the model, if we unpack the Inquirer’s terms “power” and “dependence,” we find that randomly sampled volumes emphasize words associated with social relations. The reviewed writers, by contrast, use more first-person singular pronouns. All of this boils down to a fairly clear contrast between embodied lyric subjectivity and an older mode of poetic authority that is more didactic, sentimental, and collective.
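The bold/italic scheme Underwood describes for the poetry model (top-weighted words in bold, bottom-weighted words in italics, everything else plain) can be sketched as below. This is only an illustration: the six weights are invented, only the directions for dreams, death, live, and homes follow the quoted passage, and `**`/`*` stand in for bold and italic type.

```python
def split_by_weight(weights, n_top, n_bottom):
    """Return (top, bottom): the n_top highest- and n_bottom lowest-weighted words."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return set(ranked[:n_top]), set(ranked[len(ranked) - n_bottom:])

def mark_up(text, top, bottom):
    """Wrap top words in ** ** and bottom words in * *; leave everything else alone."""
    out = []
    for token in text.split():
        core = token.strip(".,;:!?\"'()").lower()
        if core in top:
            out.append(f"**{token}**")
        elif core in bottom:
            out.append(f"*{token}*")
        else:
            out.append(token)
    return " ".join(out)

# Invented weights for six words; the book's model has twenty-six hundred variables,
# split nine hundred / eight hundred / nine hundred.
weights = {
    "dreams": 0.9, "death": 0.8,   # pushed toward "likely to be reviewed"
    "come": 0.1, "me": -0.1,       # middle group, left in plain type
    "homes": -0.7, "live": -0.8,   # pushed away from "likely to be reviewed"
}

top, bottom = split_by_weight(weights, n_top=2, n_bottom=2)
marked = mark_up("Yet come to me in dreams, that I may live", top, bottom)
print(marked)  # Yet come to me in **dreams,** that I may *live*
```

Words that never appear in the weight table fall through to plain type, which matches how the quoted passage treats words too rare to be in the model.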
The model (correctly) sees this as a less-likely-to-have-been-reviewed passage--with bold being "good" and italic being "bad":
"Quick as lightning these thoughts and wishes flashed through his mind. Seeing his peril, in an instant he had seized his rifle by the barrel, and raising it by the side of his head, prepared to deal his foe a tremendous blow. But bruin was too good a boxer. . . . The next instant, Charles felt the strong legs of the shaggy beast folded about him, and pressing him in a closer and closer hug. He dropped his rifle from his hand, and struggled to draw his knife."
And the model sees this (correctly) as in line with prevailing literary standards:
"Many who knew her thought it a pity that so substantive and rare a creature should have been absorbed into the life of another, and be only known in a certain circle as wife and mother. But no one stated what elsethat was in her power she ought rather to have done."
Interesting that creature is on the good list but beast is neutral. Other hypotheses emphasized by Underwood:
…“A certain circle,” for instance, has paused to acknowledge a limitation on its own social description, while “else” and “ought” are used in counterfactual moral reasoning. The language is not recondite, but it may be the sort of language that was used by careful thinkers in the nineteenth century. And yet, of course, all the model really knows is that these words tend to be used by other authors reviewed in prominent venues
"Readers" vs "Audiences"
Looking at the model's good/bad writing predictions brings up a few things:
- While it may not be surprising to anyone familiar with the image of the lonely, tortured poet that the more-successful works tend toward despair and isolation--and self-examining nuance--it's also interesting to notice that the arty word-cloud is exactly the opposite of how people are told to talk in marketing, in giving political speeches, and when negotiating with armed lunatics: the language of public speech is all about positive abstractions, warmth, simplicity.
- While I am sure anyone familiar with modern comfort culture could write a long and snarky post about the relationship of abstract positivity to real or at least perceived badness (even empathy-performance motormouth Arthur Chu hates on Animal Crossing, apparently), I wonder if there isn't a more interesting and democratic explanation:
When you comfortably read a book, you are alone. Even if you aren't literally alone, say in your little bubble on a train or in the park on your lunch break, you are consciously screening out the outside world. While anyone who's made it this far into this blog entry probably enjoys reading, you might even go so far as to say that reading is often a consolation of lonely people. You are examining minutely, and part of what makes people who like books like a particular book may be the text's ability to respond to the situation that the reader is literally in while reading.
Contrast this to public speech: if you go to a political rally or a church or even a rave you are surrounded by people, you all came to see the same thing happen, you all are hoping these people are on your side, that you have collective purpose--it would suck to be in the middle of a crowd that didn't want what you did. Also, in these situations: you actually do have things in common--you did all choose this, you might have even paid for it. So the language of "we" and "isn't this nice" is responding to what you're probably already feeling as well.
A way to shorthand this is: you are a different person when you are a reader than when you are part of an audience.
A tremendous amount of satirical writing is literary people having a lot of merciless fun ripping open the vapidities of public speech and making fun of them, because in the context of a book being read, on purpose, by someone who paid to read it because they want to read, the cliches of TV ads, the president, pamphlets about immigration, etc. seem so absurd. And, vice versa, a lot of Twitter is about taking what's 300 pages into a book most people on Twitter would never read and didn't know they were going to hear about today and making fun of someone admitting to some nuanced, private, dark thing.
If anyone needed to be reminded that propaganda and art (or whatever you want to call nuanced and careful attention to private ideas) are different things.
Side note: Sex was happily on Team Literature until people started dying from it in massive numbers in the 1980s. Then it had to be political, and thus require Public Speech. Now we have a lot of constant clashes between these two kinds of speech about sex: one obsessed with honesty (because people need to know they are not alone) and another obsessed with messaging (because people need to be safe).
Now consider game writing:
Is game writing public, exhortatory speech or is it literary, nuanced speech?
When you're sitting at home with the book comfortably on your lap, deep in someone's cyberpunk rainforest it's literature--when you're reading a circled-in-red squeepost or outragepost about how inclusive or uninclusive a retweeted paragraph is, the game text is public speech.
When you're using the text in-game to see what a rule is--it's neither.
When you're reading that rule to the table--it's both?
I could go out on a further limb and point to evidence that a lot of broader conflict in gaming culture is down to people wanting game texts to be dark, detailed, isolated and isolating literary speech including "the sort of language that was used by careful thinkers" versus those who want it to fulfill its potential as persuasive oratory to this-or-that large group. Obviously, you can do both with any given text, but the constant frustrations of people who want most of all to be privately-pleased Readers and those who want to be messaged-at as an Audience are different.
Those most likely to see gaming as an eccentric thing done with trusted friends are most likely to be grumpy about how poorly-written something is--poorly-written not as in "using lazy tropes" but as in "It doesn't seem fun to read or diverting enough to hold my attention".
The folks most likely to say "fuck this hobby"--or even to use the phrase "the hobby"--are seeing gaming as a mass collective endeavor, one where they do, metaphorically, stand in a crowd, looking at a stage, waiting for a positive message that will resonate with them and their desire to not be trampled by everyone else in the room. Some sign that they do indeed have something in common.