Saturday, January 11, 2014

Computer Scientists Quantify Elements of Writing Style That Differentiate Successful Fiction

So there you are. Fire up the algorithm and start cranking out Ulysses and Lolita Meet Don Quixote reciting The Tale of Genji to Madame Bovary.
From PhysOrg:
Imagine the challenge publishers face, poring over thousands of manuscripts to determine if a book will be a hit. Stony Brook Department of Computer Science Assistant Professor Yejin Choi thinks she has a tool to bring some science to that art, and she is co-author of a paper, Success with Style: Using Writing Style to Predict the Success of Novels, which was unveiled at the conference on Empirical Methods in Natural Language Processing (EMNLP) 2013.

"Predicting the success of literary works poses a massive dilemma for publishers and aspiring writers alike," Choi said. "We examined the quantitative connection between writing style and successful literature. Based on novels across different genres, we investigated the predictive power of statistical stylometry in discriminating successful literary works, and identified the stylistic elements that are more prominent in successful writings."

Statistical stylometry is the statistical analysis of variations in literary style between one writer or genre and another. The study reports, for the first time, that the discipline can be effective in distinguishing highly successful literature from its less successful counterpart, achieving accuracy rates as high as 84%.

For example, the research indicated that more successful books make more frequent use of discourse connectives (conjunctions such as "and", "but", "or") to join sentences and prepositions. Prepositions, nouns, pronouns, determiners (words that precede nouns to indicate whether the noun is specific or general, e.g. "your letter"), and adjectives are also predictive of highly successful books. Less successful books are characterized by a higher percentage of verbs, adverbs, and foreign words. They also rely more on topical words that could be almost cliché ("love"), typical locations, and extreme ("breathless") and negative ("bruised") words....MORE 
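To make the idea of counting stylistic markers concrete, here is a minimal sketch that measures how often a few connectives show up in a passage. It is an illustration only: the word list and the function name connective_rate are my own assumptions, not the feature set or the method used in the paper.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word list for illustration only;
# this is NOT the feature set used in the paper.
CONNECTIVES = {"and", "but", "or"}

def connective_rate(text):
    """Return the share of word tokens in `text` that appear in CONNECTIVES."""
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[word] for word in CONNECTIVES) / len(tokens)

sample = "Cats and dogs or birds but fish"
print(connective_rate(sample))  # 3 connectives out of 7 tokens
```

A real stylometric model would combine many such rates (parts of speech, word choice and so on) and feed them to a classifier; this only shows the counting step.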
The Stony Brook approach is possibly more sophisticated than HackerFactor's Gender Guesser, which judged a snippet of my writing:
Three nits to pick:
1) Technical analysis is less valuable with indices than with tradable issues.
2) The longer the timeframe of the chart the less valuable the short term trading information.
3) In the instant case, commodity prices are just one input into economy-wide inflation or deflation.

So why'd I use this chart? Overall Kimball does a pretty good job of pointing out possible inflection points.
Total words: 66
Too few words.  Try 300 words or more.

Genre: Informal
  Female = 103
  Male   = 206
  Difference = 103; 66.66%
  Verdict: MALE
OR
Genre: Formal
  Female = 104
  Male   = 86
  Difference = -18; 45.26%
  Verdict: Weak FEMALE

Weak emphasis could indicate European.
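
The numbers in that output are consistent with simple arithmetic: the difference looks like the male score minus the female score, and the percentage looks like the male score as a share of the two scores combined, truncated to two decimals. That is inferred from the figures shown above, not from Gender Guesser's documentation, so treat the sketch below (and its function names) as an assumption.

```python
import math

def score_difference(female, male):
    """Male score minus female score (inferred from the output shown)."""
    return male - female

def male_share(female, male):
    """Male score as a percentage of the combined score, truncated to
    two decimals (inferred formula, not taken from the tool itself)."""
    percent = 100.0 * male / (female + male)
    return math.floor(percent * 100) / 100

# Informal genre scores from the output above.
print(score_difference(103, 206), male_share(103, 206))  # 103 66.66

# Formal genre scores from the output above.
print(score_difference(104, 86), male_share(104, 86))    # -18 45.26
```

On this reading, a share above 50% leans male and a share below 50% leans female, which matches the two verdicts.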