The number of different words an author uses compared to the length of the work creates a unique, identifiable "fingerprint," according to a study published December 10 in the New Journal of Physics. Using such a calculation could answer questions about works with unknown authors and help to discover lost literary works.
With Thomas Hardy, D. H. Lawrence, and Herman Melville as their test subjects, scientists at Umeå University in Sweden identified patterns in the authors' styles based on the frequency with which they used new words. Analysis showed that these patterns were unique to each author in the study. (Herman Melville, for example, was found to use a larger vocabulary than Thomas Hardy.)
The researchers also looked at how quickly this new-word usage drops off over the course of a work, finding that the drop-off rate varied among the three authors but was consistent across each author's entire body of work.
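To make the idea concrete, here is a minimal sketch of how the number of distinct words can be tracked against text length, and how the share of new words in each stretch of text can be read off that curve. It is an illustration only, not the researchers' method: the function names, the simple regex tokenizer, and the `step` sampling interval are all assumptions made for this example.

```python
import re


def vocabulary_growth(text, step=1000):
    """Return (words_read, distinct_words_seen) pairs, sampled every `step` words.

    Assumption: words are runs of letters and apostrophes, compared in lowercase.
    """
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        # Record a sample at each interval and once more at the end of the text.
        if i % step == 0 or i == len(words):
            curve.append((i, len(seen)))
    return curve


def new_word_rates(curve):
    """Fraction of words in each sampled interval that had not appeared before."""
    rates = []
    prev_read, prev_distinct = 0, 0
    for read, distinct in curve:
        rates.append((distinct - prev_distinct) / (read - prev_read))
        prev_read, prev_distinct = read, distinct
    return rates
```

In a long text the rates would be expected to fall as the author reuses words already introduced; how fast they fall is the drop-off described above.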
Using this statistical analysis, a work by an unknown author could be compared to previous, known works in order to find a fingerprint match.
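A comparison of this kind might look something like the sketch below, which picks the known author whose distinct-word curve lies closest to that of the unknown text. The mean absolute difference used here is a stand-in chosen for simplicity, not the statistic from the study, and the names `distinct_word_curve` and `closest_author` are hypothetical.

```python
import re


def distinct_word_curve(text, step):
    """Count of distinct words seen after every `step` words of the text."""
    words = re.findall(r"[a-z']+", text.lower())
    seen, curve = set(), []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        if i % step == 0:
            curve.append(len(seen))
    return curve


def closest_author(unknown_text, known_texts, step=1000):
    """Return (author, distance) for the known text whose curve is nearest.

    Assumption: distance is the mean absolute difference between the curves,
    taken over the span covered by the shorter of the two texts.
    """
    unknown = distinct_word_curve(unknown_text, step)
    best_author, best_distance = None, float("inf")
    for author, text in known_texts.items():
        known = distinct_word_curve(text, step)
        shared = min(len(unknown), len(known))
        if shared == 0:
            continue  # one text is shorter than `step` words; nothing to compare
        distance = sum(abs(u - k) for u, k in zip(unknown, known)) / shared
        if distance < best_distance:
            best_author, best_distance = author, distance
    return best_author, best_distance
```

A real attribution would need far longer samples and a proper statistical test; the point here is only the shape of the comparison.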