TY - JOUR
T1 - Using Shakespeare's Sotto Voce to determine true identity from text
AU - Kernot, David
AU - Bossomaier, Terry
AU - Bradbury, Roger
N1 - Includes bibliographical references.
PY - 2018/3/15
Y1 - 2018/3/15
N2 - Little is known of the private life of William Shakespeare, but he is famous for his collection of plays and poems, even though many of the works attributed to him were published anonymously. Determining the identity of Shakespeare has fascinated scholars for 400 years, and four significant figures in English literary history have been suggested as likely alternatives to Shakespeare for some disputed works: Bacon, de Vere, Stanley, and Marlowe. A myriad of computational and statistical tools and techniques have been used to determine the true authorship of his works. Many of these techniques rely on basic statistical correlations, word counts, collocated word groups, or keyword density, but no one method has been decided on. We suggest that an alternative technique that uses word semantics to draw on personality can provide an accurate profile of a person. To test this claim, we analyse the works of Shakespeare, Christopher Marlowe, and Elizabeth Cary. We use Word Accumulation Curves, Hierarchical Clustering overlays, Principal Component Analysis, and Linear Discriminant Analysis techniques in combination with RPAS, a multi-faceted text analysis approach that draws on a writer's personality, or self, to identify subtle characteristics within a person's writing style. Here we find that RPAS can separate the known authored works of Shakespeare from those of Marlowe and Cary. Further, it separates their contested works, that is, works suspected of being written by others. While few authorship identification techniques identify self from the way a person writes, we demonstrate that these stylistic characteristics were as applicable 400 years ago as they are today and have the potential to be used within cyberspace for law enforcement purposes.
KW - Authorship identification
KW - Linear discriminant analysis
KW - Personality
KW - Principal component analysis
KW - Sensory processing
UR - http://www.scopus.com/inward/record.url?scp=85044060306&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85044060306&partnerID=8YFLogxK
U2 - 10.3389/fpsyg.2018.00289
DO - 10.3389/fpsyg.2018.00289
M3 - Article
C2 - 29599734
AN - SCOPUS:85044060306
SN - 1664-1078
VL - 9
SP - 1
EP - 17
JO - Frontiers in Psychology
JF - Frontiers in Psychology
IS - MAR
M1 - 289
ER -