In September, at the Japanese Association for Digital Humanities Conference in Tokyo, I presented a paper on the application of vector space models to a corpus of Jane Austen’s published novels.
The paper was titled ‘Jane Austen in Vector Space: Applying vector space models to 19th century literature’ and outlined some of the findings from my pilot study applying data mining techniques to Austen’s novels.
The advent of distant and scaled reading techniques within literary studies has made it possible to explore texts in ways that “defamiliarize…making them unrecognizable in a way…that helps scholars identify features they might not otherwise have seen” (Clement, Tanya. “Text Analysis, Data Mining and Visualisations in Literary Scholarship.” MLA Commons | Literary Studies in the Digital Age. Oct. 2013. Web.). Topic modelling is perhaps the most popular of these tools among Digital Humanists who wish to transform texts and view them through a different lens. However, ‘word2vec’ (an algorithm that represents each word as a point, or vector, in a multi-dimensional space, so that words used in similar contexts sit close together and the relationships between words can be measured as distances and directions between those points) has the potential to be of even greater use. It can work effectively on a smaller corpus and can be applied to full texts, whereas, as Jockers has noted (“‘Secret’ Recipe for Topic Modeling Themes.” matthewjockers.net. 12 Apr. 2013. Web.), topic modelling is more effective when working with a large, noun-only corpus. In addition, ‘word2vec’ allows us to explore the discourses surrounding a theme. Rather than asking ‘which topics or themes are in this corpus of texts?’, applying the ‘word2vec’ algorithm allows us to ask ‘what does the corpus say about this theme?’
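To make the idea of asking ‘what does the corpus say about this theme?’ a little more concrete, here is a minimal sketch of the kind of query a trained vector space supports: ranking every word by its cosine similarity to a theme word. The words and three-dimensional vectors below are entirely hypothetical, invented for illustration; they are not trained on Austen’s novels and do not reflect any findings from the pilot study. Real word2vec vectors are learned from a corpus and usually have a few hundred dimensions.

```python
import math

# HYPOTHETICAL toy vectors, hand-made for illustration only.
# They are NOT trained on Austen's novels; real word2vec embeddings
# are learned from a corpus and typically have 100-300 dimensions.
TOY_VECTORS = {
    "marriage": [0.9, 0.1, 0.2],
    "wedding": [0.85, 0.15, 0.25],
    "fortune": [0.6, 0.7, 0.1],
    "income": [0.5, 0.8, 0.05],
    "walk": [0.1, 0.2, 0.9],
    "garden": [0.05, 0.15, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(theme, vectors, top_n=3):
    """Rank all other words by cosine similarity to the theme word."""
    query = vectors[theme]
    scores = [
        (word, cosine_similarity(query, vec))
        for word, vec in vectors.items()
        if word != theme
    ]
    # Highest similarity first: these are the words that share the
    # theme word's contexts most closely in this (toy) space.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    for word, score in nearest_neighbours("marriage", TOY_VECTORS):
        print(f"{word}: {score:.3f}")
```

With these invented vectors, the nearest neighbours of ‘marriage’ are ‘wedding’, ‘fortune’ and ‘income’, while ‘walk’ and ‘garden’ rank lowest. In practice, a library such as gensim trains the vectors from the texts, and its `model.wv.most_similar(...)` method performs the same cosine-similarity ranking over the learned vocabulary.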