Jun 26 2017

Topic Modelling – Letters of 1916

Letters of 1916 is a public humanities project run by Maynooth University and directed by Professor Susan Schreibman. The project is creating a crowd-sourced digital collection of letters written between 1st November 1915 and 31st October 1916. For my internship, I have been working on the project as a research assistant, analysing the collection.

One of the most popular tools for exploring text collections is topic modelling. Topic modelling is a method of exploring latent topics within a text collection, often using Latent Dirichlet Allocation. In simple terms, “Topic modeling is a way of extrapolating backward from a collection of documents to infer the discourses (“topics”) that could have generated them” (Underwood, 2012).

Identifying Number of Topics
A challenge when creating topic models is determining the optimum number of topics. The R package ‘ldatuning’ (Murzintcev, 2015), which maps the corpus against four separate metrics, was used to identify an appropriate number of topics, 30.

Metrics from ‘ldatuning’ Package for Optimal Number of Topics

The topic model was created in R using the ‘topicmodels’ package (Grün and Hornik, 2011), an implementation of Latent Dirichlet Allocation using Gibbs sampling. The resulting topics were visualised as a series of word clouds (available at http://sarajkerr.com/Dataviz/intern/images/Letters1916_new.gif).

Word Cloud of 30 Topics

Interactive Topic Visualisations Using LDAvis
The topics were also visualised using ‘LDAvis’ (Sievert and Shirley, 2015), where the most distinct terms in the topic can be interactively viewed by adjusting the relevance metric λ to 0.5. The image below illustrates Topic 14 – Rebellion; the red bars indicate the terms in the topic, while the blue bars indicate the terms in the corpus as a whole.

Output from ‘LDAvis’ Showing Topic 15

The interactive model is available at http://sarajkerr.com/Dataviz/intern/lda/index.html.

The topics identified by the topic model highlight a number of interesting themes within the collection: Topics 11 and 22 refer to prison; Topic 14 to rebellion; Topic 15 to official correspondence; Topic 17 to Roger Casement; Topic 18 to letters to Lady Clonbrock; Topic 21 to legal matters; Topic 23 to the murders of Sheehy Skeffington, Dickson, and MacIntyre; and Topic 30 to prisoners of war.

Top 20 Terms in Topic 11 – Internment

Top 20 Terms in Topic 14 – Rebellion

While the topic models are informative, they have a relatively narrow focus which limits the opportunities for “serendipitous discovery” (Nicholas and Herman 2009:22).

In my next post I will explore vector space models, also known as word embeddings.

May 10 2017

GST1 – 6: Quantitative Methods in the Social Sciences 3

GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

Quantitative Methods in the Social Sciences 3: Inferential and Exploratory Quantitative Techniques

Parametric and Non-Parametric Tests
The type of test chosen for your data depends on whether the data are parametric or non-parametric. Parametric data are:

  • Independent randomly selected observations
  • Approximately normally distributed
  • Interval scale measurements – continuous
  • Generally a minimum sample size of 30
  • Hypothesis posed regarding mean, standard deviation of population

If the sample does not fit the parametric criteria, or the data are ordinal or nominal, a non-parametric test is needed. The hypothesis posed then concerns ranks, medians, frequencies or inter-quartile range. There are parallel statistical tests depending on your data.

Parametric | Non-Parametric
Mean, Standard Deviation | Chi Squared
z-Test | Kendall’s Tau
Student’s t-Test | Mann-Whitney U Test
Pearson’s Product Moment Correlation | Spearman’s Rank Correlation Coefficient

One of the biggest challenges is deciding which type of test to use:

  • Difference between groups (independent) – t-test or Mann-Whitney U test
  • Difference between dependent variables – t-test for dependent samples or Wilcoxon signed-rank test
  • Relationship between variables – Pearson correlation or Spearman Correlation / Chi Square.

The choice of test will depend on your objectives, the distribution, data type, number of samples, what you are trying to do, and, ideally, whether you have used the test before. The steps to testing the hypothesis are:

  1. Identify the research question
  2. State H0 and Ha
  3. Decide the level of significance – 95% or a p-value of < 0.05
  4. Identify whether you need a 1 or 2-tailed test
  5. Compute the test statistic
  6. If it is in the critical region – reject H0

Student’s t-Test
There are three types of t-test and all compare means. A 1-sample t-test compares the sample mean to a known population mean, e.g. 1st year students compared to the whole student body. An independent samples t-test compares 2 independent groups, e.g. male vs female income. A paired (dependent) samples t-test compares repeated measures, e.g. before and after in a drug trial.
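As a minimal sketch of two of these tests, the t statistic can be computed by hand in Python. The scores, the population mean of 47 and the before/after values are invented for illustration, and the helper names `one_sample_t` and `paired_t` are hypothetical. A paired test is treated here as a 1-sample test on the differences.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, population_mean):
    """t = (sample mean - population mean) / standard error of the mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - population_mean) / standard_error


def paired_t(before, after):
    """A paired test is a 1-sample test on the differences, against a mean of 0."""
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0)


# Invented data: five sample scores compared with an assumed population mean of 47
scores = [52, 48, 55, 50, 45]
t_one = one_sample_t(scores, 47)

# Invented repeated measures on the same four subjects
before = [10, 12, 9, 11]
after = [12, 14, 10, 14]
t_paired = paired_t(before, after)

print(f"1-sample t = {t_one:.3f}, paired t = {t_paired:.3f}")
```

The resulting t value would then be compared with the critical value for n – 1 degrees of freedom.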

Analysis of Variance (ANOVA)
ANOVA is a continuation of the t-test if you are using 3 or more samples. Running a series of t-tests increases the likelihood of a Type I error; using ANOVA reduces this chance. The ANOVA test statistic (the F statistic) is calculated by dividing an estimate of the variability between groups by the variability within groups.
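The ratio described above can be sketched in a few lines of Python. This is a hand-rolled one-way ANOVA on three invented groups, and the function name `f_statistic` is hypothetical; a statistics package would normally be used in practice.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Variability between groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    # Variability within groups: how far each value is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / (n - k)

    return ms_between / ms_within


# Invented data: three groups of three observations
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value = f_statistic(groups)
print(f"F = {f_value:.1f}")  # F = 12.0
```

Here the between-group mean square is 12 and the within-group mean square is 1, so F is 12; a large F suggests the group means differ by more than the spread within the groups would explain.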

May 04 2017

GST1 – 5: Quantitative Methods in the Social Sciences 2

GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

Quantitative Methods in the Social Sciences 2: Introduction to Hypothesis Testing

Hypothesis testing starts with a specific question, and the hypothesis is accepted or rejected at a given level of significance. Put simply: an assumption is made, evidence is collected, and the sample data are used to judge whether the initial assumption is reasonable.

Steps for Hypothesis Testing

  1. Identify the research question.
  2. Determine the null and alternative hypotheses. H0 – the null hypothesis is always the status quo – there is no difference, change, x is not guilty. Ha – the alternative hypothesis is that there is a difference.
  3. Decide the level of significance – 95% or a p-value of < 0.05
  4. Decide if you need a one or two-tailed test
  5. Gather data with a view to proving H0 untrue
  6. Calculate the test statistic
  7. If there is no significant difference between the two parameters then accept H0. If there is a difference at a 95% significance level the observed differences are so great that they are unlikely to happen by chance, therefore reject H0.

Rejecting the null hypothesis is not proof, as we don’t have the whole population, but the significance level suggests that the result is not due to chance.

A one-tailed test is used if there is direction, i.e. if we are saying something is greater than, above, or below X. A two-tailed test has no direction and indicates difference or change. In general a two-tailed test is used.
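A small sketch of how the choice of tails changes the p-value, using Python’s standard library. The z statistic of 1.8 is an invented value, and a z-test is assumed purely to keep the example short.

```python
from statistics import NormalDist

z = 1.8  # invented test statistic
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: probability of a value this far above the mean
p_one_tailed = 1 - standard_normal.cdf(z)

# Two-tailed: probability of a value this far from the mean in either direction
p_two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))

print(f"one-tailed p = {p_one_tailed:.3f}, two-tailed p = {p_two_tailed:.3f}")
```

With these numbers the one-tailed p-value is below 0.05 while the two-tailed one is not, which is why the decision about direction has to be made before looking at the data.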

Type I and Type II Errors
Setting a confidence level of 95% or higher reduces the chance of a Type I error (rejecting H0 when it is true). For a given sample size, however, a stricter level makes a Type II error (accepting H0 when it is false) more likely.

            | Accept H0   | Reject H0
H0 is True  | Correct     | Type I (𝛂)
H0 is False | Type II (𝛃) | Correct
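The Type I rate can be illustrated with a small simulation. This is a sketch under stated assumptions: the samples are drawn from a population where H0 is true by construction, a two-tailed z-test with known standard deviation is used, and the seed, sample size and number of trials are arbitrary choices.

```python
import random
from statistics import NormalDist, mean

random.seed(1916)  # fixed seed so the run is repeatable
ALPHA = 0.05
N, TRIALS = 30, 2000
critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed cut-off, about 1.96

false_rejections = 0
for _ in range(TRIALS):
    # H0 is true here: every sample comes from a population with mean 0 and sd 1
    sample = [random.gauss(0, 1) for _ in range(N)]
    z = mean(sample) / (1 / N ** 0.5)  # z statistic with known sigma = 1
    if abs(z) > critical_z:
        false_rejections += 1          # rejecting a true H0 is a Type I error

type_i_rate = false_rejections / TRIALS
print(f"Type I error rate: {type_i_rate:.3f}")
```

The observed rate should sit close to the chosen 𝛂 of 0.05, which is what the significance level means in practice.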

May 03 2017

GST1 – 4: Quantitative Methods in the Social Sciences 1

GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

Quantitative Methods in the Social Sciences 1: Probability Distributions

This is the first in a series of sessions from the FSS1 Module – Quantitative Methods in the Social Sciences. This session focused on the basics of probability and inferential testing.

Inferential Tests
This type of test infers from a sample which is representative of the whole population. Caution is needed that the sample is valid as a representation. This is all linked to probability – measuring the likelihood that event X will occur and making an informed decision. This is a shift from what has happened (the data) to what will happen (the inference).

“Probability distribution may be thought of as histograms depicting relative frequencies” – Rogerson (2001:47). Probability is the area under the curve.

Conceptual Approaches and Laws of Probability
There are three conceptual approaches:

  • Classical – equal likelihood
  • Relative Frequency – based on empirical findings
  • Subjective – based on personal judgement

The sample space enumerates all possible outcomes, e.g. selecting a card from a standard deck has a sample space of 52 outcomes. We need to know whether events are mutually exclusive, e.g. when selecting a single card, drawing a J and drawing a K are mutually exclusive, whereas drawing a 2 and drawing a ❤️ are not. We also need to know if an event is independent or dependent.

Laws of Probability:

  • Law of Subtraction – the probabilities of all possible outcomes sum to 1, therefore if there are two possible outcomes and event x has a probability of 25% then event y will have a probability of 75%.
  • Law of Multiplication – to calculate the probability that two independent events both occur, multiply the events’ probabilities.
  • Law of Addition – if two events are mutually exclusive, to calculate the probability that event a or event b will occur, add the probabilities together. If the events are not mutually exclusive, the probability that event a or event b will occur is: P(event a) + P(event b) – P(event a and event b).
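The three laws can be checked against the card examples above with a short Python sketch. Exact fractions are used so the results read as they would on paper; the variable names are illustrative only.

```python
from fractions import Fraction

DECK = 52
p_jack = Fraction(4, DECK)
p_king = Fraction(4, DECK)
p_two = Fraction(4, DECK)
p_heart = Fraction(13, DECK)
p_two_of_hearts = Fraction(1, DECK)  # the single card that is both a 2 and a heart

# Law of Subtraction: the two outcomes "heart" and "not a heart" sum to 1
p_not_heart = 1 - p_heart

# Law of Multiplication: two independent draws (card replaced between draws)
p_heart_then_heart = p_heart * p_heart

# Law of Addition, mutually exclusive events: a single card cannot be a J and a K
p_jack_or_king = p_jack + p_king

# Law of Addition, not mutually exclusive: subtract the overlap (the 2 of hearts)
p_two_or_heart = p_two + p_heart - p_two_of_hearts

print(p_not_heart, p_heart_then_heart, p_jack_or_king, p_two_or_heart)
# 3/4 1/16 2/13 4/13
```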

Binomial and Poisson Probability Distributions
The curve of the data reflects the distribution. The assumption in inferential statistics is that the sample is random.

The Binomial Distribution
Binomial random variables are discrete (i.e. they are a series of integers and can be illustrated as a bar chart). There will be several independent repetitions of the experiment with two possible outcomes ‘success’ and ‘failure’. To calculate the probability you need to know the probability of success.

Binomial Probability Law
P(X = r) = nCr * p^r * q^(n-r)
Probability of r successes = number of ways of choosing r successes from n trials * (probability of success ^ number of successes) * (probability of failure ^ (number of trials – number of successes))

The number of combinations (nCr) can be calculated using a combination calculator. Sampling is key for accuracy – it must be representative.
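The binomial law can be written out directly in Python, with `math.comb` standing in for the combination calculator. The coin-toss numbers are an invented example and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p^r * q^(n - r), where q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


# Invented example: probability of exactly 2 heads in 5 tosses of a fair coin
p_two_heads = binomial_pmf(2, 5, 0.5)
print(p_two_heads)  # 0.3125
```

Here nCr is 10, so the result is 10 * 0.25 * 0.125 = 0.3125.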

The Poisson Distribution
This distribution is used for determining the probability of x events occurring over y space or time (time is more usual). Each Poisson distribution depends on the average number of occurrences of the event in a given time interval, denoted by µ.

Poisson Probability Law
P(X = r) = (µ ^ r) * (e ^ -µ) / r!
Probability of r events occurring within the time frame = (mean number of occurrences in the time period ^ number observed) * (constant 2.71828 ^ -mean number of occurrences) / factorial of number observed.
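A minimal Python sketch of the Poisson law follows. The mean of 3 events per interval is an invented figure and `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(r, mu):
    """P(X = r) = mu^r * e^(-mu) / r!"""
    return mu ** r * exp(-mu) / factorial(r)


# Invented example: an average of 3 events per interval, probability of seeing exactly 2
p_two_events = poisson_pmf(2, 3)
print(f"{p_two_events:.3f}")  # 0.224
```

Note that the exponent on e is the mean µ, not the number observed.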

The Normal Probability Distribution
This is also known as the Bell-shaped distribution or the Gaussian distribution. The normal curve is a theoretical model: it is a continuous probability distribution within a specified range. Probability is the corresponding area under the curve. The curve is described by skew and kurtosis.

The Normal Distribution

The normal curve is described according to the mean µ and the standard deviation σ. For a normal distribution skewness and kurtosis values tend towards 0. The image above illustrates the empirical rule that 68.2% of values will fall within 1 standard deviation, 95.4% within 2 standard deviations, and 99.7% will fall within 3 standard deviations. Beyond these are values which are significant.
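The empirical rule can be reproduced from the standard normal curve using Python’s `statistics.NormalDist`, where the area under the curve between –k and +k standard deviations is the difference of two cumulative probabilities. This is only a quick check and not part of the session material.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations either side of the mean
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The first figure comes out at 68.3% when rounded; the 68.2% often quoted is the same value truncated.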

To assess normality:

  1. Visually
    • Box and Whisker
    • Stem and Leaf
    • Histogram
    • Q-Q Plot
  2. Descriptive statistics
    • Mean, Median, Mode
    • Skewness
    • Kurtosis
  3. Normality Tests
    • Shapiro-Wilk
    • Anderson-Darling
    • Kolmogorov-Smirnov

The Standard Normal Distribution has a mean of 0 and a standard deviation of 1. Z tables and z scores represent the standard normal distribution; the z table gives us the probability. If the z score is beyond the range of the table then the probability is assumed to be 1.

If a Normal Distribution is not standard it will need to be standardised:
z = (x – µ) / σ
z score = (the value – mean) / standard deviation.
If we know the probability and want to calculate the value:
value = z * σ + µ
value = z score * standard deviation + mean.
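Both directions of the conversion can be sketched in Python, with `statistics.NormalDist` playing the role of the z table. The mean of 100 and standard deviation of 15 are invented values, and the helper names are hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 100, 15  # invented mean and standard deviation


def z_score(value, mu, sigma):
    """Standardise: z = (x - mu) / sigma."""
    return (value - mu) / sigma


def value_from_z(z, mu, sigma):
    """Reverse the standardisation: value = z * sigma + mu."""
    return z * sigma + mu


# From a value to a probability
z = z_score(130, MU, SIGMA)        # 2.0
p_below = NormalDist().cdf(z)      # what a z table lookup would give, about 0.977

# From a probability back to a value
z_for_95 = NormalDist().inv_cdf(0.95)         # about 1.645
cutoff = value_from_z(z_for_95, MU, SIGMA)    # about 124.7

print(f"z = {z}, P(X < 130) = {p_below:.3f}, 95th percentile = {cutoff:.1f}")
```

The brackets in the first formula matter: the mean is subtracted before dividing by the standard deviation.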

Apr 13 2017

Trolls and Critics in the 19th Century

We often like to perceive the past as something quite alien, populated with unfamiliar people who are ‘not like us’. However, the more primary sources you encounter, the more this belief is shaken.

For part of my Literature Review I have been exploring the 19th century reviews of Jane Austen, Maria Edgeworth and Sydney Owenson’s novels, and the venom expressed in some of them has been quite shocking.

Jane Austen
Austen gets off relatively lightly, not because of her popularity but because her novels were almost entirely overlooked by the reviewers. The British Critic, in 1812, praises the characters in Sense and Sensibility for being “happily delineated”, going on to say “we will, however, detain our female friends no longer than to assure them that they may peruse these volumes not only with satisfaction but with real benefit, for they may learn from them, if they please, many sober and salutary maxims for the conduct of life” (p.527). Well, that’s a relief then – just what you look for in a novel!

The most famous review of Austen’s novels was written by Sir Walter Scott, appearing anonymously in The Quarterly Review (1816). The review focuses on Emma but also draws attention to two of Austen’s previous novels, Sense and Sensibility and Pride and Prejudice. As Peter Sabor notes in his article about the significance of this review: “In taking notice of an obscure female novelist, and in commissioning a review from the greatest man of letters of his age, Murray was tacitly acknowledging the particular significance of Emma: only a very few of the hundreds of contemporary novels would ever be so favoured”. This was not entirely lacking in self-interest: John Murray, the publisher of The Quarterly Review, was also the publisher of Austen’s and Scott’s novels.

Scott proposes that her novels belonged to a new “class of fictions which has arisen almost in our own times, and which draws the characters and incidents introduced more immediately from the current of ordinary life than was permitted by the former rules of the novel” and indicates that it was superior to “the ephemeral productions which supply the regular demand of watering-places and circulating libraries” (p.189). After such high praise, the review itself seems rather a slap in the face to a modern reader. Scott informs his readers that Emma, although having “even less story than either of the preceding novels”, has “subjects… [which] are finished up to nature, and with a precision which delights the reader” (p.195-7). He praises her “quiet yet comic dialogue” but criticises “the minute detail” of her “characters of folly or simplicity” which “is apt to become tiresome in fiction” (p.199-200). High praise indeed! However, this review was critical in securing the future value placed upon Austen’s novels in the late nineteenth century and beyond, saving her from becoming one of the great unread.

Maria Edgeworth
Maria Edgeworth’s Belinda (1802) received a rather mixed review from The Monthly Review. The critic acknowledges their “respect for her talents” and calls her novel the “production of no common pen”, but goes on to say that although the novel starts well it fades into “tameness and insipidity” (p.368). They were clearly disappointed that the female duellist Lady Delacour had been reformed.

The reviews for Patronage were not so favourable: Edgeworth had made the mistake of setting parts of her novel in the male, public sphere of law, politics and the Church. The reviewers of The British Critic and Quarterly Theological Review were pretty savage: her understanding of diplomacy could only have come from “some ape of his superiors”, and her descriptions of political characters were “absurd” and “raise an incredulous disgust”. Rather more menacingly they “advise her, as she regards her own reputation, not to libel our English Church”. This phrase concludes a series of comments regarding her morality (“To the morality of Miss Edgeworth we can raise no objection”) and her private life (“With the private lives of those whose works are before us we have not the slightest knowledge”); one can’t help envisioning a tabloid campaign to dig up the dirt on Miss Edgeworth. Ultimately, in perhaps the greatest insult to the author, the reviewers conclude: “If we shall be thought to be severe upon those parts…it is to be remembered, that it is not upon our ingenious and lively authoress that our censures rest so heavily, as upon that Father” (p.160-173).

Edgeworth did not let this go unanswered:

Sydney Owenson, Later Lady Morgan
However, it was for Sydney Owenson that the critics saved their worst jibes. Perhaps not as polished a writer as Austen and Edgeworth, Owenson was also an outspoken social climber. Her second book (The Novice of St Dominick), The Critical Review tells us, “was the last book that amused the hours of illness of the late Mr Pitt” and “tho’ we cannot speak of it [her third book, The Wild Irish Girl] in the first type of panegyric, is yet in many parts capable of exciting considerable interest, and may well amuse a leisure hour” (p.327-328). Such positive reviews were not to last.

In 1804, Owenson made a public response to John Wilson Croker’s anonymously published Familiar Epistles, to Frederick J–S Esq, On the Present State of the Irish Stage, which attacked many of her father’s friends. This made her an influential political enemy and prompted a twenty-year war of words.

Croker seems to have been the ultimate troll, anonymously attacking Owenson and her writing. In a series of letters to The Freeman’s Journal, Croker wrote: “her merits have been over-rated…and her arguments over-praised…I accuse Miss OWENSON of having written bad novels, and worse poetry…I accuse her of attempting to vitiate mankind – of attempting to undermine morality by sophistry” (Connolly p.98). This war of words may have provided Owenson with more support than Croker anticipated; one particular letter writer, identifying himself only as the ‘Son of Ireland’, suggested that Croker’s motive was jealousy and that “the puny pretender to wit is prompt to undervalue the talent that can detect his insufficiency” (p.104).

In a review of Ida of Athens, a book in which Owenson herself was disappointed, calling it a ‘bad book’ in her Memoir, Croker suggested that if she “practise a little self denial, and gather a few precepts of humility…she might then hope to prove, not indeed a good writer of novels, but a useful friend, a faithful wife, a tender mother, and a respectable and happy mistress of a family” (The Quarterly Review p.52).

As a Tory politician, Croker may also have found Owenson’s support of Whig politicians and the cause of Catholic emancipation a threat to his political party and his own emerging political career. Yet there are suggestions that Croker was not just attacking Owenson’s work and political stance; he was also attacking her as a woman who sought financial independence and admittance to the higher ranks of society. “Croker makes it clear that had Owenson not been in search of commercial gain – as in her countertype, the independently wealthy and ultimately frivolous author…her reputation might have remained her ‘private property’” (Connolly p.112). Even Owenson’s husband was attacked. In 1821 Owenson published a travel book, Italy. Croker’s review states: “Notwithstanding the obstetric skill of Sir Charles Morgan (who we believe is a male midwife), this book dropt all but stillborn from the press” (Adburgham p.255).

Croker seems to have relished the vitriolic attack. Owenson was not his only target (he also attacked Mary Shelley’s Frankenstein, and the works of Hugo and Alexandre Dumas), but he seemed to have a particular hatred for her. Like those of modern trolls, the attacks frequently went beyond the content of her novels and other writings and targeted Owenson herself, almost always from the safety of anonymity: a cowardly man with an axe to grind.

Works Cited

  • Adburgham, Alison. Women in Print. London: George Allen and Unwin Ltd, 1972. Print.
  • “Belinda, By Maria Edgeworth.” The Monthly Review; or Literary Journal 37 (1802): 368–374. Web.
  • Connolly, Claire. “‘I Accuse Miss Owenson’: The Wild Irish Girl as Media Event.” Colby Quarterly 36.2 (2000): 98–115. Print.
  • “Emma; a Novel. By the Author of Sense and Sensibility, Pride and Prejudice, Etc.” The Quarterly Review 14 (1816): 188–201. Web.
  • “Patronage by Miss Edgeworth.” The British Critic and Quarterly Theological Review 1 (1814): 159–173. Web.
  • Sabor, Peter. “‘Finished up to Nature’: Walter Scott’s Review of Emma.” Persuasions: The Jane Austen Journal 13 (1991): 88–99. Web.
  • “Sense and Sensibility by A Lady.” The British Critic 39 (1812): 527. Web.
  • “The Wild Irish Girl, By Miss Owenson.” The Critical Review; or Annals of Literature 9 (1806): 327–328. Web.
  • “Woman: Or, Ida of Athens.” The Quarterly Review 1 (1809): 50–52. Web.

Apr 03 2017

‘Alt-Right Jane Austen’ – My view

Nicole M. Wright’s article, “Alt-Right Jane Austen” (The Chronicle Review, March 12), was fascinating and shocking to read in equal measure. However, I felt that she did not go far enough in challenging the Alt-Right’s appropriation of Austen for their cause. A few additional details may help reinforce exactly why the Alt-Right are so wrong.

Firstly, the “cozy England of Austen”, idealized by the Alt-Right commentators, was in fact an England beset by almost constant war, rebellions and social uprisings, where refugees from Revolutionary France were a common sight. It was also a period where feminist ideas, and calls for equality, became increasingly shared as literacy levels increased.

Not only does Austen present “sexually adventurous characters”, she openly criticises the hypocrisy which judges men and women’s behaviour differently: “In this world the penalty is less equal than could be wished” (Mansfield Park). There is also a consistent criticism throughout the novels of many of those who hold power simply because of wealth and status, and a celebration of those who succeed through education, intelligence and merit. My own research, which uses vector space models to examine the theme of independence in Austen’s novels, reinforces this, revealing a complex and nuanced discourse of criticism against inequality.

Rather than being a rare example of a celebrated English female novelist, Austen began publishing at a time when female novelists out-published men. She was influenced by Frances Burney and Maria Edgeworth (whose novel Belinda includes cross-dressing female duellists and an interracial marriage), amongst other well-known female authors, and is part of a tradition of women’s writing which stretches from the early modern period to the present day.

By writing novels at a time when novel writing itself was a potentially political act, Austen places her challenge to the world in which she lived on a public stage. The Alt-Right’s misreading of Austen just shows how subversive she really is.

Feb 03 2017

GST1 – 3: Measuring Your Research Impact

GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

Your Researcher Impact: Measuring Your Research Impact

This session, led by Ciarán Quinn the research support librarian, focused on two main aspects of research impact: the author profile, and impact measures.

Author Profile
One of the most straightforward things to consider is how you present your name: if you have multiple variations it can be hard for the various metrics to track all your work. One way of making sure that there is a consistent record is by creating and linking profiles on a variety of academic sites:

  • ORCID
  • Web of Science Researcher ID
  • RIS Profile
  • SCOPUS Author ID
  • Google Scholar Citation Profile
  • Academic Social Networks – Academia, Mendeley, ResearchGate

For the purpose of this post I have included screenshots for each of the profiles that I have, or have set up following this session. I am not able to set up RIS as this is for academic staff at Maynooth, and the SCOPUS ID is created automatically when you publish with one of their journals.

Having a broad base of profiles improves the chances of your work being seen and cited. You do, however, need to check the profiles periodically to make sure they are up to date.

Impact Measures
There are a wide variety of impact measures which are all calculated slightly differently. There are also significant differences between academic fields.
Author level measures:

  • H-index
  • G-index
  • i10-index

The H-index is probably the most recognised measure: an author has an index of h when h of their papers have at least h citations each. The index can vary according to who is calculating it: SCOPUS only focuses on publications in its database, while Google uses a much broader range of publications.
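The definition translates into a few lines of Python. This is a sketch on invented citation counts, not how SCOPUS or Google compute it internally, and `h_index` is a hypothetical helper name.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Invented citation counts for two authors
print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([25, 8, 5, 3, 3]))   # 3
```

The second author has one very highly cited paper but a lower index, which shows that the measure rewards a consistent body of cited work and not a single hit.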

Article level measures focus on how many times the article has been cited – which is why access is so important.

Journals have impact factors and rankings, as well as being divided into those that peer review and those that do not. It is important to consider the impact factor of the journal you plan to publish with; SciVal (part of SCOPUS) has lots of information to make this easier.

Feb 01 2017

GST1 – 2: Accelerated Expertise

GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

Accelerated Expertise

Session on Accelerated Expertise and Superintelligence presented by Dr D Delany from Trinity College, Dublin.

In the cognitive science of expertise, competence is a function of mental model quality. Our mental model of the world and our field is a schema or knowledge structure. This is particularly interesting as it ties in with my education research into the SOLO taxonomy and gives a possible reason why it may be effective.

The ‘Superintelligence’ Framework

Cognitive Tools
The ability to think in the abstract relies upon the cognitive tools we have available. The shift from orality to literacy marked a fundamental cognitive shift. Alphabetic literacy, as opposed to logographic, was more efficient and is linked to the rise of abstract theoretical and scientific thinking.

The rise in literacy and literature is linked to how the individual parses the information. Therefore the capacity for abstract thought is a function of literacy.

Dr Delany suggested what he called ‘knowledge engineering’ – aiming to reverse engineer the understanding beneath a concept. For example, to understand the concept of capital and its associated concepts.
Capital: The man-made factor of production encompassing all the physical assets, such as machinery, used by a business to produce goods and services.

Knowledge Engineering ‘Capital’

Factor of Production: Resources, such as capital, used by a business as inputs to the production process in the creation of goods and services.

By considering the definitions we can extract the deep learning, e.g. that production processes use factors of production to create goods and services, and to demonstrate a level of expertise. Extracting the deep structure of concepts enables us to compare and critique experts in the field.

Considering what lies beneath the sentence level – the sentence level structure may reflect orality.

Levels of analysis
Sentence-level analysis:

  • obscures deep structure
  • weakly schemogenic

Concept-level analysis:

  • exposes the deep structure
  • functional role of concepts
  • strongly schemogenic

Creating a Semantic Relationship
Taxonomic – class and subclass (is a…)
Holonymic (x contains y)
Meronymic (y is a part of x)

Going Beyond Expertise
The mental model can be explored by using adduction to infer what else the definition suggests. For example, if there are man-made productive assets it suggests that there would also be synthetic productive assets.

One of the challenges of using this type of reverse engineering is overcoming conceptual biases. Human creativity is relatively limited in that we recycle and adapt familiar elements.

The example we explored in the session was Newton’s first and second law. Newton’s first law was modified from Descartes’ Laws of Nature 37 and 39. What was particularly interesting is that the second law can be extrapolated from the first, but this took Newton years to do.

Thematic Abstraction
Learning transfer – the ability to apply knowledge learned in one context in new contexts – is the key to adaptive expertise. This is where I noticed the similarities between the different levels within the SOLO taxonomy. Weakly schemogenic learning can lead to two problems: the ‘incompetent novice’ whose knowledge is not linked, or in small clusters (SOLO Uni-structural and Multi-structural), and the ‘brittle expert’ whose knowledge is linked but not consistently, or is hierarchical and narrow (SOLO Multi-structural and Relational). The adaptive expert’s knowledge is integrated, hierarchical and extensive (SOLO Extended Abstract). In strongly schemogenic learning the working memory is directly linked to schemas.

Nov 05 2016

Getting Organised – Referencing

September marked the half way point of my PhD. So, as it is time to get things organised and focus on writing, I have decided to write a series of posts about my workflow. This is partly to help me clarify and streamline my processes, but also in case it is of use to anyone else.

My first post is about referencing. Referencing is one of those things you have to do, but is all too easy to leave to the last minute as, quite frankly, it is boring. I have spent many sleepless nights trying to sort out a bibliography started way too late, especially when I was doing my undergraduate degree and there was no laptop to help. Luckily there are lots of tools available now which can make this task relatively painless.

Reference Management
One of the absolute essentials, for me at least, is a reference management tool. I use Mendeley, partly because I have been using it since my MA, but mostly because it is free and has desktop and web versions which sync. This means I am confident that my library of references is safe and backed up (I also back up to an external hard drive, because you can never have too many back up copies – yes I am slightly paranoid about losing my work).

References can be added simply by dragging a PDF file into the document list, or manually. The authors’ names appear on the left hand side and the details of a selected document appear on the right hand side, which allows you to check the details and correct them if needed. You can view your references as a table or as citations, and you can choose from a number of different referencing styles. There is also a notes section, but to be honest I never use this. If PDF files are saved into Mendeley you can open and read them in the desktop or via the web and iPhone versions – great for reading on the go.

However, where Mendeley really comes into its own is organising your library of references. Over the course of my MA and PhD studies I have amassed a huge library of references, over 600 and still growing. I have created a Thesis folder in Mendeley, with a sub-folder for each chapter. Each reference I use in a particular chapter will be added to the appropriate folder. This will reduce the references I have to check to smaller chunks, and the folder can also be used as a reading list for each chapter.

So, references are organised and checked for accuracy, but this still doesn’t solve the dreary task of creating a bibliography. The solution I found combines LaTeX (which I will write about in my next post) and a BibTeX file created from my Mendeley library.

Creating a BibTeX File
To create a BibTeX file from your library, you need to go to the preferences tab. Go to the BibTeX tab and tick ‘Enable BibTeX syncing’.

Mendeley Preferences Tab

There are three options; I go with ‘Create one BibTeX file for my whole library’ so I don’t have to worry whether my references are in a particular folder. Browse and select where you want the file to be saved, then click ‘Apply’ and ‘Save’. And that is it: as you add to your Mendeley library, your BibTeX file is automatically updated to add your new references.

Citation Keys
The other thing you need to do is make sure ‘citation key’ is ticked under ‘Document Details’. When you select a document you will see that a citation key has been created in the details on the right (circled in red in the screenshot below). The default is Author/Date but you can change it to whatever works best for you. This little shorthand reference to the document is the key to creating a bibliography and in-text citations with minimal effort using LaTeX.

Mendeley Desktop with Citation Key circled

Oct 03 2016

Jane Austen in Vector Space – Presentation at JADH

In September, I presented a paper which discussed the application of vector space models to a corpus of Jane Austen’s published novels at the Japanese Association for Digital Humanities Conference in Tokyo.

The paper was titled ‘Jane Austen in Vector Space: Applying vector space models to 19th century literature’ and outlined some of the findings from my pilot study applying data mining techniques to Austen’s novels.

The advent of distant and scaled reading techniques within literary studies has enabled the exploration of texts in a manner which “defamiliarize…making them unrecognizable in a way…that helps scholars identify features they might not otherwise have seen” (Clement, Tanya. “Text Analysis, Data Mining and Visualisations in Literary Scholarship.” MLA Commons | Literary studies in the digital age. Oct. 2013. Web.). Topic modelling is, perhaps, the most popular of these tools for Digital Humanists who wish to transform texts and view them through a different lens. However, the application of ‘word2vec’ (an algorithm which represents words as points in space, and the meanings and relationships between them as vectors) has the potential to be of even greater use. It can work effectively on a smaller corpus and can be applied to full texts, whereas, as Jockers has noted (“‘Secret’ recipe for topic modeling themes.” matthewjockers.net. 12 Apr. 2013. Web.), topic modelling is more effective when working with a large, noun-only corpus. In addition, ‘word2vec’ allows the exploration of discourses surrounding a theme. Rather than asking ‘which topics or themes are in this corpus of texts?’, the application of the ‘word2vec’ algorithm allows us to ask ‘what does the corpus say about this theme?’.

Links can be found here to the conference Proceedings, Slides and a draft of the presentation.
