Aug 02 2017

On Word Embeddings and Fish – Letters of 1916

Letters of 1916 is a public humanities project run by Maynooth University and directed by Professor Susan Schreibman. The project is creating a crowd-sourced digital collection of letters written between 1st November 1915 and 31st October 1916. For my internship, I have been working on the project as a research assistant, analysing the collection.

Following my two posts on Topic Modelling, I am moving on to a different type of vector space modelling – Word Embedding.

Like Topic Modelling, Word Embeddings have their origins in computer science, specifically methods of information retrieval. As before, I carry out my analysis using R, this time using the ‘wordVectors’ package created by Schmidt and Li (2015). This package uses a version of the Word2Vec algorithm originally created by Mikolov et al (2013) at Google. To visualise the results I have used both the built-in plot function, and an adapted plot created using ‘ggplot2’ (Wickham 2009) and ‘ggrepel’ (Slowikowski 2016).

Preparing the text
Word Embeddings can be created with relatively little pre-processing, although depending on the size and type of the corpus you may wish to experiment with stop word removal. For the analysis below, the texts were in plain text files and were pre-processed using the prep_word2vec function. This function takes a collection of plain text documents and creates a single plain text file – you can also opt to remove capital letters, which I chose to do here. The collection is a subset of the larger Letters of 1916 collection, totalling 1,372 letters.
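In 'wordVectors' this preparation step is a single function call. A minimal sketch – the file and folder names are illustrative, not the project's actual paths:

```r
library(wordVectors)

# Combine every plain text letter in the folder into one training file,
# lowercasing the text as it goes (i.e. removing capital letters)
prep_word2vec(origin = "letters_txt/",      # folder of .txt files (illustrative name)
              destination = "letters.txt",  # single combined output file
              lowercase = TRUE)
```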

Creating Word Embeddings

Word Embeddings are a useful addition to the tools that can be used to explore text collections, as Ben Schmidt notes: “For digital humanists, they merit attention because they allow a much richer exploration of the vocabularies or discursive spaces implied by massive collections of texts than most other reductions out there” (2015 ‘Word Embeddings’). Unlike a Topic Model, which provides an answer to ‘what topics are in this text collection?’, a Word Embedding allows us to ask ‘what is being said about this topic/word in this corpus?’.

The first step is to train the Word Embedding model. The vector space model maps the words in a corpus into a multi-dimensional space which represents the semantic and syntactic relationships between words. Each word is encoded as a vector of length n, where n is the chosen number of dimensions (characteristics/contexts) used to capture those relationships. The dimensions are created by the computer and, at present, “this is a highly debated topic in the NLP/ML community, so my scientifically accurate answer is that we don’t yet know” (Levy 2015) what they specifically represent. The model I created has 300 vectors, a window of 12, negative sampling of 5, and uses the default skip-gram option. I will go into more detail about how Word Embedding works in a future post.
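Using the 'wordVectors' package, a model with those settings might be trained as in the sketch below (file names are illustrative; `vectors` sets the number of dimensions, and leaving `cbow` at its default of 0 gives the skip-gram architecture):

```r
library(wordVectors)

model <- train_word2vec(
  "letters.txt",           # the prepared single-file corpus
  "letters_vectors.bin",   # where the trained vectors are saved
  vectors = 300,           # dimensions per word vector
  window = 12,             # words of context either side
  negative_samples = 5     # negative sampling rate
)
```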

Changing the number of vectors will change the model produced, for example:

100 Vector Model – Ten words closest to ‘rising’

300 Vector Model – Ten words nearest to ‘rising’

These two images show the ten words nearest to a vector for ‘rising’ using a model with 100 vectors and a model with 300. Although the differences are subtle, the larger number of vectors creates a more nuanced interpretation.

Visualising the Results

Once the model has been created you can start to explore it. One way is to use the closest_to function to create a list of the words nearest to a chosen target word. For example, if we ask for the words closest to ‘rising’ in the letters we get this:

10 Words Closest to ‘rising’
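The call itself is a one-liner; assuming `model` is the trained vector space model loaded with 'wordVectors':

```r
# Returns a data frame of the ten words with the highest cosine
# similarity to the vector for "rising"
closest_to(model, "rising", n = 10)
```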

We can view a greater number of words near to the term ‘rising’ by viewing it as a plot. The built-in plot function uses t-SNE (t-distributed stochastic neighbour embedding) to reduce the dimensions for each word to a point that can be plotted in 2D, returning a result like this:

T-SNE plot Terms closest to ‘rising’

Unfortunately, the terms are overlapping in places which makes the plot quite hard to read. I decided to create a custom plot to try to solve this problem. I used the ‘ggplot2’ and ‘ggrepel’ packages which allowed me to mark the position of each word with a point, and to offset the label to improve readability.
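A sketch of the custom plot, assuming `coords` is a data frame holding each word's 2D t-SNE position (columns `x` and `y`) and the word itself (column `word`) – the data frame and its column names are mine, for illustration:

```r
library(ggplot2)
library(ggrepel)

ggplot(coords, aes(x = x, y = y, label = word)) +
  geom_point(size = 0.8, colour = "grey40") +  # mark each word's position
  geom_text_repel(size = 3) +                  # offset labels to avoid overlaps
  theme_minimal()
```

geom_text_repel does the heavy lifting here: it iteratively nudges labels away from each other and from the points, which is what makes the dense clusters legible.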

Close up of Custom Plot

Analysing the Results

The previous image is a close-up from the plot of the 500 words closest to ‘rising’. One of the advantages of this method of text exploration is that it can confirm things we know to be in the texts – for example, this cluster contains several terms linked to the Rising, such as ‘rebels’, ‘seditious’ and ‘speeches’ – but it can also highlight the unexpected, or what Nicholas and Herman (2009:22) refer to as “serendipitous discovery”. This provides a way of searching a text while reducing the problems of search itself: “Search is a form of data mining, but a strangely focused form that only shows you what you already know to expect” (Underwood 2014:66).

In the cluster above we can see the word ‘shark’, which seems rather out of place. To examine the context of the words, a Key Word in Context (KWIC) search (such as the one Jockers describes in his 2014 book Text Analysis with R for Students of Literature) can be used. This reveals that this is a reference, in a single letter, to the sinking of HMS Shark at the Battle of Jutland. The letter provides a detailed account of the sinking of HMS Shark and the fate of its captain and crew – something that could easily be overlooked in a collection of this size.

KWIC results for ‘shark’
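A KWIC lookup of this kind can be sketched in a few lines of base R (this is my own minimal version, not Jockers' code), assuming `tokens` is the corpus split into a character vector of words:

```r
# Return each occurrence of `term` with `width` words of context either side
kwic <- function(tokens, term, width = 5) {
  hits <- which(tokens == term)
  sapply(hits, function(i) {
    left  <- if (i > 1) tokens[max(1, i - width):(i - 1)] else character(0)
    right <- if (i < length(tokens)) tokens[(i + 1):min(length(tokens), i + width)] else character(0)
    paste(paste(left, collapse = " "), toupper(term), paste(right, collapse = " "))
  })
}

kwic(tokens, "shark")
```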

In the same cluster, towards the top right, we have the terms ‘preserved’ and ‘herrings’ – what, we might ask, do herrings have to do with the 1916 Easter Rising?


This highlights the importance of the human element in examining the output of tools like this. Computers are great for crunching numbers and creating plots, but when it comes to spotting patterns there is nothing better than a human. With a little knowledge about the Easter Rising, we can spot words which seem to fit into a pattern and those that seemingly do not. Again, we use the KWIC to explore the context of our ‘herrings’ to see if we can identify how they are linked to the Rising.

KWIC results for ‘herrings’

This time our references come from four different letters. The last two references are letters from soldiers in WWI talking about the food they have at the Front and the food they would like their families to send. However, the remaining five references are from two letters regarding the conditions for post-Rising internment prisoners at Frongoch Prison in Wales. This leads us to an interesting potential research question – ‘what were the conditions like for internment prisoners?’ – as well as some initial sources for further investigation.

A final point to note about our fishy terms is that neither is particularly frequent: ‘shark’ appears only 6 times in the corpus and ‘herrings’ only 7. The chance of a “serendipitous discovery” of such low-frequency terms through close reading alone is slim. This highlights the usefulness of combining Word Embedding and close reading to examine large collections of texts.

Jul 06 2017

Topic Modelling: PoS Tagging – Letters of 1916

In my previous post I discussed topic modelling the Letters of 1916 collection. Before moving on to a post on word embeddings, I thought I would explore some additional points. Topic modelling is not an exact science; there is a certain amount of trial and error involved. This means that some of the topics extracted from a text collection can be overly influenced by frequent words or by the texts themselves. Matthew Jockers demonstrated some of these problems in his blog post “Secret” Recipe for Topic Modelling, showing that character names, and the effect of modelling complete novels, can result in topics which reflect the text rather than its themes. While stop word lists can address the issue of names and some of the more commonly used words, this is sometimes not enough.

Part of Speech Tagging (PoS)
In his blog post, Jockers suggests the use of Part of Speech tagging as an additional preprocessing step to reduce the noise from other parts of speech. He does, however, add the caveat: “I think this is a good way to capture thematic information; it certainly does not capture such things as affect (i.e. attitudes towards the theme) or other nuances that may be very important to literary analysis and interpretation”.

To apply this additional processing step to the Letters of 1916 we need to use a part of speech tagger – in this case the freely available TreeTagger. The .txt files are first passed through TreeTagger, which tags each word with its part of speech. The words tagged as nouns (NN) and plural nouns (NNS) are then extracted and saved into a new file with the original letter ID. The texts are processed using the ‘tm’ package in R, with numbers, punctuation and stop words being removed, and a Document Term Matrix created. As before, I used ‘ldatuning’ to identify the optimum number of topics – this time it was a little less clear, although 30 still appears to be the best option.
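The 'tm' stage of that pipeline might look like the following sketch (the folder name is illustrative, and the TreeTagger step happens outside R):

```r
library(tm)

corpus <- VCorpus(DirSource("letters_nouns/"))   # the noun-only files, one per letter
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

dtm <- DocumentTermMatrix(corpus)   # one row per letter, one column per noun
```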

Topic Modelling Noun-Only Corpus

Letters of 1916 Topics – Nouns

What this preprocessing step seems to achieve is to refine the words in the topics, making the theme for each group of letters easier to identify. For some of the topics the difference is relatively minor as we can see in the Internment topic:

Internment Topic – Full Corpus

Internment Topic – Nouns Corpus

For other topics, the focus on the nouns helps to clarify the topic further. This is evident in the topic I have called Letters Before Death:

Letters Before Death Topic – Full Corpus

Letters Before Death Topic – Nouns Corpus

What seems to be clear is that examining the topics created from the full text corpus, as well as those from the nouns-only corpus, may prove useful in understanding the collection.

Jun 26 2017

Topic Modelling – Letters of 1916

One of the most popular tools for exploring text collections is topic modelling. Topic modelling is a method of exploring latent topics within a text collection, often using Latent Dirichlet Allocation. In simple terms, “Topic modeling is a way of extrapolating backward from a collection of documents to infer the discourses (“topics”) that could have generated them” (Underwood, 2012).

Identifying Number of Topics
A challenge when creating topic models is determining the optimum number of topics. The R package ‘ldatuning’ (Murzintcev, 2015), which maps the corpus against four separate metrics, was used to identify an appropriate number of topics, 30.

Metrics from ‘ldatuning’ Package for Optimal Number of Topics
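The 'ldatuning' call might look like this sketch, assuming `dtm` is the Document Term Matrix for the collection (the candidate range of topic numbers is illustrative):

```r
library(ldatuning)

result <- FindTopicsNumber(
  dtm,
  topics  = seq(10, 50, by = 5),   # candidate numbers of topics to compare
  metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
  method  = "Gibbs"
)
FindTopicsNumber_plot(result)      # plot the four metrics against k
```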

The topic model was created in R using the ‘topicmodels’ package (Grün and Hornik, 2011), an implementation of Latent Dirichlet Allocation using Gibbs sampling. The resulting topics were visualised as a series of word clouds.

Word Cloud of 30 Topics
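Fitting the model itself is a single call to the 'topicmodels' package; the control settings below (seed, burn-in, iterations) are illustrative defaults rather than the exact values used:

```r
library(topicmodels)

lda <- LDA(dtm, k = 30, method = "Gibbs",
           control = list(seed = 1916, burnin = 1000, iter = 1000))

terms(lda, 20)   # inspect the top 20 terms in each topic
```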

Interactive Topic Visualisations Using LDAvis
The topics were also visualised using ‘LDAvis’ (Sievert and Shirley, 2015), where the most distinct terms in a topic can be interactively viewed by adjusting the relevance metric λ to 0.5. The image below illustrates Topic 14 – Rebellion; the red bars indicate the frequency of terms within the topic, while the blue bars indicate their frequency in the corpus as a whole.

Output from ‘LDAvis’ Showing Topic 15
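Feeding a fitted 'topicmodels' model into 'LDAvis' follows a standard recipe; a sketch, assuming `lda` is the fitted model and `dtm` the Document Term Matrix it was trained on:

```r
library(LDAvis)
library(topicmodels)
library(slam)

post <- posterior(lda)
json <- createJSON(
  phi   = post$terms,               # topic-term distributions
  theta = post$topics,              # document-topic distributions
  doc.length     = row_sums(dtm),   # tokens per letter
  vocab          = colnames(post$terms),
  term.frequency = col_sums(dtm)
)
serVis(json)   # opens the interactive visualisation in a browser
```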

The interactive model is available online.

The topics identified by the topic model highlight a number of interesting themes within the collection: Topics 11 and 22 refer to prison; Topic 14 to rebellion; Topic 15 to official correspondence; Topic 17 to Roger Casement; Topic 18 to letters to Lady Clonbrock; Topic 21 to legal matters; Topic 23 to the murders of Sheehy Skeffington, Dickson, and MacIntyre; and Topic 30 to prisoners of war.

Top 20 Terms in Topic 11 – Internment

Top 20 Terms in Topic 14 – Rebellion

While the topic models are informative, they have a relatively narrow focus which limits the opportunities for “serendipitous discovery” (Nicholas and Herman 2009:22).

In my next post I will explore vector space models, also known as word embeddings.

May 10 2017

GST1 – 6: Quantitative Methods in the Social Sciences 3

GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

Quantitative Methods in the Social Sciences 3: Inferential and Exploratory Quantitative Techniques

Parametric and Non-Parametric Tests
The type of test chosen for your data depends on whether the data are parametric or non-parametric. Parametric data are:

  • Independent randomly selected observations
  • Approximately normally distributed
  • Interval scale measurements – continuous
  • Generally a minimum sample size of 30
  • Hypothesis posed regarding mean, standard deviation of population

If the sample does not meet the parametric criteria, or the data are ordinal or nominal, a non-parametric test is required. In that case the hypothesis posed concerns ranks, medians, frequencies or the inter-quartile range. There are parallel statistical tests depending on your data:

Parametric                              Non-Parametric
Mean, Standard Deviation                Chi Squared
z-Test                                  Kendall’s Tau
Student’s t-Test                        Mann-Whitney U Test
Pearson’s Product Moment Correlation    Spearman’s Rank Correlation Coefficient

One of the biggest challenges is deciding which type of test to use:

  • Difference between groups (independent) – t-test or Mann-Whitney U test
  • Difference between dependent variables – t-test for dependent samples or Wilcoxon Sign-rank
  • Relationship between variables – Pearson correlation or Spearman Correlation / Chi Square.

The choice of test will depend on your objectives, the distribution, data type, number of samples, what you are trying to do, and, ideally, whether you have used the test before. The steps to testing the hypothesis are:

  1. Identify the research question
  2. State H0 and Ha
  3. Decide the level of significance – 95% or < 0.05 p value
  4. Identify whether you need a 1 or 2-tailed test
  5. Compute the test statistic
  6. If it is in the critical region – reject H0

Student’s t-Test
There are three types of t-test and all compare the means. A 1-sample t-test compares the sample mean to a known population, e.g. 1st year students compared to whole student body. An independent sample t-test compares 2 independent groups, e.g. male vs female income. A Paired (dependent) samples t-test compares repeated measures, e.g. before and after in a drug trial.
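All three variants are handled by R's built-in t.test function; the data below are made up purely to show the calls:

```r
before <- c(120, 115, 130, 125, 118)   # illustrative measurements
after  <- c(112, 110, 124, 119, 115)

t.test(before, mu = 120)               # 1-sample: sample mean vs known value
t.test(before, after)                  # independent samples (Welch by default)
t.test(before, after, paired = TRUE)   # paired (dependent) samples
```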

Analysis of Variance (ANOVA)
ANOVA is an extension of the t-test for use with 3 or more samples. Using a series of t-tests inflates the chance of a Type I error; using ANOVA reduces this chance. The ANOVA F statistic is calculated by dividing an estimate of the variability between groups by the variability within groups.
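Written out, with k groups and N observations in total, the F statistic is:

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```

A large F indicates that the variability between the group means is large relative to the variability within the groups.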

May 04 2017

GST1 – 5: Quantitative Methods in the Social Sciences 2

Quantitative Methods in the Social Sciences 2: Introduction to Hypothesis Testing

Hypothesis testing starts with a specific question; based on a chosen level of significance, the hypothesis is accepted or rejected. Put simply: an assumption is made, evidence is collected, and, based on the sample data, we ask whether the initial assumption is reasonable.

Steps for Hypothesis Testing

  1. Identify the research question.
  2. Determine the null and alternative hypotheses. H0 – the null hypothesis is always the status quo: there is no difference, no change, x is not guilty. Ha – the alternative hypothesis is that there is a difference
  3. Decide the level of significance – 95% or a p-value of < 0.05
  4. Decide if you need a one- or two-tailed test
  5. Gather data with a view to proving H0 untrue
  6. Calculate the test statistic
  7. If there is no significant difference between the two parameters then accept H0. If there is a difference at the 95% significance level, the observed differences are so great that they are unlikely to have happened by chance, therefore reject H0.

Rejecting the null hypothesis is not proof, as we do not have the whole population, but the significance level suggests that the result is not due to chance.

A one-tailed test is used if there is direction, i.e. if we are saying something is greater than, above, or below X. A two-tailed test has no direction and indicates difference or change. In general a two-tailed test is used.

Type I and Type II Errors
Setting a significance level of 95% or higher reduces the chance of a Type I error (rejecting H0 when it is true); note, however, that making the level stricter increases the chance of a Type II error (accepting H0 when it is false).

             Accept H0       Reject H0
H0 is True   Correct         Type I (𝛂)
H0 is False  Type II (𝛃)     Correct

May 03 2017

GST1 – 4: Quantitative Methods in the Social Sciences 1

Quantitative Methods in the Social Sciences 1: Probability Distributions

This is the first in a series of sessions from the FSS1 Module – Quantitative Methods in the Social Sciences. This session focused on the basics of probability and inferential testing.

Inferential Tests
This type of test infers from a sample which is representative of the whole population; care is needed to ensure that the sample is a valid representation. This is all linked to probability – measuring the likelihood that event X will occur and making an informed decision. This is a shift from what has happened (the data) to what will happen (the inference).

“Probability distribution may be thought of as histograms depicting relative frequencies” – Rogerson (2001:47). Probability is the area under the curve.

Conceptual Approaches and Laws of Probability
There are three conceptual approaches:

  • Classical – equal likelihood
  • Relative Frequency – based on empirical findings
  • Subjective – based on personal judgement

The sample space enumerates all possible outcomes, e.g. selecting a card from a standard deck has a sample space of 52. We need to know whether a sample is mutually exclusive, e.g. selecting a J and a K when selecting a single card is mutually exclusive, selecting a 2 and a ❤️ is not. We also need to know if an event is independent or dependent.

Laws of Probability:

  • Law of Subtraction – the probabilities of all possible outcomes sum to 1, therefore if there are two possible outcomes and event x has a probability of 25% then event y will have a probability of 75%.
  • Law of Multiplication – to calculate the probability that both events occur, multiply the events’ probabilities.
  • Law of Addition – if two events are mutually exclusive, to calculate the probability that event a or event b will occur, add the probabilities together. If the events are not mutually exclusive: P(event a) + P(event b) – P(event a and event b).
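In symbols, the three laws are:

```latex
P(\bar{A}) = 1 - P(A)                    % Subtraction
P(A \cap B) = P(A) \times P(B)           % Multiplication (independent events)
P(A \cup B) = P(A) + P(B) - P(A \cap B)  % Addition (general form)
```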

Binomial and Poisson Probability Distributions
The curve of the data reflects the distribution. The assumption in inferential statistics is that the sample is random.

The Binomial Distribution
Binomial random variables are discrete (i.e. they are a series of integers and can be illustrated as a bar chart). There will be several independent repetitions of the experiment with two possible outcomes ‘success’ and ‘failure’. To calculate the probability you need to know the probability of success.

Binomial Probability Law
P(X = r) = nCr * p^r * q^(n-r)
Probability of r successes = (number of ways of choosing r successes from n trials) * (probability of success ^ number of successes) * (probability of failure ^ (number of trials – number of successes))

The number of possible outcomes is calculated by using a Combination calculator. Sampling is key for accuracy – it must be representative.
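As a worked example (my own, for illustration): the probability of exactly 2 heads in 3 tosses of a fair coin, so n = 3, r = 2, p = q = 0.5:

```latex
P(X = 2) = \binom{3}{2}\,(0.5)^{2}\,(0.5)^{3-2}
         = 3 \times 0.25 \times 0.5
         = 0.375
```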

The Poisson Distribution
This distribution is used for determining the probability of x events occurring over a given amount of space or time (usually time). Each Poisson distribution depends on the average number of occurrences of the event in a given time interval, denoted by µ.

Poisson Probability Law
P(X = r) = (µ ^ r) * (e ^ -µ) / r!
Probability of r events occurring within the time frame = (mean number of occurrences in the time period ^ number observed) * (the constant e ≈ 2.71828, raised to minus the mean) / factorial of the number observed.
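As a worked example (again my own): if an event occurs on average twice per hour (µ = 2), the probability of observing exactly 3 events in an hour is:

```latex
P(X = 3) = \frac{\mu^{r} e^{-\mu}}{r!}
         = \frac{2^{3} \times e^{-2}}{3!}
         \approx \frac{8 \times 0.1353}{6}
         \approx 0.18
```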

The Normal Probability Distribution
This is also known as the Bell-shaped distribution or the Gaussian distribution. The normal curve is a theoretical model, it is a continuous probability distribution within a specified range. Probability is the corresponding area under the curve. The curve is described by skew and kurtosis.

The Normal Distribution

The normal curve is described by the mean µ and the standard deviation σ. For a normal distribution, skewness and kurtosis values tend towards 0. The image above illustrates the empirical rule: 68.2% of values fall within 1 standard deviation of the mean, 95.4% within 2 standard deviations, and 99.7% within 3 standard deviations. Values falling beyond these ranges are considered significant.

To assess normality:

  1. Visually
    • Box and Whisker
    • Stem and Leaf
    • Histogram
    • Q-Q Plot
  2. Descriptive statistics
    • Mean, Median, Mode
    • Skewness
    • Kurtosis
  3. Normality Tests
    • Shapiro-Wilk
    • Anderson-Darling
    • Kolmogorov-Smirnov
  4. The Standard Normal Distribution has a mean of 0 and a standard deviation of 1. Z tables and z scores represent the standard normal distribution; the z table gives us the probability. If the z score is beyond the scope of the table then the probability is assumed to be 1.

    If a Normal Distribution is not standard it will need to be standardised:
    z = (x – µ) / σ
    z score = (the value – mean) / standard deviation.
    If we know the probability and want to calculate the value:
    value = z * σ + µ
    value = z score * standard deviation + mean.
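A worked example (illustrative figures): IQ scores have µ = 100 and σ = 15, so a score of 115 standardises to

```latex
z = \frac{x - \mu}{\sigma} = \frac{115 - 100}{15} = 1
```

The z table gives P(Z < 1) ≈ 0.8413, i.e. roughly 84% of scores fall below 115.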

Apr 13 2017

Trolls and Critics in the 19th Century

We often like to perceive the past as something quite alien, populated with unfamiliar people who are ‘not like us’. However, the more primary sources you encounter, the more this belief is shaken.

For part of my Literature Review I have been exploring the 19th century reviews of Jane Austen, Maria Edgeworth and Sydney Owenson’s novels, and the venom expressed in some of them has been quite shocking.

Jane Austen
Austen gets off relatively lightly, not because of her popularity but because her novels were almost entirely overlooked by the reviewers. The British Critic, in 1812, praises the characters in Sense and Sensibility for being “happily delineated”, going on to say “we will, however, detain our female friends no longer than to assure them that they may peruse these volumes not only with satisfaction but with real benefit, for they may learn from them, if they please, many sober and salutary maxims for the conduct of life” (p.527). Well, that’s a relief then – just what you look for in a novel!

The most famous review of Austen’s novels was written by Sir Walter Scott, appearing anonymously in The Quarterly Review (1816). The review focuses on Emma but also draws attention to two of Austen’s previous novels, Sense and Sensibility and Pride and Prejudice. As Peter Sabor notes in his article about the significance of this review: “In taking notice of an obscure female novelist, and in commissioning a review from the greatest man of letters of his age, Murray was tacitly acknowledging the particular significance of Emma: only a very few of the hundreds of contemporary novels would ever be so favoured”. This was not entirely lacking in self-interest: John Murray, the publisher of The Quarterly Review, was also the publisher of Austen’s and Scott’s novels.

Scott proposes that her novels belonged to a new “class of fictions which has arisen almost in our own times, and which draws the characters and incidents introduced more immediately from the current of ordinary life than was permitted by the former rules of the novel” and indicated that it was superior to “the ephemeral productions which supply the regular demand of watering-places and circulating libraries” (p.189). After such high praise, the review itself seems rather a slap in the face to a modern reader. Scott informs his readers that Emma, although having “even less story than either of the preceding novels”, has “subjects… [which] are finished up to nature, and with a precision which delights the reader” (p.195-7). He praises her “quiet yet comic dialogue” but criticises “the minute detail” of her “characters of folly or simplicity” which “is apt to become tiresome in fiction” (p.199-200). High praise indeed! However, this review was critical in securing the future value placed upon Austen’s novels in the late nineteenth century and beyond, saving her from becoming one of the great unread.

Maria Edgeworth
Maria Edgeworth’s Belinda (1802) received a rather mixed review from The Monthly Review. The critics acknowledge their “respect for her talents” and call her novel the “production of no common pen”, but go on to say that although the novel starts well it fades into “tameness and insipidity” (p.368); they were clearly disappointed that the female duellist Lady Delacour had been reformed.

The reviews for Patronage were not so favourable; Edgeworth had made the mistake of setting parts of her novel in the male, public sphere of law, politics and the Church. The reviewers of The British Critic and Quarterly Theological Review were pretty savage: her understanding of diplomacy could only have come from “some ape of his superiors”, and her descriptions of political characters were “absurd” and “raise an incredulous disgust”. Rather more menacingly, they “advise her, as she regards her own reputation, not to libel our English Church”. This phrase concludes a series of comments regarding her morality (“To the morality of Miss Edgeworth we can raise no objection”) and her private life (“With the private lives of those whose works are before us we have not the slightest knowledge”); one can’t help envisioning a tabloid campaign to dig up the dirt on Miss Edgeworth. Ultimately, in perhaps the greatest insult to the author, the reviewers conclude: “If we shall be thought to be severe upon those parts…it is to be remembered, that it is not upon our ingenious and lively authoress that our censures rest so heavily, as upon that Father” (pp.160–173).

Edgeworth did not let this go unanswered.

Sydney Owenson, Later Lady Morgan
However, it was for Sydney Owenson that the critics saved their worst jibes. Perhaps not as polished a writer as Austen or Edgeworth, Owenson was also an outspoken social climber. Her second book (The Novice of St Dominick), The Critical Review tells us, “was the last book that amused the hours of illness of the late Mr Pitt”, and her third, The Wild Irish Girl, “tho’ we cannot speak of it in the first type of panegyric, is yet in many parts capable of exciting considerable interest, and may well amuse a leisure hour” (p.327-328). Such positive reviews were not to last.

In 1804, Owenson made a public response to John Wilson Croker’s anonymously published Familiar Epistles, to Frederick J–S Esq, On the Present State of the Irish Stage, which attacked many of her father’s friends. This made her an influential, political enemy and prompted a twenty year war of words.

Croker seems to have been the ultimate troll, anonymously attacking Owenson and her writing. In a series of letters to The Freeman’s Journal, Croker wrote: “her merits have been over-rated…and her arguments over-praised…I accuse Miss OWENSON of having written bad novels, and worse poetry…I accuse her of attempting to vitiate mankind – of attempting to undermine morality by sophistry” (Connolly p.98). This war of words may have provided Owenson with more support than Croker anticipated; one particular letter writer, identifying himself only as the ‘Son of Ireland’, suggested that Croker’s motive was jealousy and that “the puny pretender to wit is prompt to undervalue the talent that can detect his insufficiency” (p.104).

In a review of Ida of Athens, a book in which Owenson herself was disappointed, calling it a ‘bad book’ in her Memoir, Croker suggested that if she “practise a little self denial, and gather a few precepts of humility…she might then hope to prove, not indeed a good writer of novels, but a useful friend, a faithful wife, a tender mother, and a respectable and happy mistress of a family” (The Quarterly Review p.52).

As a Tory politician, Croker may also have found Owenson’s support of Whig politicians and the cause of Catholic emancipation a threat to his political party and his own emerging political career. Yet there are suggestions that Croker was not just attacking Owenson’s work and political stance; he was also attacking her as a woman who sought financial independence and admittance to the higher ranks of society. “Croker makes it clear that had Owenson not been in search of commercial gain – as in her countertype, the independently wealthy and ultimately frivolous author…her reputation might have remained her ‘private property’” (Connolly p.112). Even Owenson’s husband was attacked. In 1821 Owenson published a travel book, Italy; Croker’s review states: “Notwithstanding the obstetric skill of Sir Charles Morgan (who we believe is a male midwife), this book dropt all but stillborn from the press” (Adburgham p.255).

Croker seems to have relished the vitriolic attack; Owenson was not his only target (he also attacked Mary Shelley’s Frankenstein, and the works of Hugo and Alexandre Dumas) but he seemed to have a particular hatred for her. Like modern trolls, his attacks frequently went beyond the content of her novels and other writings and targeted Owenson herself, almost always from the safety of anonymity: a cowardly man with an axe to grind.

Works Cited

  • Adburgham, Alison. Women in Print. London: George Allen and Unwin Ltd, 1972. Print.
  • “Belinda, By Maria Edgeworth.” The Monthly Review; or Literary Journal 37 (1802): 368–374. Web.
  • Connolly, Claire. “‘I Accuse Miss Owenson’: The Wild Irish Girl as Media Event.” Colby Quarterly 36.2 (2000): 98–115. Print.
  • “Emma; a Novel. By the Author of Sense and Sensibility, Pride and Prejudice, Etc.” The Quarterly Review 14 (1816): 188–201. Web.
  • “Patronage by Miss Edgeworth.” The British Critic and Quarterly Theological Review 1 (1814): 159–173. Web.
  • Sabor, Peter. “‘Finished up to Nature’: Walter Scott’s Review of Emma.” Persuasions: The Jane Austen Journal 13 (1991): 88–99. Web.
  • “Sense and Sensibility by A Lady.” The British Critic 39 (1812): 527. Web.
  • “The Wild Irish Girl, By Miss Owenson.” The Critical Review; or Annals of Literature 9 (1806): 327–328. Web.
  • “Woman: Or, Ida of Athens.” The Quarterly Review 1 (1809): 50–52. Web.

Apr 03 2017

‘Alt-Right Jane Austen’ – My view

Nicole M. Wright’s article, “Alt-Right Jane Austen” (The Chronicle Review, March 12), was fascinating and shocking to read in equal measure. However, I felt that she did not go far enough in challenging the Alt-Right’s appropriation of Austen for their cause. A few additional details may help reinforce exactly why the Alt-Right are so wrong.

Firstly, the “cozy England of Austen”, idealized by the Alt-Right commentators, was in fact an England beset by almost constant war, rebellions and social uprisings, where refugees from Revolutionary France were a common sight. It was also a period where feminist ideas, and calls for equality, became increasingly shared as literacy levels increased.

Not only does Austen present “sexually adventurous characters”, she openly criticises the hypocrisy which judges men’s and women’s behaviour differently: “In this world the penalty is less equal than could be wished” (Mansfield Park). There is also a consistent criticism throughout the novels of many of those who hold power simply because of wealth and status, and a celebration of those who succeed through education, intelligence and merit. My own research, which uses vector space models to examine the theme of independence in Austen’s novels, reinforces this, revealing a complex and nuanced discourse of criticism against inequality.

Rather than being a rare example of a celebrated English female novelist, Austen began publishing at a time when female novelists out-published men. She was influenced by Frances Burney and Maria Edgeworth (whose novel Belinda includes cross-dressing female duellists and an interracial marriage), amongst other well-known female authors, and is part of a tradition of women’s writing which stretches from the early modern period to the present day.

By writing novels at a time when novel writing itself was a potentially political act, Austen places her challenge to the world in which she lived on a public stage. The Alt-Right’s misreading of Austen just shows how subversive she really is.

Feb 03 2017

GST1 – 3: Measuring Your Research Impact

GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

Your Researcher Impact: Measuring Your Research Impact

This session, led by Ciarán Quinn, the research support librarian, focused on two main aspects of research impact: the author profile, and impact measures.

Author Profile
One of the most straightforward things to consider is how you present your name: if you use multiple variations, it can be hard for the various metrics to track all your work. One way of making sure that there is a consistent record is to create and link profiles on a variety of academic sites:

  • Web of Science Researcher ID
  • RIS Profile
  • SCOPUS Author ID
  • Google Scholar Citation Profile
  • Academic Social Networks – Academia, Mendeley, ResearchGate

For the purpose of this post I have included screenshots of each of the profiles that I have, or have set up following this session. I was not able to set up an RIS profile, as this is for academic staff at Maynooth, and a SCOPUS Author ID is created automatically when you publish with one of the journals it indexes.

Having a broad base of profiles improves the chances of your work being seen and cited. You do, however, need to check the profiles periodically to make sure they are up to date.

Impact Measures
There are a wide variety of impact measures which are all calculated slightly differently. There are also significant differences between academic fields.
Author level measures:

  • H-index
  • G-index
  • i10-index

The H-index is probably the most recognised measure: an author has an H-index of h if h of their papers have been cited at least h times each. The index can vary according to who is calculating it: SCOPUS only counts publications in its own database, while Google Scholar draws on a much broader range of publications.
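The definition above translates directly into a short calculation. As a minimal sketch (the function name and the citation counts are illustrative, not drawn from any real profile): sort an author’s citation counts in descending order and find the largest rank h at which the paper in position h still has at least h citations.

```python
def h_index(citations):
    """Return the H-index: the largest h such that the author has
    at least h papers with at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank  # the paper at this rank still has enough citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with at least 4 citations
print(h_index([25, 8, 5, 3, 3]))  # 3: one very highly cited paper does not raise h
```

The second example shows why a single highly cited paper has limited effect on the H-index, which is one of the measure’s best-known quirks.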

Article level measures focus on how many times the article has been cited – which is why access is so important.

Journals have impact factors and rankings, as well as being divided into those that are peer reviewed and those that are not. It is important to consider the impact factor of the journal you plan to publish with; SciVal (part of SCOPUS) has lots of information to make this easier.

Feb 01 2017

GST1 – 2: Accelerated Expertise

GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

Accelerated Expertise

This session on Accelerated Expertise and Superintelligence was presented by Dr D. Delany from Trinity College Dublin.

In the cognitive science of expertise, competence is a function of mental model quality. Our mental model of the world and of our field is a schema or knowledge structure. This is particularly interesting as it ties in with my education research into the SOLO taxonomy and suggests a possible reason why it may be effective.

The ‘Superintelligence’ Framework

Cognitive Tools
The ability to think in the abstract relies upon the cognitive tools we have available. The shift from orality to literacy marked a fundamental cognitive shift. Alphabetic literacy, as opposed to logographic, was more efficient and is linked to the rise of abstract theoretical and scientific thinking.

The rise in literacy and literature is linked to how the individual parses the information. Therefore the capacity for abstract thought is a function of literacy.

Dr Delany suggested what he called ‘knowledge engineering’ – aiming to reverse engineer the understanding beneath a concept. For example, we can unpack the concept of capital through its associated concepts.
Capital: The man-made factor of production encompassing all the physical assets, such as machinery, used by a business to produce goods and services.

Knowledge Engineering ‘Capital’

Factor of Production: Resources, such as capital, used by a business as inputs to the production process in the creation of goods and services.

By considering the definitions we can extract the deep learning – e.g. that production processes use factors of production to create goods and services – and demonstrate a level of expertise. Extracting the deep structure of concepts enables us to compare and critique experts in the field.

We should consider what lies beneath the sentence level, as sentence-level structure may reflect the patterns of orality.

Levels of analysis
Sentence-level analysis:

  • obscures deep structure
  • weakly schemogenic

Concept-level analysis:

  • exposes the deep structure
  • functional role of concepts
  • strongly schemogenic

Creating a Semantic Relationship
  • Taxonomic – class and subclass (x is a y)
  • Holonymic – whole to part (x contains y)
  • Meronymic – part to whole (y is a part of x)
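These three relation types can be sketched as a tiny knowledge structure of (subject, relation, object) triples. This is only an illustrative sketch – the relation names and the terms (drawn loosely from the ‘capital’ example above) are my own, not from the session:

```python
# A minimal knowledge structure: each entry is a (subject, relation, object) triple.
triples = [
    ("capital", "is_a", "factor of production"),   # taxonomic: class and subclass
    ("business", "contains", "physical assets"),   # holonymic: whole contains part
    ("machinery", "part_of", "physical assets"),   # meronymic: part of a whole
]

def related(term, relation):
    """Return all objects linked to `term` by `relation`."""
    return [o for s, r, o in triples if s == term and r == relation]

print(related("capital", "is_a"))  # ['factor of production']
```

Even a flat list of triples like this makes the functional role of each concept explicit, which is the point of concept-level rather than sentence-level analysis.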

Going Beyond Expertise
The mental model can be explored by using abduction to infer what else the definition suggests. For example, if there are man-made productive assets, it suggests that there are also naturally occurring productive assets.

One of the challenges of using this type of reverse engineering is overcoming conceptual biases. Human creativity is relatively limited in that we recycle and adapt familiar elements.

The example we explored in the session was Newton’s first and second laws. Newton’s first law was adapted from Descartes’ Laws of Nature 37 and 39. What was particularly interesting is that the second law can be extrapolated from the first, yet this took Newton years to do.

Thematic Abstraction
Learning transfer – the ability to apply knowledge learned in one context in new contexts – is the key to adaptive expertise. This is where I noticed the similarities between the different levels within the SOLO taxonomy. Weakly schemogenic learning can lead to two problems: the ‘incompetent novice’, whose knowledge is unlinked or in small clusters (SOLO Uni-structural and Multi-structural), and the ‘brittle expert’, whose knowledge is linked but not consistently, or is hierarchical and narrow (SOLO Multi-structural and Relational). The adaptive expert’s knowledge is integrated, hierarchical and extensive (SOLO Extended Abstract). In strongly schemogenic learning the working memory is directly linked to schemas.
