GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.

**Quantitative Methods in the Social Sciences 1: Probability Distributions**

This is the first in a series of sessions from the FSS1 Module – Quantitative Methods in the Social Sciences. This session focused on the basics of probability and inferential testing.

*Inferential Tests*

This type of test draws conclusions about a whole population from a sample, so caution is needed that the sample is a valid representation. This is all linked to probability – measuring the likelihood that event X will occur and making an informed decision. This is a shift from what has happened (the data) to what will happen (the inference).

“Probability distribution may be thought of as histograms depicting relative frequencies” – Rogerson (2001:47). Probability is the area under the curve.

*Conceptual Approaches and Laws of Probability*

There are three conceptual approaches:

- Classical – equal likelihood
- Relative Frequency – based on empirical findings
- Subjective – based on personal judgement

The **sample space** enumerates all possible outcomes, e.g. selecting a card from a standard deck has a sample space of 52. We need to know whether two events are mutually exclusive, e.g. selecting a J and selecting a K on a single draw are mutually exclusive; selecting a 2 and selecting a ❤️ are not (the 2 of hearts is both). We also need to know if an event is independent or dependent.

Laws of Probability:

- Law of Subtraction – the probabilities of all possible outcomes sum to 1, therefore if there are two possible outcomes and event x has a probability of 25% then event y will have a probability of 75%.
- Law of Multiplication – to calculate the probability that two independent events both occur, multiply the events’ probabilities.
- Law of Addition – if two events are mutually exclusive, to calculate the probability that event a or event b will occur, add the probabilities together. If the events are not mutually exclusive: P(a or b) = P(a) + P(b) – P(a and b).
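The three laws can be sketched with the card-deck example above (a minimal illustration, not from the session itself):

```python
# A standard deck: 52 cards, 4 suits of 13 ranks.

# Law of subtraction: P(not a heart) = 1 - P(heart)
p_heart = 13 / 52
p_not_heart = 1 - p_heart  # 0.75

# Law of multiplication (independent events, drawing with replacement):
# P(heart on first draw AND heart on second draw)
p_two_hearts = p_heart * p_heart  # 0.0625

# Law of addition, mutually exclusive: P(J or K) on a single draw
p_jack_or_king = 4 / 52 + 4 / 52

# Law of addition, NOT mutually exclusive: P(a 2 or a heart).
# The 2 of hearts would be counted twice, so subtract the overlap.
p_two_or_heart = 4 / 52 + 13 / 52 - 1 / 52  # 16/52
```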

*Binomial and Poisson Probability Distributions*

The curve of the data reflects the distribution. The assumption in inferential statistics is that the sample is random.

The Binomial Distribution

Binomial random variables are discrete (i.e. they take integer values and can be illustrated as a bar chart). There are several independent repetitions of an experiment with two possible outcomes, ‘success’ and ‘failure’. To calculate the probability you need to know the probability of success.

**Binomial Probability Law**

P(X = r) = nCr * p^r * q^(n-r)

Probability of r successes = number of ways of choosing r successes from n trials (nCr) * (probability of success ^ number of successes) * (probability of failure ^ (number of trials – number of successes))

The number of combinations nCr is calculated with the combination formula n! / (r!(n – r)!). Sampling is key for accuracy – it must be representative.
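The Binomial Probability Law maps directly onto a few lines of Python (a sketch using the standard library, with a coin-flip example of my own):

```python
from math import comb  # comb(n, r) is the nCr combination count

def binomial_prob(n, r, p):
    """P(X = r): probability of exactly r successes in n independent trials,
    where p is the probability of success on each trial."""
    q = 1 - p  # probability of failure
    return comb(n, r) * p**r * q**(n - r)

# Example: probability of exactly 3 heads in 5 fair coin flips
binomial_prob(5, 3, 0.5)  # 0.3125
```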

The Poisson Distribution

This distribution is used for determining the probability of x events occurring over a fixed interval of space or time (time being more usual). Each Poisson distribution depends on the average number of occurrences of the event in a given time interval, denoted by µ.

Poisson Probability Law

P(X = r) = (µ ^ r) * (e ^ -µ) / r!

Probability of r events occurring within the time frame = (mean number of occurrences in the time period ^ number observed) * (constant e ≈ 2.71828 ^ minus the mean) / factorial of the number observed.
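The Poisson Probability Law translated into Python (a sketch; the call-centre example is my own, not from the session):

```python
from math import exp, factorial

def poisson_prob(r, mu):
    """P(X = r): probability of exactly r events in an interval,
    given a mean of mu events per interval."""
    return (mu**r) * exp(-mu) / factorial(r)

# Example: a line averages 2 calls per minute; probability of exactly 3 calls
poisson_prob(3, 2)  # ~0.18
```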

The Normal Probability Distribution

This is also known as the bell-shaped distribution or the Gaussian distribution. The normal curve is a theoretical model: a continuous probability distribution within a specified range. Probability is the corresponding area under the curve. The shape of the curve is described by its skew and kurtosis.

The normal curve is described according to the mean µ and the standard deviation σ. For a normal distribution skewness and kurtosis values tend towards 0. The empirical rule states that 68.2% of values will fall within 1 standard deviation of the mean, 95.4% within 2 standard deviations, and 99.7% within 3 standard deviations. Values beyond these ranges are rare and are often treated as significant.
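The empirical rule can be checked numerically by integrating the standard normal curve with Python's `statistics.NormalDist` (a quick sanity check, not part of the session):

```python
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)  # the standard normal distribution
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations
    area = nd.cdf(k) - nd.cdf(-k)
    print(f"within {k} sd: {area:.3f}")
```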

To assess normality:

- Visually
  - Box and Whisker
  - Stem and Leaf
  - Histogram
  - Q-Q Plot
- Descriptive statistics
  - Mean, Median, Mode
  - Skewness
  - Kurtosis
- Normality Tests
  - Shapiro-Wilk
  - Anderson-Darling
  - Kolmogorov-Smirnov
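As a small illustration of the descriptive-statistics route, skewness can be computed by hand and compared for a symmetric versus a skewed sample (a sketch using simulated data of my own; the samples and seed are arbitrary):

```python
import random
from statistics import mean, stdev

def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed
    standard deviation. Near 0 suggests symmetry, as in a normal curve."""
    m, s = mean(xs), stdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s**3)

random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(1000)]      # symmetric
skewed_sample = [random.expovariate(1) for _ in range(1000)]   # right-skewed

print(round(skewness(normal_sample), 2))  # near 0
print(round(skewness(skewed_sample), 2))  # clearly positive
```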

The Standard Normal Distribution has a mean of 0 and a standard deviation of 1. Z tables and z scores represent the standard normal distribution; the z table gives us the probability. If the z score lies beyond the range of the table, the cumulative probability is taken to be approximately 1 (or 0, for a large negative z).

If a Normal Distribution is not standard it will need to be standardised:

z = (x – µ) / σ

z score = (the value – the mean) / the standard deviation.

If we know the probability and want to calculate the value:

value = z * σ + µ

value = z score * standard deviation + mean.
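Both directions – standardising a value and recovering a value from a probability – can be sketched in Python with `statistics.NormalDist` standing in for the z table (the mean and standard deviation here are hypothetical, chosen only for illustration):

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical test scores, not from the session
x = 130

# Standardise: z = (x - mu) / sigma
z = (x - mu) / sigma  # 2.0

# What the z table provides: the area under the standard normal curve up to z
p = NormalDist().cdf(z)  # ~0.977

# Reverse direction: given a probability, recover the value
z_back = NormalDist().inv_cdf(p)
value = z_back * sigma + mu  # back to 130
```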