GST 1 is a module at Maynooth University which aims to improve research skills and employability. To gain 5 ECTS for this module you need to attend 6 sessions and produce a diary entry or set of notes for each one.
Quantitative Methods in the Social Sciences 1: Probability Distributions
This is the first in a series of sessions from the FSS1 Module – Quantitative Methods in the Social Sciences. This session focused on the basics of probability and inferential testing.
Inferential testing uses a sample to draw conclusions about the whole population, so the sample must be representative of that population. Care is needed to check that the sample is a valid representation. This is all linked to probability: measuring the likelihood that event X will occur and using that to make an informed decision. It is a shift from what has happened (the data) to what will happen (the inference).
“Probability distribution may be thought of as histograms depicting relative frequencies” – Rogerson (2001:47). The probability of an outcome, or a range of outcomes, is the corresponding area under the curve, and the total area under the curve is 1.
Conceptual Approaches and Laws of Probability
There are three conceptual approaches:
- Classical – equal likelihood
- Relative Frequency – based on empirical findings
- Subjective – based on personal judgement
The sample space lists all possible outcomes. For example, selecting one card from a standard deck has a sample space of 52 outcomes. We need to know whether events are mutually exclusive (they cannot happen at the same time). When selecting a single card, drawing a J and drawing a K are mutually exclusive, but drawing a 2 and drawing a ❤️ are not, because the 2 of hearts is both. We also need to know whether events are independent (the outcome of one does not change the probability of the other) or dependent.
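The card example can be checked by listing the sample space explicitly. This is a minimal Python sketch; `SAMPLE_SPACE`, `probability` and `mutually_exclusive` are illustrative helpers, not from any library, and every card is assumed to be equally likely (the classical approach).

```python
from fractions import Fraction
from itertools import product

# Sample space for drawing one card from a standard 52-card deck.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
SAMPLE_SPACE = list(product(RANKS, SUITS))


def probability(event):
    """Classical probability: favourable outcomes / all equally likely outcomes."""
    favourable = [card for card in SAMPLE_SPACE if event(card)]
    return Fraction(len(favourable), len(SAMPLE_SPACE))


def mutually_exclusive(event_a, event_b):
    """Events are mutually exclusive if no single outcome satisfies both."""
    return not any(event_a(card) and event_b(card) for card in SAMPLE_SPACE)


def is_jack(card):
    return card[0] == "J"


def is_king(card):
    return card[0] == "K"


def is_two(card):
    return card[0] == "2"


def is_heart(card):
    return card[1] == "hearts"


print(len(SAMPLE_SPACE))                     # 52
print(mutually_exclusive(is_jack, is_king))  # True
print(mutually_exclusive(is_two, is_heart))  # False: the 2 of hearts is both
```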
Laws of Probability:
- Law of Subtraction – the probabilities of all possible outcomes sum to 1, so P(not x) = 1 – P(x). For example, if there are two possible outcomes and event x has a probability of 25%, then event y has a probability of 75%.
- Law of Multiplication – to calculate the probability that two independent events both occur, multiply their probabilities: P(a and b) = P(a) * P(b). If the events are dependent, use the conditional probability instead: P(a and b) = P(a) * P(b given a).
- Law of Addition – if two events are mutually exclusive, the probability that event a or event b occurs is the sum of their probabilities: P(a) + P(b). If the events are not mutually exclusive, subtract the overlap so it is not counted twice: P(a) + P(b) – P(a and b).
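The three laws can be written as small functions and checked against the coin and card examples. This is a minimal sketch using Python's `fractions.Fraction` so the answers stay exact; the function names are hypothetical, and the multiplication helper assumes the events are independent.

```python
from fractions import Fraction


def p_not(p_a):
    """Law of subtraction (complement rule): P(not A) = 1 - P(A)."""
    return 1 - p_a


def p_and_independent(p_a, p_b):
    """Law of multiplication for independent events: P(A and B) = P(A) * P(B)."""
    return p_a * p_b


def p_or(p_a, p_b, p_a_and_b=0):
    """Law of addition: P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events P(A and B) = 0, so the probabilities simply add.
    """
    return p_a + p_b - p_a_and_b


# Complement: if P(x) = 25%, the only other outcome y has 75%.
print(p_not(Fraction(1, 4)))                              # 3/4
# Multiplication: two heads from two fair coin tosses.
print(p_and_independent(Fraction(1, 2), Fraction(1, 2)))  # 1/4
# Addition, mutually exclusive: a J or a K from one draw (4/52 + 4/52).
print(p_or(Fraction(4, 52), Fraction(4, 52)))             # 2/13
# Addition, not mutually exclusive: a 2 or a heart (4/52 + 13/52 - 1/52).
print(p_or(Fraction(4, 52), Fraction(13, 52), Fraction(1, 52)))  # 4/13
```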
Binomial and Poisson Probability Distributions
The shape of the data's curve reflects its distribution. Inferential statistics assumes that the sample is random.
The Binomial Distribution
Binomial random variables are discrete (i.e. they take whole-number values and can be illustrated as a bar chart). The experiment is repeated a fixed number of times (n trials); the trials are independent and each has two possible outcomes, ‘success’ and ‘failure’. To calculate the probability you need to know the probability of success p, which stays the same on every trial; the probability of failure is q = 1 – p.
Binomial Probability Law
P(X = r) = nCr * p^r * q^(n-r)
Probability of r successes = number of combinations, i.e. ways of choosing which r of the n trials are successes (nCr) * (probability of success ^ number of successes) * (probability of failure ^ (number of trials – number of successes))
The number of combinations is nCr = n! / (r! * (n – r)!), which can be worked out by hand or with a combination calculator. Sampling is key for accuracy – the sample must be representative.
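The binomial probability law translates directly into code. A minimal sketch, assuming Python 3.8+ for `math.comb`, which computes nCr and stands in for the combination calculator; `binomial_pmf` is a hypothetical name and the coin example is illustrative.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p^r * q^(n - r), where q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


# Probability of exactly 3 heads in 10 tosses of a fair coin.
print(comb(10, 3))               # 120 ways to choose which 3 tosses are heads
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```

Summing `binomial_pmf(r, n, p)` over every r from 0 to n gives 1, which is a quick sanity check on any hand calculation.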
The Poisson Distribution
This distribution is used to determine the probability of a given number of events occurring within a fixed interval of space or time (time is more usual). Each Poisson distribution depends on the average number of occurrences of the event in a given interval, denoted by µ.
Poisson Probability Law
P(X = r) = (µ ^ r) * (e ^ -µ) / r!
Probability of r events occurring within the time frame = (mean number of occurrences in the time period ^ number observed) * (constant e ≈ 2.71828 ^ –mean number of occurrences) / factorial of number observed.
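The Poisson law can be sketched the same way. `poisson_pmf` is a hypothetical helper built on `math.exp` and `math.factorial`, and the rate of 2 events per hour is an invented example, not data from the session.

```python
from math import exp, factorial


def poisson_pmf(r, mu):
    """P(X = r) = mu^r * e^(-mu) / r!, where mu is the mean number of events per interval."""
    return mu**r * exp(-mu) / factorial(r)


# Hypothetical example: an average of 2 events per hour.
# Probability of observing exactly 3 events in one hour.
print(round(poisson_pmf(3, 2), 4))  # 0.1804
```

Note that the exponent on e is the mean µ, not the number observed r.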
The Normal Probability Distribution
This is also known as the bell-shaped distribution or the Gaussian distribution. The normal curve is a theoretical model: a continuous probability distribution, so the probability that a value falls within a specified range is the corresponding area under the curve. The shape of the curve can be described by its skewness and kurtosis.
The normal curve is defined by its mean µ and standard deviation σ. A normal distribution has a skewness of 0 and an excess kurtosis of 0, so for normally distributed data these values tend towards 0. The image above illustrates the empirical rule: about 68.3% of values fall within 1 standard deviation of the mean, 95.4% within 2 standard deviations, and 99.7% within 3 standard deviations. Values beyond these ranges are unusual and, in inferential testing, may be considered significant.
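The empirical rule percentages can be recovered from the area under the normal curve. This sketch uses Python's `statistics.NormalDist` (Python 3.8+); `area_within` is a hypothetical helper. Because the result depends only on k, the same areas apply to any mean and standard deviation.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under a normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```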
To assess normality:
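- Box and whisker plots
- Stem and leaf plots
- Q-Q plots
- Comparing the mean, median and mode (in a normal distribution they are equal)
- Normality tests (e.g. Kolmogorov-Smirnov, Shapiro-Wilk)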
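Alongside plots and formal tests, a rough numerical check compares the mean and median and computes skewness and excess kurtosis. This sketch uses simulated data (a normal sample and a right-skewed exponential sample), not real data; the helpers `skewness` and `excess_kurtosis` are hypothetical and use simple moment-based formulas, so their values will differ slightly from the bias-adjusted estimates reported by packages such as SPSS. The mode is left out because it is not well defined for continuous data unless the values are grouped into bins.

```python
import random
import statistics


def skewness(data):
    """Moment-based skewness: mean cubed z score (0 for a symmetric distribution)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 3 for x in data) / len(data)


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 4 for x in data) / len(data) - 3


random.seed(1)  # fixed seed so the simulated samples are reproducible
normal_sample = [random.gauss(50, 10) for _ in range(10_000)]
skewed_sample = [random.expovariate(1 / 10) for _ in range(10_000)]

for name, data in [("normal", normal_sample), ("skewed", skewed_sample)]:
    print(
        f"{name}: mean={statistics.mean(data):.2f} "
        f"median={statistics.median(data):.2f} "
        f"skew={skewness(data):.2f} "
        f"excess kurtosis={excess_kurtosis(data):.2f}"
    )
```

For the normal sample the mean and median should be close and both shape statistics near 0; the exponential sample should show a mean pulled above the median and a clearly positive skew.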
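The Standard Normal Distribution has a mean of 0 and a standard deviation of 1. Z scores are measured on the standard normal distribution, and the z table gives the corresponding probability (area under the curve). If a z score is beyond the range of the table, its cumulative probability is taken to be approximately 1 (or approximately 0 for a large negative z score), because the remaining area in the tail is negligible.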
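In code, the standard normal cumulative distribution function plays the role of the z table. A minimal sketch with `statistics.NormalDist`; `z_table` is a hypothetical wrapper. It assumes a cumulative table (area to the left of z); some printed tables give the area between 0 and z instead, which is 0.5 less for positive z.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def z_table(z):
    """Cumulative probability P(Z <= z), i.e. the area to the left of z."""
    return STANDARD_NORMAL.cdf(z)


print(round(z_table(0), 4))     # 0.5
print(round(z_table(1.96), 4))  # 0.975
print(round(z_table(4.5), 4))   # 1.0, beyond the range of most printed tables
```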
If a Normal Distribution is not standard it will need to be standardised:
z = (x – µ) / σ
z score = (the value – mean) / standard deviation.
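Standardising a value is a one-line function. The brackets matter: without them only the mean would be divided by σ. The example values are hypothetical, and `z_score` is an illustrative name; the probability is then read from the standard normal distribution via `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardise a value: z = (x - mu) / sigma (subtract the mean first)."""
    return (x - mu) / sigma


# Hypothetical example: a value of 130 from a distribution with mean 100 and SD 15.
z = z_score(130, 100, 15)
print(z)                              # 2.0
print(round(NormalDist().cdf(z), 4))  # 0.9772, the probability of a value <= 130
```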
If we know the probability and want to calculate the value, we find the z score for that probability in the z table and then reverse the standardisation:
value = z * σ + µ
value = z score * standard deviation + mean.
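Working backwards from a probability uses the inverse of the cumulative distribution, which is what reading the z table in reverse does. A minimal sketch using `NormalDist.inv_cdf`; `value_from_probability` and the example values are hypothetical.

```python
from statistics import NormalDist


def value_from_probability(p, mu, sigma):
    """Find x with P(X <= x) = p: look up z, then value = z * sigma + mu."""
    z = NormalDist().inv_cdf(p)  # equivalent to reading the z table in reverse
    return z * sigma + mu


# Hypothetical example: the value below which 97.5% of values fall (mean 100, SD 15).
print(round(value_from_probability(0.975, 100, 15), 1))  # 129.4
```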