**Why use network models?**

One area of data visualisation that I have been exploring in another online course (Social Network Analysis from the University of Michigan, via Coursera) is the use of network models. Network models simplify complex networks so that the reader can see patterns and relationships that may be hidden in a mass of data. They also have the benefit that their properties can be derived mathematically, which means properties and outcomes can be predicted. By comparing a real network with such models, we can draw conclusions about that network (also known as a graph).

Networks are made up of nodes (points or dots) and edges (lines). Each node has a degree: the number of edges connecting it to other nodes. In a network where the links are directed (pointing to or from a node), this is split into in-degree (incoming edges) and out-degree (outgoing edges). In a simple network (one without self-edges or multiple edges between the same pair of nodes), the maximum degree of any node is n - 1, where n is the total number of nodes, because a node can link to every node except itself.
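A minimal sketch of how degree can be counted, assuming the network is stored as a list of (u, v) edge pairs. The function names and the four-node toy network are hypothetical, chosen only for illustration; nodes with no edges simply do not appear in the results.

```python
from collections import defaultdict

def undirected_degrees(edges):
    """Degree of each node in an undirected simple network given as (u, v) pairs."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1  # each edge adds one to the degree of both endpoints
        degree[v] += 1
    return dict(degree)

def directed_degrees(edges):
    """In-degree and out-degree of each node when each pair means u -> v."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for u, v in edges:
        out_deg[u] += 1  # the edge leaves u
        in_deg[v] += 1   # the edge arrives at v
    return dict(in_deg), dict(out_deg)

# Hypothetical toy network: A is linked to every other node.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
deg = undirected_degrees(edges)
print(deg)                             # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
n = 4
print(max(deg.values()) == n - 1)      # True: A reaches the maximum degree n - 1

in_deg, out_deg = directed_degrees(edges)
print(out_deg["A"], in_deg.get("A", 0))  # 3 0
```

Read as undirected, node A has the maximum possible degree of 3; read as directed, all three of its edges point outwards.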

**Random Graphs**

To explore a network, we need to see whether it behaves in a particular way, and therefore need to consider whether its behaviour could simply be random. The Erdős-Rényi random graph makes several assumptions: nodes connect at random, and the network is undirected. In its common G(n, p) form it has two parameters: the number of nodes (n) and the probability that any two nodes share an edge (p).
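A sketch of how a G(n, p) random graph can be generated with only Python's standard library. The function name `erdos_renyi` is hypothetical, and this is not NetLogo's implementation; it simply considers every pair of nodes once and keeps the edge with probability p.

```python
import random

def erdos_renyi(n, p, seed=None):
    """Edge list of a G(n, p) graph: each of the n(n-1)/2 possible undirected
    edges is included independently with probability p."""
    rng = random.Random(seed)  # seeded generator so results can be reproduced
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once: no self-edges, no multiple edges
            if rng.random() < p:
                edges.append((i, j))
    return edges

n, p = 100, 0.06
edges = erdos_renyi(n, p, seed=1)
# Each edge contributes to two degrees, so the average degree is 2 * edges / n.
# It should land near (n - 1) * p = 5.94, but the exact value depends on the seed.
print(2 * len(edges) / n)
```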

The degree distribution of a random graph assumes that each possible edge is present independently (i.e. whether one pair of nodes is connected is not affected by any other pair being connected) and with the same probability p. A node has n - 1 possible partners, so its degree is the number of successes in n - 1 independent trials, each succeeding with probability p: this is a binomial distribution. The binomial distribution tells us the probability that a particular node has a specific degree (k).

Binomial distribution:

$$P(k) = \binom{n-1}{k}\, p^k (1 - p)^{n-1-k}$$

where:

- $n$ is the number of nodes
- $k$ is the degree
- $p$ is the probability that any two nodes share an edge

We can also work out:

- the average degree: $(n-1)p$
- the variance: $\sigma^2 = (n-1)p(1-p)$
- the standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{(n-1)p(1-p)}$
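The binomial formulas above translate directly into a few lines of Python. This is a sketch with hypothetical function names and example values (n = 100, p = 0.05); it also checks that the probabilities sum to 1 and that the average degree computed from the distribution matches (n - 1)p.

```python
from math import comb, sqrt

def degree_probability(n, p, k):
    """P(degree = k) for one node in G(n, p): Binomial(n - 1, p) evaluated at k."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def degree_summary(n, p):
    """Average degree, variance and standard deviation of a node's degree in G(n, p)."""
    mean = (n - 1) * p
    variance = (n - 1) * p * (1 - p)
    return mean, variance, sqrt(variance)

# Hypothetical example values.
n, p = 100, 0.05
probs = [degree_probability(n, p, k) for k in range(n)]  # k runs from 0 to n - 1
print(sum(probs))  # sums to 1, up to floating-point rounding

mean, variance, sd = degree_summary(n, p)
# The closed-form mean agrees with the mean computed directly from the distribution.
print(mean, sum(k * pk for k, pk in enumerate(probs)))
```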

**Use of NetLogo and Gephi to analyse a social network**

If we have a network we wish to explore, we can compare it with one or more models to see whether it is similar, and what the similarities and differences suggest. NetLogo is free software that lets you run network models. Below is a screenshot from the Erdős-Rényi random network model:

In this model, with 100 nodes, all the nodes are linked into one giant component and the average degree is 6 edges per node. The random model does not have hubs (nodes with considerably higher degree than other nodes), because in a binomial distribution the degrees cluster closely around the average.
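To see why hubs are so unlikely, the binomial formulas can be applied to this model. This sketch assumes the NetLogo model behaves like G(n, p) with n = 100 and an average degree of 6, so p = 6/99; the cut-off of 20 edges for a "hub" is a hypothetical, illustrative choice.

```python
from math import comb, sqrt

n = 100
mean_degree = 6
p = mean_degree / (n - 1)           # edge probability implied by the average degree
sd = sqrt((n - 1) * p * (1 - p))    # standard deviation of a node's degree

def tail_probability(n, p, k_min):
    """P(a given node has degree >= k_min) under Binomial(n - 1, p)."""
    return sum(comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) for k in range(k_min, n))

print(round(p, 4), round(sd, 2))    # 0.0606 2.37

hub_threshold = 20                  # hypothetical cut-off for calling a node a "hub"
# Expected number of nodes with at least hub_threshold edges: n times the per-node probability.
print(n * tail_probability(n, p, hub_threshold))
```

With a standard deviation of only about 2.4 edges, a node with 20 or more edges lies nearly six standard deviations above the mean, so the expected number of such nodes is a tiny fraction of one.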

I took a Facebook network (my husband's, as I don't have an active Facebook account) and created a visualisation. The Facebook network data was extracted using NetGet (a version of Netvizz with reduced functionality). While Facebook allows data to be extracted like this relatively easily, other social media sites, for example Twitter, do not. For extracting data from a wider range of sites, this may help (Windows only). To ensure anonymity, I have reproduced only the image of the network, not the interactive version.

To create the visualisation, I used Gephi, free software for creating data visualisations. In the example below, I used the ForceAtlas 2 layout algorithm, coloured the nodes by gender (red for male, blue for female), and sized the nodes by degree (the number of edges attached to a node).

The giant component (the large central mass of nodes and edges) is a network of work colleagues. The satellite groups represent networks from previous jobs, and groups of friends from different locations and periods of time.

Unsurprisingly, the work network has nodes with higher degree than the personal networks. What is interesting is that in many of the groups there is an individual who acts as a central node, linking the group members together.

The layout suggests a globe or world map, which in itself is interesting as this could be viewed as a personal, social media ‘world’.

In the Facebook network there are 142 nodes and the average degree is 8; however, the giant component makes up only 42% of the total, and there are several hubs of various sizes. A random graph with this many nodes and this average degree would be expected to have almost every node in the giant component and no pronounced hubs, so this Facebook network does not follow the pattern of the Erdős-Rényi model and is unlikely to be random. As this is a social network, the nodes are likely to have been added over time rather than all at once. The presence of hubs suggests that some nodes (contacts) attract more edges than others, which is characteristic of a power-law degree distribution.
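As a rough check of this comparison, the sketch below generates a random G(n, p) graph with the same size and average degree as the Facebook network (142 nodes, average degree 8) and measures what fraction of nodes fall in the largest connected component, using a breadth-first search. The function names and the seed are hypothetical; a different seed gives a slightly different graph.

```python
import random
from collections import deque

def erdos_renyi_adjacency(n, p, seed=None):
    """Adjacency lists of a G(n, p) graph: each pair of nodes is linked with probability p."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component (breadth-first search)."""
    seen = [False] * len(adj)
    largest = 0
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:  # visit every node reachable from start
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if not seen[neighbour]:
                    seen[neighbour] = True
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest / len(adj)

n, mean_degree = 142, 8
adj = erdos_renyi_adjacency(n, mean_degree / (n - 1), seed=42)
# Typically very close to 1.0, compared with the 42% observed in the Facebook network.
print(largest_component_fraction(adj))
```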

Visualising the network makes it much easier to see the links and patterns; while it would be possible to glean this information from the raw data, doing so would take considerably longer.