A few years ago, I was binge-reading the Game of Thrones books and found myself having a hard time keeping track of all the characters in my head. (This is not surprising — there are more than 150 named characters in the series!) I was going back and forth between chapters or constantly looking up the A Song of Ice and Fire wiki to remember plotlines. I needed a mental map — surely there was a better way to visualize these characters?

The basics

Pictured here is a sample network graph from Wikipedia that illustrates the contributions of Wikipedia editors to different languages. Using this example, here are some basics (or a quick refresher, if you’re already familiar) of graph theory concepts:

In this example, each edge represents (by line weight, or thickness) the number of editors who have contributed to both the languages that the line connects. This is what we call an undirected simple graph. “Undirected” meaning {en--> fr} and {fr --> en} are identical, and “simple” means no more than one edge connects each pair of vertices. The graph is also “weighted,” meaning that the thickness of the edges is relative to the strength of the relationship between the vertices. In this example, the weighted incidence function could look something like this:

While the visual representation of graphs in this way is an intuitive approach to quickly show relationships so that they are easy to comprehend, there are even richer insights we can derive from representing a dataset as a graph object.

Getting the data for the Hamilton dataset

“In data science, 80 percent of time spent is preparing data, 20 percent of time is spent complaining about the need to prepare data.”

Data scientists may not agree on everything — but we agree that the most difficult part of any project is getting the data. Lucky for us, that part is behind us for this article. There is a nice clean dataset of Hamilton lyrics readily available on Kaggle that you can simply download and start graphing.

Exploratory analysis

This is what the Hamilton dataset looks like.

There is one line of record per character/song/line of lyric.

Building an adjacency matrix

To build a network graph of all the Hamilton speakers, the following must be defined:

The incidence function I’ve chosen is the Number of songs each pair of speakers appears in together. My assumption is that the more songs two characters appear together in, the stronger their relationship.

Weight {speaker,x, speaker,y} = #songs that feature both speaker,x and speaker,y

Using R’s dplyr, I can transform my original dataset into an **{src, dest, weight}** entity, and then convert that into an adjacency matrix. I can then use graph.adjacency in R’s igraph package to create a “graph object” from this adjacency matrix, which I can then use for plotting and other analyses.

Visualizing the network plot

The graph_obj can be visualized using the plot.igraph function. Because this function has many custom layouts to choose from, I start by rendering the same graph using the “star” layout.

The result is technically a network plot. But is it possible to do even better? The chart above seems to suggest that all vertices and edges have equal importance — but that undermines the whole point of visualizing a social network. Some characters are indeed more “significant,” and some speakers have stronger relationships relative to others.

How can this graph reflect that?

This is where edge weight and vertex degree come into play. I start by playing around with the parameters of the plot.igraph function to make edge.width (i.e., the thickness of the edge in the plot) relative to the weight, and vertex.label.cex (i.e., the font size of the vertices) relative to degree.

Much better! Characters with a higher degree are visually larger, and the distinction between strong and weak relationships is also apparent from the darkness of the lines. This iteration is much more intuitive and lets the viewer immediately grasp the relationships among characters. It's also fitting that King George is a lone node, considering that his songs are always (very funny) monologues.

You could also use visNetwork library in R to make an interactive network graph. The library makes it possible to zoom in and out of multiple parts of the graph (especially useful with a particularly large graph), and has support for Shiny.

Centrality measures

Centrality is a key concept in graph theory to identify the significance of nodes:

I can use igraph’s degree(), betweenness(), and eigen_centrality() functions to get centrality for the generated graph:

It looks like Aaron Burr has the highest betweenness centrality (the “bridge”) in our graph, while Hamilton has the highest eigenvector centrality (the “influencer”). Make what you will of that.

Conclusion

Business applications of network graphs are numerous:

Fun as they are, it is important to note that network graphs are not without drawbacks when employed in production. For example, they can be resource-intensive. As is the case with any matrix operations, scalability and performance sometimes take a hit. There is also a “cold start” problem — if your dataset is too sparse or there aren’t really many relationships among entities, a network graph is not an effective solution. Used correctly and in the right context, however, they can be valuable to business.

Code:https://github.com/iswaryam/hamilton/

Dataset credit: https://www.kaggle.com/lbalter/hamilton-lyrics#

If you're a Potterhead, check out my GitHub - I've also graphed the characters of Harry Potter with a similar method.