So let’s break down that definition piece by piece to offer some clarity. Assuming you know the general idea behind analytics, what is the difference when we add the word “graph” to it? Consider the general statement above:
“Using a graph format to perform analysis of relationships between data based on strength and direction.”
Breaking that statement into individual segments gives us a clearer understanding of the definition.
“Using a graph format”: Technically, a graph is a set of nodes (aka vertices or points) together with the edges (aka links or lines) that connect them.
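As a rough illustration of that definition, the sketch below stores a tiny graph as an adjacency map. The node names ("A", "B", "C"), the edge labels, and the helper `build_adjacency` are hypothetical, chosen only to show how nodes and edges fit together; a real graph analytics tool would use its own storage format.

```python
from collections import defaultdict

# Hypothetical toy graph: each tuple is one edge (node, node, label).
edges = [
    ("A", "B", "first relationship"),
    ("B", "C", "second relationship"),
]

def build_adjacency(edge_list):
    """Map each node to the set of nodes it shares an edge with (undirected)."""
    adjacency = defaultdict(set)
    for left, right, _label in edge_list:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return adjacency

graph = build_adjacency(edges)
print(sorted(graph["B"]))  # nodes directly connected to "B"
```

Here "B" is connected to both "A" and "C", while "A" and "C" are linked only through "B".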
“Analysis of relationships”: Graph analytics excels at delivering insights from relationships. The visual nature of the method makes it much easier to identify unexpected relationships and derive insights more quickly than with, say, a tabular format of data. While you may be able to reach the same conclusion by analyzing a spreadsheet of data, a graph format can get you there with far less effort. The phrase “a picture is worth a thousand words” applies here, and with computing tools designed to maximize the capabilities of graph analytics, insights can be reached far more efficiently.
“Based on strength and direction”: If you consider data points to be nodes in a graph, then the edges connecting those points define the relationship between them. Strength of a relationship can therefore be derived from the density of the edge (two points that share a dozen relationships have a denser edge than two points with a single connection), while direction comes from how the edge is laid out (the visual layout of nodes can translate into spatial data, where physical distance offers insight into the node).
A Real-World Example of Graph Analytics
The type of insight provided by graph analytics doesn’t have to be a complex technical concept; in fact, one of the easiest ways to explain graph analytics is through a party game that pretty much everyone has played at one point or another: Six Degrees of Kevin Bacon.
If you’re one of the few people on the planet who’s never heard of it, Six Degrees of Kevin Bacon came about in the 1990s, based on the theory that every actor is connected to Kevin Bacon by no more than six degrees of working relationships. When playing with friends, the challenge is to minimize the number of degrees connecting the actor to Kevin Bacon. Functionally, this is the same thing as running a shortest-path query via graph analytics.
Imagine every actor working in Hollywood as a node in a massive graph, with Kevin Bacon at the center of it. Edges are drawn for every film connection between actors. We want to run a query to find the relationships that connect Kevin Bacon to another random node (actor). For this example, let’s pick Pedro Pascal (Game of Thrones, The Mandalorian). Pascal’s lengthy list of high-profile work means that he shares the cast list with many other notable actors, creating nearly limitless paths for connecting to Kevin Bacon. However, the goal is to find the shortest path to Kevin Bacon.
To do this, we run a query that analyzes the various paths (or, if you’re doing this in a party setting, you just think really hard), ultimately generating this output:
- Kevin Bacon (node) was in Crazy, Stupid, Love (edge) with Julianne Moore (node).
- Julianne Moore (node) was in Kingsman: The Golden Circle (edge) with Pedro Pascal (node).
The website The Oracle of Bacon (no relation to Oracle, of course) is an online database project built around this game. The site’s database uses a graph algorithm known as breadth-first search, and in this instance it would assign a Bacon Number of two, because there are two edges. Thus, the shortest path connecting Pedro Pascal to Kevin Bacon runs through Julianne Moore.
That’s a pretty easy example given Pedro Pascal’s current high-profile stature. But what if we pick someone a little more obscure, someone whose career came about prior to Kevin Bacon’s breakout period in the 1980s? For this example, let’s use Wendell Corey, a character actor who worked in the 1940s, 50s, and 60s.
If we were using an analytics tool, we would submit a query to search for the shortest chain of relationships between Kevin Bacon and Wendell Corey. This produces a Bacon Number of three:
- Kevin Bacon (node) was in Animal House (edge) with Tim Matheson (node).
- Tim Matheson (node) was in The Apple Dumpling Gang Rides Again (edge) with Audrey Totter (node).
- Audrey Totter (node) was in Any Number Can Play (edge) with Wendell Corey (node).
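The two queries above can be sketched as a breadth-first search over a small co-star graph. This is a minimal illustration, not the Oracle of Bacon's actual implementation: the `FILMS` dictionary contains only the five films named in this article, and the helpers `build_graph` and `shortest_path` are hypothetical names introduced for the example.

```python
from collections import deque

# Toy data: only the films and co-stars mentioned in the article.
FILMS = {
    "Crazy, Stupid, Love": ["Kevin Bacon", "Julianne Moore"],
    "Kingsman: The Golden Circle": ["Julianne Moore", "Pedro Pascal"],
    "Animal House": ["Kevin Bacon", "Tim Matheson"],
    "The Apple Dumpling Gang Rides Again": ["Tim Matheson", "Audrey Totter"],
    "Any Number Can Play": ["Audrey Totter", "Wendell Corey"],
}

def build_graph(films):
    """Return {actor: {co_star: film}} with one labeled edge per co-star pair."""
    graph = {}
    for title, cast in films.items():
        for actor in cast:
            for co_star in cast:
                if actor != co_star:
                    graph.setdefault(actor, {})[co_star] = title
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of (actor, film, co_star) hops or None."""
    if start == goal:
        return []
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, {}):
            if neighbor in previous:
                continue
            previous[neighbor] = current
            if neighbor == goal:
                # Walk back from the goal to the start to rebuild the path.
                hops = []
                node = goal
                while previous[node] is not None:
                    parent = previous[node]
                    hops.append((parent, graph[parent][node], node))
                    node = parent
                hops.reverse()
                return hops
            queue.append(neighbor)
    return None  # no chain of co-stars links the two actors

graph = build_graph(FILMS)
for target in ("Pedro Pascal", "Wendell Corey"):
    path = shortest_path(graph, "Kevin Bacon", target)
    print(target, "has a Bacon Number of", len(path))
    for actor, film, co_star in path:
        print(f"  {actor} was in {film} with {co_star}")
```

Because breadth-first search explores all actors one edge away before any actor two edges away, the first path it finds to the target is guaranteed to use the fewest edges, which is exactly the Bacon Number.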
Using a breadth-first search, even seemingly obscure connections can be found quickly and efficiently.
How else can we use the Kevin Bacon example to demonstrate elements of graph analytics? The principle behind Six Degrees is to minimize the number of edges between nodes. To try a different type of graph analysis, we can look at the strength of relationships. In this case, if the nodes are Kevin Bacon and other actors, we can assign a single edge for every time they appear in a film together. Thus, the theoretically strongest relationship belongs to the actor who shares the most edges with Kevin Bacon, that is, the most films in common.
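To make the idea of edge strength concrete, here is a small sketch that counts one edge per shared film and picks the co-star with the most edges. The film and actor names are placeholders ("Film 1", "Actor A"), not real filmography data.

```python
from collections import Counter

# Hypothetical (film, co_star) pairs: one entry per film shared with the center node.
shared_credits = [
    ("Film 1", "Actor A"),
    ("Film 2", "Actor A"),
    ("Film 3", "Actor B"),
    ("Film 4", "Actor A"),
]

# Edge strength = number of parallel edges (shared films) per co-star.
edge_strength = Counter(co_star for _film, co_star in shared_credits)

strongest, weight = edge_strength.most_common(1)[0]
print(f"Strongest relationship: {strongest} ({weight} shared films)")
```

In this toy data, "Actor A" shares three films with the center node, so that relationship is the strongest.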
Distance between nodes offers another dimension for analyzing data. For example, if we place Kevin Bacon at the center of the graph, the placement of the other nodes (their coordinates) can be based on how recently each actor’s film with Kevin Bacon was made. In this case, Jill Hennessy co-starred in the series City on a Hill, which was made in 2019, so her node would be placed immediately next to Kevin Bacon at the center. Queen Latifah appeared in the film Beauty Shop with Kevin Bacon in 2005, which would place her node further out. With the nodes distributed this way, a more refined analysis could be run based only on connections made in the last ten years.
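The recency-based layout can be sketched by treating the gap in years as each node's distance from the center. The two credits come from the paragraph above; `REFERENCE_YEAR` and the helper names are assumptions made for this example, since the article does not say which year the distances are measured from.

```python
# (co_star, production, year) as described in the article.
connections = [
    ("Jill Hennessy", "City on a Hill", 2019),
    ("Queen Latifah", "Beauty Shop", 2005),
]

REFERENCE_YEAR = 2021  # assumed "now" for this illustration

def distances_from_center(connections, reference_year):
    """Distance of each co-star from the center = years since the most recent shared credit."""
    distances = {}
    for co_star, _title, year in connections:
        gap = reference_year - year
        distances[co_star] = min(gap, distances.get(co_star, gap))
    return distances

def within_window(connections, reference_year, window_years):
    """Co-stars whose most recent shared credit falls inside the window."""
    distances = distances_from_center(connections, reference_year)
    return sorted(name for name, gap in distances.items() if gap <= window_years)

distances = distances_from_center(connections, REFERENCE_YEAR)
recent = within_window(connections, REFERENCE_YEAR, 10)
print(distances)
print(recent)
```

With these assumptions, only the 2019 connection falls inside a ten-year window, so the refined analysis would keep that node and drop the 2005 one.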
The context of the query can change as well if you take the focus off Kevin Bacon. Degree centrality measures a node’s volume of relationships relative to the largest volume of relationships in the graph (mathematically, a percentage calculated as the node’s relationships divided by the largest volume of relationships). It can easily be determined using graph analytics, which shows us, by the way, that Kevin Bacon does not have the largest degree centrality among actors listed in IMDb. Consider the logistics of trying to calculate such a thing from data in a tabular format versus a graph format, where nodes and edges naturally take care of a significant amount of the prep work.
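The degree centrality calculation described above can be sketched in a few lines. The adjacency map below is hypothetical toy data, and the function follows this article's definition (a node's relationships divided by the largest number of relationships held by any node); other references normalize differently, for example by dividing by the number of other nodes in the graph.

```python
# Hypothetical undirected graph: node -> set of directly connected nodes.
adjacency = {
    "Hub": {"A", "B", "C", "D"},
    "A": {"Hub", "B"},
    "B": {"Hub", "A"},
    "C": {"Hub"},
    "D": {"Hub"},
}

def degree_centrality(adjacency):
    """Each node's degree divided by the largest degree in the graph."""
    degrees = {node: len(neighbors) for node, neighbors in adjacency.items()}
    largest = max(degrees.values(), default=0)
    if largest == 0:
        # No edges at all: every node gets a centrality of zero.
        return {node: 0.0 for node in degrees}
    return {node: degree / largest for node, degree in degrees.items()}

scores = degree_centrality(adjacency)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.0%}")
```

Once the data is stored as nodes and edges, the calculation reduces to counting each node's neighbors, which is the prep work a tabular format would require extra joins or lookups to reproduce.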
Digging Deeper into Graph Analytics
Six Degrees of Kevin Bacon provides a fun and accessible example of graph analytics, but there’s much more to this method from both a functional and technical perspective. To learn more about the how, what, and why behind graph analytics, check out What Is Graph Analytics for an in-depth explanation.