40  Graphs

Consider a typical day scrolling through Instagram. You tap on a photo from a friend, then another from an influencer you admire, followed by a click on the profile of someone who commented, leading you to yet another profile you find interesting. This scenario illustrates the complex web of connections that is characteristic of social networks. In order to analyze and make sense of such connections, we need an appropriate data structure to model them.

Take, for instance, your Instagram followers. We could start by simply representing you and your followers as entities, or “vertices,” connected by lines or “edges.” The diagram below (Figure 40.1) shows such a simple representation of your followers.

G Y You Follower1 1 Y--Follower1 Follower2 2 Y--Follower2 Follower3 3 Y--Follower3 Follower4 4 Y--Follower4
Figure 40.1: A simplified representation of your Instagram followers. You (Y, colored in blue) are connected to four followers (1-4, colored in orange). Each line represents a connection (follows).

At first glance, this might appear to be a tree structure, where you are the root, and your followers are the branches. However, this isn’t the most accurate representation. In reality, your followers can also follow other people, and those individuals can have their own followers, creating a multi-tiered and mutual relationship that cannot be represented using a tree structure. Tree structures are fundamentally hierarchical, with each node having one parent at most, making them inadequate to model the complex interconnections found in social networks (Figure 40.2).

G You You Follower1 1 You->Follower1 Follower2 2 You->Follower2 Follower3 3 You->Follower3 Follower4 4 You->Follower4 Follower5 5 Follower1->Follower5 Follower6 6 Follower1->Follower6 Follower7 7 Follower1->Follower7 Follower8 8 Follower2->Follower8 Follower9 9 Follower2->Follower9 Follower10 10 Follower3->Follower10 Follower11 11 Follower3->Follower11 Follower12 12 Follower4->Follower12 Follower13 13 Follower4->Follower13
Figure 40.2: A representation of Instagram followers with multiple levels of connections. Each follower can have their own followers, creating a network of connections. Arrows indicate the direction of following, showing that the relationship is not always reciprocal.

To accurately capture these complex relationships, we use a graph data structure. Graphs consist of a set of vertices (or nodes) and a set of edges that connect them. In the context of Instagram followers, the vertices represent the users, and the edges represent the connections between them.

Furthermore, the relationships between Instagram users are not always reciprocal. You might follow a celebrity who does not follow you back. This creates a directed relationship, represented by an edge with a specific direction. Such relationships can be modeled using directed graphs. Figure 40.3 shows a graph where the edges represent the direction of follower relationships.

G You You Follower1 1 You->Follower1 Follower2 2 You->Follower2 Follower3 3 You->Follower3 Follower4 4 You->Follower4 Follower1->You Follower5 5 Follower1->Follower5 Follower6 6 Follower1->Follower6 Follower7 7 Follower1->Follower7 Follower8 8 Follower2->Follower8 Follower2->Follower8 Follower9 9 Follower2->Follower9 Follower3->Follower1 Follower10 10 Follower3->Follower10 Follower11 11 Follower3->Follower11 Follower4->Follower2 Follower12 12 Follower4->Follower12 Follower13 13 Follower4->Follower13 Follower7->Follower9 Follower10->Follower6 Follower11->Follower5 Follower13->Follower1
Figure 40.3: A directed representation of your Instagram followers. Here, an arrow going from vertex \(A\) to vertex \(B\) indicates that \(A\) follows \(B\), but \(B\) does not necessarily follow \(A\). The users are color-coded: blue for you, orange for direct followers, and light grey for followers of followers. Note that the graph reflects mutual follows, unreciprocated follows, and connections among followers, which a tree structure would not capture.

Understanding and modeling such intricate networks is critical to analyzing social media behavior. From detecting trends and predicting future connections to understanding influence within a network, the applications of this model are vast and significant. In the subsequent sections, we will delve deeper into how graph data structures work, how to implement them, and how to analyze the interesting problems they can help solve.

40.1 Introduction

Imagine standing at a subway station, tracing the colorful lines on the map that depict the various routes. Each line represents a different subway route, while each station serves as a meeting point for different lines. The interplay of lines and stations illustrates the essence of a graph - a non-linear data structure that consists of vertices (stations) and edges (subway routes) connecting these vertices.

Adjacency and incidence are two fundamental terms in the realm of graphs. Two vertices, say station A and station B, are considered adjacent if a subway line (edge) directly connects them. On the other hand, incidence refers to a station (vertex) being served by a subway line (edge).

Graphs are not merely abstract mathematical constructs but have tangible real-world applications. They model social networks, where vertices are people, and edges signify their connections. In transportation networks, vertices represent locations interconnected by edges symbolizing routes. In the era of a global pandemic, graphs can even represent Coronavirus transmission networks, where vertices symbolize individuals, and edges depict transmission paths.

40.2 Graph Terminology

To navigate through the expansive world of graphs, one must be acquainted with its language. Let’s traverse through the fundamental terms and properties that define graphs.

40.2.1 Basic Terms and Properties

Imagine a city map representing connections between various landmarks - each landmark is a vertex, and the roads linking them are the edges. The connections established are not always two-way streets. While some roads may be traversed in both directions, symbolizing an undirected graph, others might be one-way streets, indicative of a directed graph. Here, edges have a specific direction, indicating a unidirectional relationship between vertices. The diagrams Figure 40.4 and Figure 40.5 vividly illustrate this concept.

G A A B B A--B C C B--C C--A
Figure 40.4: Example of an undirected graph. Vertices A, B, and C are interconnected, demonstrating bi-directional relationships.
G A A B B A->B C C B->C C->A
Figure 40.5: Example of a directed graph. The arrowheads (highlighted in orange) demonstrate the direction of the relationships between vertices A, B, and C.

In some cities, not all roads are created equal. Some roads might be longer or more congested, carrying different costs for traversal. This scenario reflects a weighted graph where edges are not equally important but carry different weights. On the contrary, in an unweighted graph, all roads carry the same weight - a short stroll in the park or a long cross-country journey holds equal significance. The illustrations Figure 40.6 and Figure 40.7 demonstrate this.

G A A B B A--B C C B--C C--A
Figure 40.6: Example of an unweighted graph. The edges between vertices A, B, and C carry no weights.
G A A B B A--B 2 C C B--C 3 C--A 1
Figure 40.7: Example of a weighted graph. The weights (2, 3, 1) assigned to the edges between vertices A, B, and C differentiate their importance or cost.

As we traverse through the city, we encounter different traffic configurations. Some intersections allow only a single connection between landmarks, reflecting a simple graph with no self-loops. However, other intersections might have multiple routes connecting the same pair of landmarks or even allow U-turns, creating self-loops. This scenario portrays a multigraph. The diagrams Figure 40.8 and Figure 40.9 encapsulate these ideas.

G A A B B A--B C C B--C C--A
Figure 40.8: Example of a simple graph. Vertices A, B, and C are interconnected by simple edges.
G A A A--A self-loop B B A--B e1 A--B e2 C C B--C C--A
Figure 40.9: Example of a multigraph. Vertex A is connected to vertex B through two edges (a solid and a dashed line) and to itself via a self-loop (in orange).

In the city of graphs, the popularity of a landmark (vertex) might be judged by the number of roads (edges) leading to it, a concept known as the degree of a vertex. A busy intersection with roads pouring in and out signifies a high-degree vertex. (See Figure 40.10 for an example.)

G A A - degree 3 B B - degree 2 A--B D D - degree 1 A--D C C - degree 2 B--C C--A
Figure 40.10: Example graph with vertex degrees. The degree of each vertex is indicated by the number in parentheses in the labels on the edges.

To travel from one landmark to another, one must choose a path, a sequence of landmarks connected by roads. Occasionally, this path might lead back to the starting point, creating a cycle, a closed path where the journey starts and ends at the same landmark. (See Figure 40.11 and Figure 40.12 for examples.)

G A A B B A--B C C B--C D D C--D
Figure 40.11: Example graph with a path from A to D.
G A A B B A--B C C B--C C--A
Figure 40.12: Example graph with a cycle. (A-B-C)

Cities can be busy, bustling with interconnected landmarks, or they can be quiet towns with isolated points of interest. A city with a path between every pair of landmarks is a connected graph, while a city with isolated landmarks, unreachable from others, is a disconnected graph. This concept is illustrated in Figure 40.13 and Figure 40.14.

G A A B B A--B C C B--C C--A
Figure 40.13: Example of a connected graph.
G A A B B A--B C C B--C D D E E D--E
Figure 40.14: Example of a disconnected graph.

Understanding these terms and properties provides the vocabulary to talk about graphs, setting the stage for exploring more advanced concepts and applications. As we delve deeper, we will start seeing how these properties interplay and shape the complex world of graphs.

40.2.2 Graph Notation

In mathematical terms, a graph can be represented by the notation \(G(V, E)\), where \(G\) stands for the graph, \(V\) is the set of vertices (or nodes), and \(E\) is the set of edges (or connections). This notation provides a concise, abstract representation of the graph, which aids in understanding and analyzing its structure.

40.2.3 Special Types of Graphs

Certain types of graphs, distinguished by specific properties or structures, have earned unique classifications. Three such types are: complete graphs, bipartite graphs, and trees.

  • Complete Graph: A complete graph, denoted by \(K_n\) where \(n\) is the number of vertices, is a simple graph where every pair of distinct vertices is connected by a unique edge. In a complete graph, each vertex is directly connected to every other vertex, signifying a fully interconnected network. This type of graph is commonly seen in problems concerning network connectivity and graph coloring. See Figure 40.15.
G A A B B A--B C C A--C D D A--D B--C B--D C--D
Figure 40.15: Example of a complete graph. In this graph, every pair of distinct vertices (A, B, C, D) is connected by a unique edge (shown in orange), demonstrating a fully interconnected network.
  • Bipartite Graph: A bipartite graph can be partitioned into two distinct sets, where vertices within the same set share no edges. Each edge connects a vertex from one set to a vertex in the other set, and never two vertices within the same set. This special class of graphs has extensive applications, including matching problems, scheduling problems, and in the study of molecular structures. See Figure 40.16.
G A A 1 1 A--1 2 2 A--2 B B B--1 3 3 B--3 C C C--2 C--3
Figure 40.16: Example of a bipartite graph. The graph can be partitioned into two distinct sets (shown in lightblue and lightgreen), with each edge connecting a vertex from one set to a vertex in the other set. No edges connect vertices within the same set.
  • Tree: A tree is a special type of graph that holds a significant place in computer science due to its simplicity and versatility. A tree is an undirected, connected graph with no cycles. Often, one vertex is designated as the root, and the other vertices are arranged in a parent-child relationship, forming a hierarchical structure. Trees are utilized in various areas such as organizing hierarchical data, structuring web pages, and in algorithms like searches and sorts. See Figure 40.17.
G A A B B A--B C C A--C D D B--D E E B--E F F C--F G G C--G
Figure 40.17: Example of a tree. A hierarchical structure with a designated root (A, in red) and parent-child relationships (edges in orange). The other vertices are arranged in parent-child relationships, forming a hierarchical structure.

40.3 Graph Representation

As we use graphs to model various complex scenarios in computer science, it’s crucial to understand how to translate the abstract concept of graphs into concrete data structures. This will enable us to manipulate graphs within a programming environment. In this section, we’ll explore two prominent methods for graph representation: the adjacency list and the adjacency matrix.

40.3.1 Adjacency List

In an adjacency list representation, each vertex in the graph is associated with a list of its neighboring vertices. Depending on the specific scenario, the adjacency list can be implemented via arrays of lists or as a hash table. In either case, each vertex’s key (or index) corresponds to a list of its adjacent vertices.

G A A B B A--B C C A--C D D B--D C--D
Figure 40.18: An undirected graph depicted as an example for the adjacency list. Each vertex (A in red, B in green, C in blue, D in yellow) is linked to its neighboring vertices. In the adjacency list representation, each of these vertices points to their respective neighbors.

Adjacency list representation for the graph Figure 40.18 is as follows:

Listing 40.1: Adjacency list representation.
A: [B, C]
B: [A, D]
C: [A, D]
D: [B, C]

The adjacency list is particularly effective for sparse graphs, where the number of edges is far less than the total possible number of edges. This method only stores the vertices that are directly connected, conserving memory and allowing faster iteration over the neighbors of a given vertex.

40.3.2 Adjacency Matrix

An adjacency matrix is a two-dimensional grid representation, where the cell at the i-th row and j-th column denotes the connection between the i-th and j-th vertices. In an unweighted graph, the cells contain a 1 (indicating an edge between the vertices) or a 0 (indicating no edge). In a weighted graph, the cells hold the weights of the respective edges.

G A A B B A--B C C A--C D D B--D C--D
Figure 40.19: A graph for adjacency matrix example

Adjacency matrix representation (unweighted) for the graph in Figure 40.19 is:

Listing 40.2: Adjacency matrix representation.
  A B C D
A 0 1 1 0
B 1 0 0 1
C 1 0 0 1
D 0 1 1 0

The adjacency matrix proves advantageous when working with dense graphs, where the number of edges approaches the total possible number of edges. It’s also useful when one needs to rapidly determine whether an edge exists between two vertices. However, for large sparse graphs, this method can be memory-intensive as it stores the state for every possible pair of vertices.

40.3.3 Converting Between Representations

  1. Converting from a graph diagram or notation to adjacency list/matrix
  • Start by identifying all vertices and edges in the graph.
  • For the adjacency list, create an empty list or hash table for each vertex. As you enumerate the edges, add the corresponding vertices to the lists of their connected vertices.
  • For the adjacency matrix, create a square matrix with dimensions equal to the number of vertices. As you enumerate the edges, set the cell at the intersection of the two connected vertices to 1 (or the weight of the edge in the case of a weighted graph).
  1. Converting from adjacency list/matrix to a graph diagram or notation
  • Start by identifying all vertices from the keys (in an adjacency list) or indices (in an adjacency matrix).
  • For the adjacency list, traverse each list and for each pair of vertices, draw an edge between them.
  • For the adjacency matrix, traverse the matrix and for each non-zero cell, draw an edge between the corresponding vertices. In the case of a weighted graph, the cell value corresponds to the weight of the edge.

Learning these methods to represent graphs in code and being able to switch between them equips you with the flexibility to choose the most efficient representation based on the specific requirements of your problem.

40.4 Graph Traversal

Imagine a challenge where you need to determine the average age of all Facebook users. Considering Facebook’s user base reaches into the billions, it’s untenable to hold the entire graph of the friend network in memory. Thus, a solution would involve addressing each user’s data individually, as needed. This requirement brings us to the concept of graph traversal algorithms.

Graph traversal algorithms enable us to visit each node (user, in the Facebook case), accumulate their age, and eventually calculate the average. For instance, we could load the data of a single user, push all their friends to a stack, and then continually pop from this stack, requesting each friend’s data from Facebook. As we receive data, we mark each user as ‘visited’ to avoid duplicating their age in our count. The friends of every loaded user are added to our stack, and this process repeats until our stack is empty.

This scenario underscores the crucial role of graph traversal, a foundational operation in graph theory. Graphs, due to their versatility and power, have found applications across diverse domains. Whether it’s modeling social networks, computer networks, transportation systems, or web pages, or even solving complex problems in games, graphs form an indispensable tool. The process of graph traversal allows us to navigate these graphs in various ways, leading to numerous applications like searching for specific nodes, identifying the shortest path between nodes, and analyzing the overall structure of a graph.

In the context of graph traversal, we generally commence from a start node and strive to visit all the remaining nodes. This task brings with it a host of challenges, such as dealing with unreachable nodes, avoiding revisiting nodes, and choosing the next node to visit from several potential options. Graph traversal algorithms tackle these challenges by leveraging different strategies and data structures to track visited nodes and those pending to be visited. The two most frequently used algorithms for this purpose are the Breadth-First Search (BFS) and Depth-First Search (DFS). These algorithms primarily differ in the sequence in which they visit nodes.

In many real-world scenarios, we may not have access to the entire graph upfront. Instead, we may only have information about a specific node and its adjacent nodes. For example, in the Facebook use case mentioned earlier, we don’t know the entire user network from the outset. But, we can employ graph traversal algorithms to solve problems associated with such large and dynamic graphs by visiting each node and examining their data individually, as required.

Now, let’s delve into two commonly used methods to traverse a graph:

  1. Breadth-First Search (BFS): This method is akin to exploring a graph layer by layer. We start at the root node (or any other chosen node), explore all the neighboring nodes at the current depth level before moving onto nodes at the next depth level.

  2. Depth-First Search (DFS): In contrast to BFS, DFS digs as deep as possible into the graph’s structure before backtracking. In essence, it explores an arbitrary path as far as possible before retracing steps.

By understanding these graph traversal methods and knowing how to implement them, you will be equipped to efficiently navigate and manipulate intricate graphs, enabling you to solve a wide variety of problems. Furthermore, the choice of graph traversal method can often be influenced by the representation of the graph (as an adjacency list or adjacency matrix), which we discussed in the previous section. Having a good grasp of graph representation techniques will thus aid in effective graph traversal.

40.4.1 Breadth-First Search (BFS)

Breadth-First Search (BFS) is a widely-used algorithm for graph traversal and pathfinding. The defining characteristic of BFS is that it explores a graph in ‘layers’. It begins at a chosen ‘start’ vertex and explores all the neighboring vertices at the current depth prior to moving on to nodes at the next depth level. BFS uses a queue as its core data structure, which provides an inherent ‘First In, First Out’ (FIFO) characteristic - it first visits the nodes that were introduced earlier in the process, thus ensuring that it traverses level by level in the graph.

The BFS algorithm’s layer-wise exploration makes it particularly well-suited for problems where the shortest path or minimum number of steps is required, such as navigating through a maze or finding the shortest route in a transportation network. However, it’s important to note that BFS can consume a significant amount of memory because it needs to store all the vertices of the current level.

To illustrate how BFS works, let’s consider a step-by-step BFS traversal on the following graph:

G A 1. A B 2. B A--B C 3. C A--C D 4. D B--D C--D E 5. E C--E
Figure 40.20: An example graph for BFS traversal. The numbers represent the order in which the vertices are visited.

Here’s how BFS traversal, starting from vertex A, would proceed:

  1. Visit A and add its neighbors B and C to the queue: [B, C]
  2. Visit B and add its unvisited neighbor D to the queue: [C, D]
  3. Visit C and add its unvisited neighbor E to the queue: [D, E]
  4. Visit D: [E]
  5. Visit E: []

Therefore, the BFS traversal order from A would be A, B, C, D, E.

We can encapsulate the BFS algorithm with the following pseudocode:

BFS(graph, start):
  Initialize an empty queue Q
  Mark start as visited
  Enqueue start into Q
  
  while Q is not empty:
    vertex = Dequeue(Q)
    Visit vertex
    
    for each neighbor of vertex:
      if neighbor is not visited:
        Mark neighbor as visited
        Enqueue neighbor into Q

This pseudocode presents the core logic of BFS. It starts with the ‘start’ vertex, explores its neighbors, and marks them as visited. Then it continues with the next node from the queue (the ‘oldest’ unvisited node) and repeats the process until all reachable nodes have been visited. It’s important to note that BFS will only visit nodes reachable from the ‘start’ node; any isolated nodes or nodes in a separate connected component will not be visited by this BFS execution.

BFS start Start initQ Initialize an empty queue Q start->initQ markStart Mark start as visited initQ->markStart enqueueStart Enqueue start into Q markStart->enqueueStart Qempty Is Q empty? enqueueStart->Qempty dequeue Dequeue a vertex from Q Qempty->dequeue No end End Qempty->end Yes forLoop For each neighbor of this vertex, if it is unvisited, mark it as visited and enqueue it into Q dequeue->forLoop forLoop->Qempty
Figure 40.21: Flowchart depicting the steps of the BFS algorithm.

By understanding the BFS algorithm and its application, you can efficiently solve a variety of problems involving layers or levels, shortest path, or minimal steps.

40.4.2 Depth-First Search (DFS)

Depth-First Search (DFS) is another essential technique for graph traversal and pathfinding. DFS explores a graph by visiting a vertex and its neighbors as deeply as possible before backtracking. This characteristic of DFS, going as deep as possible from a node before backtracking, is what distinguishes it from BFS. DFS can be implemented using recursion or an explicit stack data structure. The choice between recursion and stack implementation depends on the problem’s requirements and the size of the graph.

To illustrate how DFS works, consider the following graph:

A 1 A B 2 B A--B C 4 C A--C D 3 D B--D C--D E 5 E C--E
Figure 40.22: Example graph for DFS traversal. The numbers and colors indicate the order and depth of traversal, respectively.

The DFS traversal, starting from vertex A, would proceed as follows:

  1. Visit A and recurse on its first neighbor B
  2. Visit B and recurse on its first neighbor D
  3. Visit D and backtrack (no unvisited neighbors)
  4. Backtrack to A and recurse on its next neighbor C
  5. Visit C and recurse on its first neighbor E
  6. Visit E and backtrack (no unvisited neighbors)

This leads to a DFS traversal order of A, B, D, C, E.

DFS can be implemented either recursively or iteratively. Here are the pseudocodes for both methods:

Recursive implementation:

DFS(graph, vertex):
  Mark vertex as visited
  Visit vertex
  
  for each neighbor of vertex:
    if neighbor is not visited:
      DFS(graph, neighbor)
DFS start Start markVertex Mark vertex as visited start->markVertex visitVertex Visit vertex markVertex->visitVertex forLoop For each neighbor of vertex, if it is not visited, call DFS recursively visitVertex->forLoop forLoop->markVertex Recursive call end End forLoop->end
Figure 40.23: Flowchart depicting the recursive implementation of the DFS algorithm.

Iterative implementation (with a stack):

DFS(graph, start):
  Initialize an empty stack S
  Mark start as visited
  Push start onto S
  
  while S is not empty:
    vertex = Pop(S)
    Visit vertex
    
    for each neighbor of vertex:
      if neighbor is not visited:
        Mark neighbor as visited
        Push neighbor onto S
DFS start Start initStack Initialize an empty stack S start->initStack markStart Mark start as visited initStack->markStart pushStart Push start onto S markStart->pushStart Sempty Is S empty? pushStart->Sempty pop Pop a vertex from S Sempty->pop No end End Sempty->end Yes forLoop For each neighbor of vertex, if it is not visited, mark it as visited and push it onto S pop->forLoop forLoop->Sempty
Figure 40.24: Flowchart depicting the iterative implementation of the DFS algorithm.

40.4.3 Applications and Variations of BFS and DFS

BFS and DFS are versatile tools in graph theory with numerous applications. Understanding their core principles and the variations that can be implemented, provides the foundation for solving a wide range of graph-related problems:

  • Shortest path: BFS can find the shortest path between two vertices in an unweighted graph due to its level-wise exploration characteristic. BFS can be modified to not only keep track of the path length but also trace the actual path taken.

  • Connected components: Both BFS and DFS can identify the connected components of an undirected graph. This ability is particularly important in network analysis, where one might want to identify groups of interconnected nodes.

  • Topological sorting: DFS is excellent for topological sorting of a directed acyclic graph (DAG). Topological ordering is beneficial in task scheduling problems where certain tasks must be completed before others can start.

  • Bipartite graph check: A graph is bipartite if its vertices can be split into two independent sets, U and V, such that every edge connects a vertex in U to one in V. BFS or DFS can be adapted to color vertices during the traversal. If at any point during the traversal, two adjacent vertices have the same color, the graph is not bipartite.

  • Graph cycle detection: DFS can detect cycles in a graph. This feature is essential in dependency networks where cycles can lead to deadlocks.

In summary, BFS and DFS are powerful techniques for navigating the complex structures of graphs. By understanding these algorithms, their pros and cons, and their numerous applications, we can better analyze and extract meaningful information from various domains, including computer networks, social networks, transportation systems, and many others.