Representing graphs
Reading time5 minIn brief
Article summary
In this article, you will learn solutions to represent graphs in the memory of a computer. After the video, we will present you the graph structure you will manipulate in the PyRat project.
Main takeaways
-
The adjacency matrix of a graph can be represented in memory using an array.
-
Lists and dictionaries are also solutions to represent a graph in memory.
-
Each representation has its own advantages and drawbacks.
-
In PyRat, the representation of a graph is abstracted in an object.
Article contents
1 — Video
Please check the video below. You can also read the transcripts if you prefer.
2 — The PyRat representation of a graph
In the PyRat software, the graph is encapsulated inside an object, i.e, a complex structure that provides you a convenient interface for manipulating it. Simply put, you do not have a direct access to the adjacency matrix, list, or dictionary that stores the graph in memory. Instead, you have access to numerous attributes and methods, that allow you to manipulate the graph without entering into the details of its implementation.
More precisely, let g
be an instance of a PyRat Graph
.
You can use the following:
g.vertices
– Gives you a list of vertices in the graph.g.nb_vertices
– Gives you the number of vertices in the graph.g.edges
– Gives you the list of edges in the graph.g.nb_edges
– Gives you the number of edges in the graph.g.add_vertex(v)
– Adds vertexv
to the graph.g.add_edge(v1, v2, w)
– Adds an edge of weightw
betweenv1
andv2
.g.get_neighbors(v)
– Gives you the neighbors ofv
in the graph.g.get_weight(v1, v2)
– Gives you the weight of edge betweenv1
andv2
.g.remove_vertex(v)
– Removes vertexv
and all edges attached to it from the graph.g.remove_edge(v1, v2)
– Removes edge betweenv1
andv2
from the graph.g.is_connected()
– Indicates if the graph is connected.g.has_edge(v1, v2)
– Indicates if an edge exists between verticesv1
andv2
.g.edge_is_symmetric(v1, v2)
– Indicates if edge fromv1
tov2
can be used to go fromv2
tov1
.g.minimum_spanning_tree()
– Gives you a minimum spanning tree for the graph.g.as_dict()
– Gives you a dictionary representation of the graph.g.as_numpy_ndarray()
– Gives you a matrix representation of the graph (as anumpy.ndarray
).g.as_torch_tensor()
– Gives you a matrix representation of the graph (as atorch.tensor
).
In practice, a PyRat maze is an instance of class Maze
, which inehrits from Graph
.
Therefore, it has a few additional attributes (e.g., a width
and a height
in number of cells) and methods (e.g., coords_difference(v1, v2)
to get the difference along width and height axes, i_exists(v)
to check that a cell exists in the maze).
Check the documentation of Maze.py
for all info!
You see, at no moment you need to know the structure behind to use the graph for your project. This is very practical if the PyRat developer ever decides to change that structure for any reason. Indeed, as long as the interface remains unchanged, the users’ codes will still work.
There are many existing libraries online that provide such interfaces to use graphs. Whatever language you use, it is always a good idea to check for existing implementations before re-implementing a structure from scratch!
To go further
3 — More details on the list-based solution
We have seen before that an adjacency matrix is a convenient object for representing a graph in memory.
However, in most cases, graphs are sparse objects, i.e., the number of existing edges is low compared to the number of edges of a complete graph. A direct implication is that most of the entries of the adjacency matrix are 0s. Since the number of elements in an adjacency matrix is equal to the square of the graph order, this can quickly lead to a lot of memory space used.
A possible solution to circumvent this problem is to use a different data structure: a list of lists. Let us call such an object $L = (l_1, \dots, l_n)$, with $l_1, \dots, l_n$ being lists. In this structure, $l_i$ $(i \in [1, n])$ will represent the edges that can be accessed from vertex $i$.
As an example, consider the following adjacency matrix: $${\bf A} = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
Assuming vertices to be labelled from 1 to $n$, this matrix is equivalent to the list $L = ( (2, 4) , (1, 3, 4, 6), (2, 4), (1, 2, 3, 6), (6), (2, 4, 5, 7), (6) )$.
We can quickly notice that the number of stored numbers has shrunk from $n^2$ = 49 to 18.
While this solution saves some memory space, it suffers from different limitations:
-
Checking existence of an edge $\{i, j\}$ requires to go through all elements of the list $l_i$ to verify if $j$ is one of its elements. This can take some time if $i$ has a lot of neighbors. In comparison, making the same check with an adjacency matrix ${\bf A}$ takes a single operation, as one just need to verify that ${\bf A}[i, j] \neq 0$.
-
It is not as easy to extend to weighted graphs. In the case of adjacency matrices, entries represent the weight associated with the edge. Here, entries are indices of non-zero elements, which cannot be altered without creating/deleting edges. A possible solution is to replace the lists of indices $l_i = (j, k, \dots)$ with lists of couples $l_i = ( (j, w_j), (k, w_k), \dots )$, where $w_j$ is the weight of edge $\{i, j\}$.
To go beyond
-
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication.
A research paper illustrating one of the main reasons why matrices are frequently used.
-
Graph Processing on FPGAs: Taxonomy, Survey, Challenges.
A research paper illustrating the use of specific hardware (here, FPGA) for processing large graphs.