Representing graphs

Reading time5 min

In brief

Article summary

In this article, you will learn solutions to represent graphs in the memory of a computer. After the video, we will present you the graph structure you will manipulate in the PyRat project.

Main takeaways

The adjacency matrix of a graph can be represented in memory using an array.
Lists and dictionaries are also solutions to represent a graph in memory.
Each representation has its own advantages and drawbacks.
In PyRat, the representation of a graph is abstracted in an object.

Article contents

1 — Video

Please check the video below. You can also read the transcripts if you prefer.

Information

Click here to show the transcripts of the video.

Hi everyone! In this lesson, we will talk about how to choose data structures for representing graphs.

Representing data in computer memory

As I’m sure you’ll recall, computer memory is organized into locations in which data are stored.

Accessing data therefore requires us to know at which location (or address) they've been stored.

In the same way that it's easier to find a house by its address rather than by its description, or a library book by its call number, it's easier to find data when you know the specific address.

Basic data structures to represent graphs

Let’s imagine that we want to store the adjacency matrix of a graph.

One way to store data in memory is to use an array. Arrays are adjacent data structures that are used to represent sequences of data, where each piece of data uses the same size in memory.

So if you know the address of the first piece of data, the address of any other piece of data can easily be computed. For example, if the first element address is 45, and if each piece of data is stored using one address, then the fourth element address is 48.

So, this means that you can rapidly access the $n^{th}$ piece of data of an array. On the other hand, accessing the $n^{th}$ non-zero element can be time-consuming, because all cells need to be checked one by one to see whether or not they contain a zero.

In terms of graph adjacency matrices, arrays are efficient to check if two vertices $u$ and $v$ are connected by an edge.

On the other hand they are not that efficient if you want to retrieve all the neighbors of $u$, because you need to test each vertex $v$ one by one.

Another common data structure is a list. Lists are not adjacent in memory, and to find a piece of data, you need to find all the previous ones first.

The principle of lists is that each address not only contains data, but also the address of the next piece of data to look for.

So if you want to access the third piece of data in a list, you first must access the first, look at the address of the second, then look at the second to find the address of the third.

Accessing the $n^{th}$ piece of data in a list can be time-consuming. However, it is possible to overcome the time-consuming process of accessing the $n^{th}$ piece of data by only storing pieces of data of interest, which considerably reduces memory usage compared to an array.

In terms of graphs, lists can be used to store only information about the neighbors of vertices for example, since non-neighbors are often irrelevant. If each vertex has only a few neighbors, representing only neighbors with lists can significantly save memory. On the other hand, checking if $u$ and $v$ are connected by an edge may require you to search the full list of $u$‘s neighbors.

A final example of data structures is dictionaries. Dictionaries are elaborate structures that aim to combine the advantages of both arrays and lists. In particular, fast access and efficient memory usage, respectively.

Dictionaries make use of hash functions, which are basically mechanisms to transform contents in addresses. This is the optimal choice for prototyping, because it allows programmers to benefit from speed of access and low memory usage at the same time.

As a rule of thumb, dictionaries should always be used if you don’t know exactly what you’re doing.

Concluding words

Thanks for your attention! That’s it for memory structures, and I will see you again very soon to talk about good programming practices. Bye!

2 — The PyRat representation of a graph

In the PyRat software, the graph is encapsulated inside an object, i.e, a complex structure that provides you a convenient interface for manipulating it. Simply put, you do not have a direct access to the adjacency matrix, list, or dictionary that stores the graph in memory. Instead, you have access to numerous attributes and methods, that allow you to manipulate the graph without entering into the details of its implementation.

More precisely, let g be an instance of a PyRat Graph. You can use the following:

g.vertices – Gives you a list of vertices in the graph.
g.nb_vertices – Gives you the number of vertices in the graph.
g.edges – Gives you the list of edges in the graph.
g.nb_edges – Gives you the number of edges in the graph.
g.add_vertex(v) – Adds vertex v to the graph.
g.add_edge(v1, v2, w) – Adds an edge of weight w between v1 and v2.
g.get_neighbors(v) – Gives you the neighbors of v in the graph.
g.get_weight(v1, v2) – Gives you the weight of edge between v1 and v2.
g.remove_vertex(v) – Removes vertex v and all edges attached to it from the graph.
g.remove_edge(v1, v2) – Removes edge between v1 and v2 from the graph.
g.is_connected() – Indicates if the graph is connected.
g.has_edge(v1, v2) – Indicates if an edge exists between vertices v1 and v2.
g.edge_is_symmetric(v1, v2) – Indicates if edge from v1 to v2 can be used to go from v2 to v1.
g.minimum_spanning_tree() – Gives you a minimum spanning tree for the graph.
g.as_dict() – Gives you a dictionary representation of the graph.
g.as_numpy_ndarray() – Gives you a matrix representation of the graph (as a numpy.ndarray).
g.as_torch_tensor() – Gives you a matrix representation of the graph (as a torch.tensor).

Information

In practice, a PyRat maze is an instance of class Maze, which inehrits from Graph. Therefore, it has a few additional attributes (e.g., a width and a height in number of cells) and methods (e.g., coords_difference(v1, v2) to get the difference along width and height axes, i_exists(v) to check that a cell exists in the maze). Check the documentation of Maze.py for all info!

You see, at no moment you need to know the structure behind to use the graph for your project. This is very practical if the PyRat developer ever decides to change that structure for any reason. Indeed, as long as the interface remains unchanged, the users’ codes will still work.

Information

There are many existing libraries online that provide such interfaces to use graphs. Whatever language you use, it is always a good idea to check for existing implementations before re-implementing a structure from scratch!

To go further

3 — More details on the list-based solution

We have seen before that an adjacency matrix is a convenient object for representing a graph in memory.

However, in most cases, graphs are sparse objects, i.e., the number of existing edges is low compared to the number of edges of a complete graph. A direct implication is that most of the entries of the adjacency matrix are 0s. Since the number of elements in an adjacency matrix is equal to the square of the graph order, this can quickly lead to a lot of memory space used.

A possible solution to circumvent this problem is to use a different data structure: a list of lists. Let us call such an object $L = (l_1, \dots, l_n)$, with $l_1, \dots, l_n$ being lists. In this structure, $l_i$ $(i \in [1, n])$ will represent the edges that can be accessed from vertex $i$.

As an example, consider the following adjacency matrix: $${\bf A} = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

Assuming vertices to be labelled from 1 to $n$, this matrix is equivalent to the list $L = ( (2, 4) , (1, 3, 4, 6), (2, 4), (1, 2, 3, 6), (6), (2, 4, 5, 7), (6) )$.

We can quickly notice that the number of stored numbers has shrunk from $n^2$ = 49 to 18.

While this solution saves some memory space, it suffers from different limitations:

Checking existence of an edge $\{i, j\}$ requires to go through all elements of the list $l_i$ to verify if $j$ is one of its elements. This can take some time if $i$ has a lot of neighbors. In comparison, making the same check with an adjacency matrix ${\bf A}$ takes a single operation, as one just need to verify that ${\bf A}[i, j] \neq 0$.
It is not as easy to extend to weighted graphs. In the case of adjacency matrices, entries represent the weight associated with the edge. Here, entries are indices of non-zero elements, which cannot be altered without creating/deleting edges. A possible solution is to replace the lists of indices $l_i = (j, k, \dots)$ with lists of couples $l_i = ( (j, w_j), (k, w_k), \dots )$, where $w_j$ is the weight of edge $\{i, j\}$.

To go beyond

Understanding the efficiency of GPU algorithms for matrix-matrix multiplication.

A research paper illustrating one of the main reasons why matrices are frequently used.
Graph Processing on FPGAs: Taxonomy, Survey, Challenges.

A research paper illustrating the use of specific hardware (here, FPGA) for processing large graphs.