Graph traversal

Reading time25 min

In brief

Article summary

In this article, you will learn how to navigate in a graph, using algorithms called traversals.

Main takeaways

  • A graph traversal explores a graph a vertex after the other, to build a spanning tree of the graph.

  • Depth-first search (DFS) goes as far as it can and comes back when stuck.

  • Breadth-first search (BFS) explores vertices in increasing distance from the start.

  • BFS finds the shortest path in a non-weighted graph.

  • Termination, correctness and complexity are important notions to evaluate an algorithm.

  • BFS and DFS have a complexity of $O(n)$.

Article contents

1 — Video

Please check the video below. You can also read the transcripts if you prefer.

Information

Hi! And welcome to this lesson about graph traversal.

Finding the exit of a maze requires us to traverse part of the maze. Graph traversal is the domain of graph theory that describes how to explore a graph, one vertex at a time, starting from an initial vertex.

In many cases, we’re interested in accessibility. Accessibility is finding how to reach a target vertex from an initial vertex, following a path in the graph. To obtain paths from the initial vertex to all the accessible ones, I’m going to show you how to build a tree, obtained from the initial graph, that will convey just enough information to guide us to any accessible vertex from the initial one.

Neighbors and spanning trees

Recall that a graph is a couple $G$ composed of a set of vertices $V$ and a set of edges $E$. Now we need to define a few more things.

We define the neighbors of $u$ in $G$ as the set of vertices $v$ such that $\{u,v\}$ is in $E$.

As we’ve seen in the previous lesson, a tree is a connected graph with no cycles.

For example, on the left is a drawing of a tree. However, the graph on the right is not a tree, because it includes a cycle.

Now, a spanning tree of a graph $G$ is a tree obtained from $G$ by removing some edges. To illustrate, consider the following graph. We can obtain a spanning tree by keeping only the following edges.

Note that for a given graph, other spanning trees can be defined, such as this one.

Finding spanning trees can have many applications. In particular, finding a spanning tree is very useful when you want to avoid loops, such as in routing algorithms in networking.

Another example of its usefulness is when assessing if a graph is connected. Imagine that you want to isolate a portion of a highway for maintenance. First, you need to make sure that the graph of streets remain connected during the constructions work.

In this lesson, we’re going to see how to find a path between two vertices of a connected graph, so that we’re able to navigate between any two points.

By definition, a spanning tree will include exactly one path between those two points.

We will compare two approaches: the depth-first search and the breadth-first search.

The main idea of a depth-first search is to keep exploring the graph, jumping from a given vertex to one of its unexplored neighbors. Sometimes, all neighbors of the vertex we’re in have already been explored, in which case we turn back, following the trail in the opposite direction until we find an unexplored vertex to jump to.

Take the following example. A depth-first search, or DFS, can be performed in the following way. Let's start from $v_1$.

First we go up. Note that we could have chosen to go right.

We continue exploring the graph, we go up again.

Next, we have only one unexplored neighbor, which is on the right. So we go right.

Now we go down. Alternatively, we could have continued right.

From here, there are four neighbors, two of which are unexplored. We choose to go right.

Next, we go up.

At this point, we can't go further, because all the current neighbors have already been explored. We go back to the last vertex for which there was an unexplored neighbor, which is the previous one.

And we go down.

From this vertex, there's only one remaining, unexplored neighbor, which is the one on the left. This is the last unexplored vertex in the graph.

The graph in red is a spanning tree obtained from the graph using a DFS.

DFS is a simple yet efficient strategy to ensure that all accessible vertices are explored.

It is an easy way to find the exit of a maze, or to make sure streets remain connected during construction work.

However, it doesn't necessarily yield shortest paths between pairs of vertices in the graph. In this example, it takes 7 hops to travel from $v_1$ to $v_2$, despite the fact that these could be linked with just 1 hop.

The other approach I will describe here is the breadth-first search. When using a breadth-first search, or BFS, we explore the graph by gradually increasing the number of hops from the initial vertex.

Let's consider the same example, where we start in $v_1$.

We start by exploring all vertices one hop away from $v_1$: we thus connect both $v_2$ and $v_4$ to $v_1$.

Then we explore all vertices two hops away from $v_1$. A vertex two hops away from $v_1$ is one hop away from a vertex that's one hop away from $v_1$. So we look at the neighbors of $v_2$ and $v_4$ and connect them to $v_2$ or $v_4$. We obtain $v_3$, $v_5$ and $v_7$.

Then we explore the vertices three hops away from vertex $v_1$. So we look at vertices two hops away, which are $v_3$, $v_5$ and $v_7$, and at their unexplored neighbors. We obtain $v_8$ and $v_6$.

Finally, the orange vertex is one hop away from $v_8$, which is three hops away from vertex $v_1$. We connect them.

The resulting graph in red is a spanning tree obtained from the BFS, which is made exclusively of shortest paths from vertex $v_1$ to the other vertices of the graph.

A BFS is a cautious strategy in which we try to keep the distances to the initial vertex as small as possible during exploration.

This is especially useful when the initial vertex contains specific resources that we might need at any moment. However, it does require more organization than its simpler DFS counterpart.

Comparison between BFS and DFS

the previous example, the spanning trees obtained by a BFS and a DFS are very different.

Here’s another example using the graph we considered earlier in this lesson.

This is a spanning tree obtained from a DFS starting from $v_1$.

While this is a spanning tree obtained from a BFS starting from $v_1$.

Because a BFS is performed by gradually increasing the hops from $v_1$, a spanning tree obtained from a BFS will always provide the shortest path from $v_1$ to any other vertex in the graph, provided that the graph is unweighted. This can be seen clearly in this example.

Concluding words

In the next lesson, we will see how the difference between a BFS and a DFS directly relates to the type of memory structure that can be used to implement these two algorithms.

To go further

Important

The content of this section is optional. It contains additional material for you to consolidate your understanding of the current topic.

When designing an algorithm, it is very important to check multiple aspects:

  • Termination – Does the program terminate? In other words, are there some scenarios in which it will loop infinitely?
  • Correctness – Does the algorithm do what it is supposed to do?
  • Complexity – Can we estimate the time needed for the algorithm to complete, given the size of its inputs?

Complexity is a difficult notion to apprehend, you will learn more about it in a dedicated programming session and later in the project. For now, just consider that an algorithm of complexity $O(n)$ requires a number of operations of the order of $n$.

There are also other properties that can be evaluated. For instance, space complexity measures the memory space needed by the program, as a function of its inputs. The three properties listed above are the most common. In the following points, we are going to evaluate them for both the BFS and the DFS algorithms.

2 — Properties of the BFS

2.1 — Termination

The BFS algorithm on a finite graph terminates.

Proof

The algorithm only visits once each vertex that can be attained. Since the number of vertices is finite, the algorithm will eventually terminate.

2.2 — Correctness

The BFS algorithm returns a shortest path from a vertex $u$ to all other vertices that can be attained from $u$ through graph edges.

Proof

We are going to show that:

  • All vertices are visited in increasing distance to $u$.
  • Obtained paths are of minimum length.
  • All attainable vertices are visited.

All vertices are visited in increasing distance to u

Let’s assume by the contradiction the first moment when a vertex $v$ is added to the FIFO, and there already exists a vertex $w$ inside it such that $d(u, v) < d(u, w)$.

Let’s note $p$ the vertex, neighbor of $v$, that adds $v$ to the FIFO. There can be two cases:

Either $w$ is closer to $u$ than $p$ and we reach a contradiction. Indeed, when $w$ was added, there was a vertex in the FIFO that was already farther. Therefore, this is not the first moment when this situation happens. Or the distance from $u$ to $p$ is the same as the one from $u$ to $w$. We therefore conclude that $v$ is closer from $u$ than $p$. Let $k$ be a predecessor of $v$ on a shortest path from $u$ to $v$.

$k$ has not been visited, otherwise $v$ would have added it to the FIFO. By recurrence, we see that a direct neighbor of $u$ hasn’t been added to the FIFO, thus $p = u$.

$v$ cannot thus be closer from $u$ than $p$.

Obtained paths are of minimum length

The proof is performed by induction. We will show that, at any moment, all paths in the FIFO are of minimal length.

This is trivially true for starting vertex $u$, as it is at distance 0 from itself.

As shown previously, all vertices are visited in increasing distance to $u$. This ensures that adding a vertex to the FIFO is always performed by a neighbor that is a predecessor along a shortest path initiated from $u$.

All attainable vertices are visited

By contradiction, let $v$ be an attainable vertex, but not yet visited, at minimal distance from $u$. Let us consider a shortest path from $u$ to $v$. Let $k$ be the predecessor of $v$ along that path.

Thus, $k$ has not been visited, otherwise it would have been added to the FIFO. Therefore, $k$ is attainable and not yet visited, and is closer from $u$ than $v$. We reach a contradiction.

2.3 — Complexity

The BFS algorithm is of complexity $O(n + m)$, where $m$ is the number of edges in the graph and $n$ is the number of vertices.

Proof

The algorithm only visits once each vertex, and checks the list of unprocessed neighbors each time. Therefore, its complexity is $O(n + m)$

3 — Properties of the DFS

3.1 — Termination

The DFS algorithm on a finite graph terminates.

Proof

The algorithm only visits once each vertex that can be attained. Since the number of vertices is finite, the algorithm will eventually terminate.

3.2 — Correctness

The DFS algorithm is such that it will visit all the vertices attainable from $u$.

Proof

To verify this property, we first remark that all the visited vertices during a DFS are indeed attainable. This can easily be shown by induction: $u$ is trivially attainable from itself, then neighboring vertices are also attainable.

Then, we have to verify that all attainable vertices are visited by the DFS. Let’s proceed by contradiction. A vertex $v$ which is attainable is, by definition, such that there exists a path from $u$ to $v$ in the graph. On such a path, we consider the vertex just before $v$. This vertex cannot be visited, otherwise $v$ would have been visited. By induction, we conclude that $u$ has not been visited, and reach a contradiction.

3.3 — Complexity

The DFS algorithm is of complexity $O(n + m)$, where $m$ is the number of edges in the graph and $n$ is the number of vertices.

Proof

The algorithm only visits once each vertex, and checks the list of unprocessed neighbors each time. Therefore, its complexity is $O(n + m)$

To go beyond

Important

The content of this section is very optional. We suggest you directions to explore if you wish to go deeper in the current topic.

  • Beam search.

    A variation of the BFS to accelerate it while sacrificing the guarantee it will find the shortest path.

  • IDDFS.

    A mixed approach between DFS and BFS that takes the best of both worlds.