Hi! And welcome to this lesson about graph traversal.
Finding the exit of a maze requires us to traverse part of the maze.
Graph traversal is the domain of graph theory that describes how to explore a graph, one vertex at a time, starting from an initial vertex.
In many cases, we’re interested in accessibility.
Accessibility is finding how to reach a target vertex from an initial vertex, following a path in the graph.
To obtain paths from the initial vertex to all the accessible ones, I’m going to show you how to build a tree, obtained from the initial graph, that will convey just enough information to guide us to any accessible vertex from the initial one.
Neighbors and spanning trees
Recall that a graph is a couple $G$ composed of a set of vertices $V$ and a set of edges $E$.
Now we need to define a few more things.
We define the neighbors of $u$ in $G$ as the set of vertices $v$ such that $\{u,v\}$ is in $E$.
As we’ve seen in the previous lesson, a tree is a connected graph with no cycles.
For example, on the left is a drawing of a tree. However, the graph on the right is not a tree, because it includes a cycle.
Now, a spanning tree of a graph $G$ is a tree obtained from $G$ by removing some edges.
To illustrate, consider the following graph. We can obtain a spanning tree by keeping only the following edges.
Note that for a given graph, other spanning trees can be defined, such as this one.
Finding spanning trees can have many applications.
In particular, finding a spanning tree is very useful when you want to avoid loops, such as in routing algorithms in networking.
Another example of its usefulness is when assessing if a graph is connected.
Imagine that you want to isolate a portion of a highway for maintenance.
First, you need to make sure that the graph of streets remain connected during the constructions work.
In this lesson, we’re going to see how to find a path between two vertices of a connected graph, so that we’re able to navigate between any two points.
By definition, a spanning tree will include exactly one path between those two points.
We will compare two approaches: the depth-first search and the breadth-first search.
Depth-first search
The main idea of a depth-first search is to keep exploring the graph, jumping from a given vertex to one of its unexplored neighbors.
Sometimes, all neighbors of the vertex we’re in have already been explored, in which case we turn back, following the trail in the opposite direction until we find an unexplored vertex to jump to.
Take the following example.
A depth-first search, or DFS, can be performed in the following way.
Let's start from $v_1$.
First we go up.
Note that we could have chosen to go right.
We continue exploring the graph, we go up again.
Next, we have only one unexplored neighbor, which is on the right.
So we go right.
Now we go down.
Alternatively, we could have continued right.
From here, there are four neighbors, two of which are unexplored.
We choose to go right.
Next, we go up.
At this point, we can't go further, because all the current neighbors have already been explored.
We go back to the last vertex for which there was an unexplored neighbor, which is the previous one.
And we go down.
From this vertex, there's only one remaining, unexplored neighbor, which is the one on the left.
This is the last unexplored vertex in the graph.
The graph in red is a spanning tree obtained from the graph using a DFS.
DFS is a simple yet efficient strategy to ensure that all accessible vertices are explored.
It is an easy way to find the exit of a maze, or to make sure streets remain connected during construction work.
However, it doesn't necessarily yield shortest paths between pairs of vertices in the graph.
In this example, it takes 7 hops to travel from $v_1$ to $v_2$, despite the fact that these could be linked with just 1 hop.
Breadth-first search
The other approach I will describe here is the breadth-first search.
When using a breadth-first search, or BFS, we explore the graph by gradually increasing the number of hops from the initial vertex.
Let's consider the same example, where we start in $v_1$.
We start by exploring all vertices one hop away from $v_1$: we thus connect both $v_2$ and $v_4$ to $v_1$.
Then we explore all vertices two hops away from $v_1$.
A vertex two hops away from $v_1$ is one hop away from a vertex that's one hop away from $v_1$.
So we look at the neighbors of $v_2$ and $v_4$ and connect them to $v_2$ or $v_4$.
We obtain $v_3$, $v_5$ and $v_7$.
Then we explore the vertices three hops away from vertex $v_1$.
So we look at vertices two hops away, which are $v_3$, $v_5$ and $v_7$, and at their unexplored neighbors.
We obtain $v_8$ and $v_6$.
Finally, the orange vertex is one hop away from $v_8$, which is three hops away from vertex $v_1$.
We connect them.
The resulting graph in red is a spanning tree obtained from the BFS, which is made exclusively of shortest paths from vertex $v_1$ to the other vertices of the graph.
A BFS is a cautious strategy in which we try to keep the distances to the initial vertex as small as possible during exploration.
This is especially useful when the initial vertex contains specific resources that we might need at any moment.
However, it does require more organization than its simpler DFS counterpart.
Comparison between BFS and DFS
the previous example, the spanning trees obtained by a BFS and a DFS are very different.
Here’s another example using the graph we considered earlier in this lesson.
This is a spanning tree obtained from a DFS starting from $v_1$.
While this is a spanning tree obtained from a BFS starting from $v_1$.
Because a BFS is performed by gradually increasing the hops from $v_1$, a spanning tree obtained from a BFS will always provide the shortest path from $v_1$ to any other vertex in the graph, provided that the graph is unweighted.
This can be seen clearly in this example.
Concluding words
In the next lesson, we will see how the difference between a BFS and a DFS directly relates to the type of memory structure that can be used to implement these two algorithms.