Graphs & paths

Reading time25 min

In brief

Article summary

In this article, you will learn what graphs are, and how they can provide a unified mathematical framework for multiple problems. Also, you will discover the notion of paths in graphs, which will be central to the project.

Main takeaways

  • Graphs are mathematical structures consisting of vertices and edges.

  • They can be represented graphically, or through an adjacency matrix.

  • A path is an ordered sequence of distinct edges in the graph.

Article contents

1 — Video

Please check the video below. You can also read the transcripts if you prefer.

Important

There are some “codes” that appear in this video. Please keep in mind that these “codes” are what we call pseudo-codes, i.e., they are not written in a real programming language. Their purpose is just to describe the algorithms, not to be interpretable per se. You will thus need a bit of work to make them fully-functional Python codes if you wish to.

Information

Hi, and welcome to this lesson! Today, we’re going to introduce some basic definitions related to graphs.

Basic definitions

Graph theory is probably one of the most common sub-fields of discrete mathematics. Graphs are mathematical structures that are used to represent the relationships between objects.

Mathematically, a graph is composed of a finite set of vertices, $V$, and a set of edges, $E$. We can therefore refer to a graph as a couple $G = (V, E)$ composed of $V$ and $E$.

A graph is often depicted using circles for vertices and lines for edges.

Some examples: graphs can be used to represent relationships between family members in a family tree, in which case members would be represented as vertices and their relationships as edges...

Or, similarly, to represent molecule interactions in a biological organism...

Or, neural networks in the brain...

Hyperlinks between websites...

Or, simply, a maze, in which cells can be represented as vertices, and adjacent cells not separated by a wall can be linked with edges.

That’s right. There are multiple ways to define graphs. In the simplest case, edges convey information about which pairs of vertices are connected, and which are not. This means that $E$ becomes a set of pairs of vertices.

A pair $\{u,v\}$ is an unordered set that contains two distinct elements, $u$ and $v$. Considering that they're unordered, the pairs $\{u,v\}$ and $\{v,u\}$ are identical.

The order of a graph is the number of vertices it contains, which is 7 here in the example.

The size of the graph is the number of edges, which is 9 in this example.

As you can see, this is a simple, effective way to visualize a graph.

But it might also be misleading, because the creation of a figure like this requires you to make arbitrary decisions about the coordinates of vertices and the forms of lines.

This can be seen in the following example figure, which shows two graphs that appear to be different. However, these two graphs are identical in the sense that they have exactly the same set of vertices and the same set of edges.

Graphs were first introduced by Euler in 1736.

He was interested in formally proving that no one could walk around the city of Konigsberg and cross each of its bridges exactly one time. To prove this, he showed that starting and finishing at one point would require the graph to contain 0 or 2 land masses with an odd number of bridges. Since Konigsberg land masses all contained an odd number of bridges, this route was impossible.

In some cases, edges can be directed, which means that a vertex $u$ can be connected to a vertex $v$, even if $v$ is not connected to $u$. In such cases, edges are made up of couples of vertices, such as, for example, $(u,v)$.

Couples are more informative than pairs because the order of vertices in the couple is meaningful.

For example, the couple $(u,v)$ is distinct from the couple $(v,u)$ if $u$ and $v$ are distinct themselves, whereas the pair $\{u,v\}$ and the pair $\{v,u\}$ are identical. When using couples, graphs are known as digraphs. We will only consider graphs whose edges are pairs in this course, but it's worth pointing out that the algorithms presented here do also apply to digraphs.

The adjacency matrix and weighted graphs

As in the previous examples, it's preferable to index vertices from 1 to $n$, which is the order of the graph. When vertices of the graph are indexed appropriately, an alternative way to represent a graph is to use a matrix.

The adjacency matrix of a graph is a matrix with as many rows and columns as the order of the graph. The adjacency matrix is built by putting a 1 at line $i$ and column $j$ if $\{i,j\}$ is an edge, and putting a 0 otherwise. Here's an example of a graph and its corresponding adjacency matrix.

You should note that the adjacency matrix of a graph is always symmetric.

Adjacency matrices can be generalized to take values other than 0s and 1s, which allows weighted graphs to be defined.

Weighted graphs are particularly useful when edges represent distances or delays between vertices, because values in the adjacency matrix can be used to indicate the corresponding quantities.

To take an example, a weighted graph can model the distances between major cities in the US, and connection weights can be used to represent these distances. As well as distances, weights can be used to represent occurrences, proximities, relationships, and so on.

Paths and geodesic distances

A path is an ordered sequence of edges that are distinct from one other, and is obtained from a sequence of vertices by joining any two consecutive vertices in the corresponding edge. The two extreme vertices of the sequence are called the extremities of the path.

Take a look at this graph. In this example, $(\{v_1,v_2\}, \{v_2,v_6\}, \{v_4,v_6\}, \{v_3,v_4\})$ is a path, and we've obtained it from the vertex sequence $(v_1,v_2,v_6,v_4,v_3)$.

Paths are often confused with walks. A walk is a sequence of vertices, such that any two consecutive vertices form an edge in the graph. The difference is that, in a path, an edge cannot appear twice.

For example, $(v_1, v_2, v_4, v_2)$ in the previous graph is a walk, but the corresponding sequence $(\{v_1,v_2\}, \{v_2, v_4\}, \{v_2,v_4\})$ is not a path, because of the repetition of the edge $\{v_2,v_4\}$.

A cycle in a graph is a path in which the extremities are identical.

For example, in the previous graph $(\{v_2,v_6\}, \{v_4,v_6\}, \{v_3,v_4\}, \{v_2,v_3\})$ is a cycle obtained from the sequence of vertices $(v_2, v_6, v_4, v_3, v_2)$.

The length of a path is the length of the sequence of edges. For weighted graphs, the length corresponds to the sum of the weights.

If there exists at least one path linking two vertices $u$ and $v$, there exists a shortest one. Most of the following lessons will be spent learning how to find the shortest paths between two vertices in a graph.

Finally, a graph is said to be connected if any two vertices there exists a path having these vertices as extremities.

All the example graphs we've considered so far are connected. Here's one that's not connected.

Standard graphs

Some graphs are very useful because they can appear in many situations.

For example, we encounter trees all the time: in file systems, in genealogical graphs, in sport competitions where tournaments are often depicted as trees, and so on.

Trees are connected graphs that are cycle-free.

Note that trees are often confused with rooted trees, in which connected vertices have a specific relationship with one another. Rooted trees can be defined by choosing one arbitrary vertex to be the root of the tree.

Another example of a standard graph is the complete graph. A complete graph includes all possible edges, and is often a good choice to test the abilities or the computation time of an algorithm that operates on graphs.

Mazes can also be represented as graphs. In this case, vertices can be used to represent the cells of the maze, and edges can be defined as neighboring cells that are not separated by a wall.

Concluding words

Thank you for your attention! I have really enjoyed talking about graph theory with you today.

To go further

2 — Counting walks between two vertices

An interesting problem in graph theory is to count the number of possible walks of a fixed length between two chosen vertices.

For simplicity, consider a graph with no weights or directed edges. Its adjacency matrix ${\bf A}$ therefore contains only 1s and 0s. The following result holds:

Let ${\bf M}^k$ denote the matrix power $k$ of a matrix ${\bf M}$, and let us note ${\bf M}[u, v]$ the entry of matrix ${\bf M}$ which is at intersection of its $u^{th}$ row and its $v^{th}$ column. The number of walks of length $k$ from vertex $u$ to vertex $v$ is equal to ${\bf A}^k[u, v]$.

Proof

In order to prove this, we proceed by induction, i.e., we are going to show two things:

  • That this is true under the initial conditions;
  • That if this is true for power $i$ (for any $i$), then it is true for power $i+1$.

Initial conditions

  • Case 1${\bf A}[u, v] = {\bf A}^1[u, v] = 1$ – The single walk of length 1 to go from $u$ to $v$ is to take edge $\{u,v\}$.

  • Case 2${\bf A}[u, v] = {\bf A}^1[u, v] = 0$ – There cannot be any walk of length 1 as there is no direct edge between $u$ and $v$.

Inductive step

For the purpose of induction, assume this is true for power $k$.

A walk of length $k+1$ from $u$ to $v$ can be seen as a sequence of two walks:

  • A walk of length $k$ from $u$ to some vertex $w$.

  • A walk of length 1 from $w$ to $v$.

As a consequence, the number of walks of length $k+1$ from $u$ to $v$ is equal to the sum of the number of walks of length $k$ from $u$ to $w$, each multiplied by the number of ways to go from that $w$ to $v$ in one move. Mathematically, this is given by the following equation: $${\bf A}^{k+1}[u, v] = \sum\limits_{w = 1}^n {\bf A}^k[u, w] {\bf A}[w, v]$$

Since this is exactly the formulation of the dot-product used in matrix multiplications, this concludes that ${\bf A}^{k+1}[u, v]$ gives the number of walks from $u$ to $v$.

To go beyond