Bruteforce and backtracking to solve NP-complete problems
Reading time: 30 min
In brief
Article summary
In this article, we present an approach to solving the TSP, and more generally NP-complete problems. This approach is based on an exhaustive search of the possible solutions, accelerated by a strategy called “backtracking”.
Main takeaways
-
A bruteforce (exhaustive) approach can be used to solve an NP-complete problem.
-
For the TSP, it builds a tree such that each branch is a possible solution.
-
The complexity of solving the TSP with a bruteforce algorithm is $O(n!)$.
-
Backtracking is an optimization that allows the search to stop exploring a branch early.
-
Branch-and-bound is another possible optimization.
Article contents
1 — Video
Please check the video below. You can also read the transcripts if you prefer.
There are some “codes” that appear in this video. Please keep in mind that these “codes” are what we call pseudo-codes, i.e., they are not written in a real programming language. Their purpose is to describe the algorithms, not to be executable as-is. You will therefore need a bit of work to turn them into fully functional Python code if you wish to.
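To make the backtracking idea from the takeaways concrete, here is a minimal Python sketch (the function and variable names are ours, not taken from the video): the search enumerates all tours recursively, but abandons a partial path as soon as it is already longer than the best complete tour found so far.

```python
from math import inf

def tsp_backtracking(dist):
    """Exhaustive search over all tours starting at vertex 0, with
    backtracking: a partial path is abandoned as soon as its length
    already reaches the best complete tour found so far.
    `dist` is an n x n matrix of edge weights."""
    n = len(dist)
    best = inf

    def explore(current, visited, length):
        nonlocal best
        if length >= best:          # backtracking: prune this branch
            return
        if len(visited) == n:       # all vertices visited: close the tour
            best = min(best, length + dist[current][0])
            return
        for nxt in range(n):
            if nxt not in visited:
                visited.add(nxt)
                explore(nxt, visited, length + dist[current][nxt])
                visited.remove(nxt)

    explore(0, {0}, 0)
    return best

# Small example graph (4 vertices, symmetric weights)
d = [[0, 1, 4, 2],
     [1, 0, 3, 5],
     [4, 3, 0, 1],
     [2, 5, 1, 0]]
print(tsp_backtracking(d))  # → 7 (tour 0-1-2-3-0)
```

Without the `length >= best` test, this is exactly the $O(n!)$ exhaustive search; the test is what turns it into backtracking.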
To go further
2 — The branch-and-bound algorithm
A third algorithm for solving exactly the TSP is the branch-and-bound algorithm.
The branch-and-bound algorithm explores the tree of possibilities (in the case of the traveling salesman problem, the tree of possible paths). To do this, it uses a BFS-like traversal. At each vertex it encounters, it tries to compute an interval (as narrow as possible) bounding the length of the shortest path obtainable from the branch of the tree where it currently is.
To obtain these intervals, there are several techniques:
-
To obtain an upper bound, it is sufficient to use a greedy algorithm that estimates the length of the remaining path by always going to the closest unvisited vertex. This produces a path that is not necessarily optimal, but whose length can only be greater than or equal to that of a shortest path.
-
To obtain a lower bound, we can for example count the number of vertices not yet visited (let’s call this number $k$). In the best case, they can be visited by going through the $k$ edges with the lowest weights in the graph. The sum of these $k$ weights gives the lower bound. This bound is often coarse, and it is better to look for the tightest bound affordable, since tighter bounds prune more of the tree.
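The two bounds above can be sketched in Python as follows (the function names and the example graph are ours, chosen for illustration): one function completes the tour greedily to get an upper bound, the other sums the $k$ smallest edge weights to get a lower bound.

```python
def upper_bound_greedy(dist, current, remaining):
    """Upper bound: finish the tour greedily, always moving to the
    closest unvisited vertex, then return to the start (vertex 0).
    The greedy completion is a valid tour, so its length can only be
    greater than or equal to that of the optimal completion."""
    length, pos, todo = 0, current, set(remaining)
    while todo:
        nxt = min(todo, key=lambda v: dist[pos][v])  # closest vertex
        length += dist[pos][nxt]
        todo.remove(nxt)
        pos = nxt
    return length + dist[pos][0]  # close the tour

def lower_bound_k_edges(dist, remaining):
    """Lower bound from the article: with k vertices still to visit,
    at least k more edges must be traversed, so the sum of the k
    smallest edge weights in the graph is a (coarse) lower bound."""
    k = len(remaining)
    n = len(dist)
    weights = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    return sum(weights[:k])

# After the partial path 0 -> 1 on a small example graph:
d = [[0, 1, 4, 2],
     [1, 0, 3, 5],
     [4, 3, 0, 1],
     [2, 5, 1, 0]]
print(upper_bound_greedy(d, 1, {2, 3}))  # → 6
print(lower_bound_k_edges(d, {2, 3}))    # → 2
```

Here the optimal completion of the path 0 → 1 also costs 6, so the interval [2, 6] happens to be tight at the top; on larger graphs the greedy bound is usually looser.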
When the algorithm has determined several intervals (one per branch to explore), it can compare them to each other. In the case of searching for a path of minimum length, if the interval of a branch $b_1$ has an upper bound smaller than the lower bound of the interval of another branch $b_2$, then the sub-tree corresponding to $b_2$ is eliminated. Indeed, there is no point in exploring the sub-tree of $b_2$: even the worst solution still reachable through $b_1$ is guaranteed to be at least as good as the best solution reachable through $b_2$.
2.1 — Complexity
The worst-case complexity of the algorithm is the same as that of an exhaustive search (provided that estimating the bounds is not too expensive). This is for example the case with a complete graph in which all edges have the same weight: the intervals are then identical for every branch, so nothing can be pruned.
In practice, accurate interval estimates can save a lot of time; in the best case, the complexity can even be linear.
2.2 — Precision-performance compromise
As with all approximate estimations, interval estimation poses a precision-performance trade-off. Indeed, the narrower the interval, the more parts of the tree it allows to be pruned. However, if its estimation requires an expensive computation, it impacts the total complexity.
It is therefore necessary to find estimates of the bounds of the intervals that offer a good compromise between computation time and accuracy.
To go beyond
-
This algorithm solves the TSP in $O(n^2 \cdot 2^n)$.
-
This is a fast tool to solve the TSP and other optimization problems.
-
A possible Java/C++ implementation of the algorithm.