Practical activity

Duration: 2h30

Presentation & objectives

In this activity, you will develop a heuristic algorithm to approximate a solution to the TSP. Here, we choose to focus on a greedy algorithm.

In the optional parts, we suggest some alternative heuristics and improvements. The part on 2-opt is not too hard, and is interesting to explore. Don’t hesitate to have a look!

Activity contents

1 — The greedy heuristic

1.1 — Get prepared

Before we start, let’s first create the files we need in the workspace:

  • pyrat_workspace/players/Greedy.py – Duplicate the file TemplatePlayer.py, and rename it Greedy.py. Then, as in the previous session, rename the class to Greedy.

  • pyrat_workspace/games/visualize_Greedy.ipynb – Create a file that will start a game with a Greedy player, in a maze with mud (argument mud_percentage), and 20 pieces of cheese (argument nb_cheese). Optionally, you can set a random_seed and a trace_length. You should draw inspiration from the other files in the games subdirectory you encountered in the previous session.

  • pyrat_workspace/tests/tests_Greedy.ipynb – Drawing inspiration from the unit test files you encountered in the previous sessions, create a test_Greedy.ipynb file. This file will contain the unit tests you will develop to test the methods of your Greedy player.

Information

As in the previous session, it is OK if you work with a maze that does not contain mud if you are late on Dijkstra’s algorithm. Still, note that for the last deliverable this will not be accepted anymore.

1.2 — Program the greedy algorithm

We want to use the following heuristic: the shortest path that goes through all pieces of cheese can be approximated by sequentially going to the closest piece of cheese. In order to write a greedy algorithm with that heuristic, follow the same steps as last time, i.e.:

  • Identify functions – You should first identify which functions you will need. Do not hesitate to come back to the articles you had to study for today. When identified, write their prototypes in your Greedy.py player file.

  • Identify tests – Determine which tests you should perform to guarantee these functions work as expected.

  • Program the methods – Then, write some code (tests and player) to make sure your functions work as expected, and pass the tests you designed.
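
To fix ideas, the heuristic described above can be sketched as follows. This is a minimal, hypothetical sketch: the function names are illustrative, and `distance(a, b)` is assumed to return a shortest-path length (e.g., precomputed with Dijkstra’s algorithm); this is not the actual PyRat API.

```python
def greedy_order(start, cheeses, distance):
    """Order the pieces of cheese by repeatedly going to the closest one.
    `distance(a, b)` is assumed to return the shortest-path length
    between a and b (e.g., precomputed with Dijkstra's algorithm)."""
    order = []
    remaining = set(cheeses)
    current = start
    while remaining:
        # Local optimum: pick the closest remaining piece of cheese
        closest = min(remaining, key=lambda c: distance(current, c))
        order.append(closest)
        remaining.remove(closest)
        current = closest
    return order

# Toy usage on a line-shaped maze where distance is the absolute difference
print(greedy_order(0, [5, 2, 9], lambda a, b: abs(a - b)))  # [2, 5, 9]
```

Keeping the ordering logic separate from the distance computation will make it easier to swap the heuristic later in this activity.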

Information

Later in this practical activity, we ask that you change the heuristic used by your greedy algorithm. So, please make sure that you think carefully about the functions you identify, i.e., do not write everything in one method!

The following video shows you an example of what you should observe:

1.3 — Compare with the exhaustive algorithm

As you have seen in the articles you had to study, an approximate algorithm is a compromise between precision, complexity, and sometimes other properties (required memory, need for particular devices…).

You should compare the time needed to grab all pieces of cheese for the Greedy player you just created with that of the Exhaustive players from the last session. Please create a script in the games subdirectory that plots the time needed, as a function of the number of pieces of cheese in the maze, for both players.

Information

Obviously, you should not go very high in terms of the number of pieces of cheese for the Exhaustive player. Also, it would be interesting to plot results averaged over multiple games, as there might be variability across games.
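
A possible skeleton for this comparison script is sketched below. Here, `run_game` is a placeholder standing in for code that actually launches a PyRat game and returns the number of turns used; the fake timings inside it are only there to make the sketch self-contained.

```python
from statistics import mean
import random

random.seed(0)  # reproducible fake results

def run_game(player_name, nb_cheese):
    """Placeholder: replace with code that starts a PyRat game with
    `nb_cheese` pieces of cheese and returns the turns needed to win."""
    # Fake timings: exponential growth for Exhaustive, linear for Greedy
    base = 2 ** nb_cheese if player_name == "Exhaustive" else 10 * nb_cheese
    return base + random.randint(0, 5)  # fake game-to-game variability

def average_turns(player_name, nb_cheese, nb_games=10):
    # Averaging over several games smooths out maze-to-maze variability
    return mean(run_game(player_name, nb_cheese) for _ in range(nb_games))

# Keep the cheese counts small for the Exhaustive player!
for n in range(1, 8):
    print(n, average_turns("Exhaustive", n), average_turns("Greedy", n))
```

Once `run_game` launches real games, you can pass the averaged values to matplotlib’s `plt.plot` to produce the requested figure.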

1.4 — Be reactive

Up to now, you have probably followed the same reasoning as in previous sessions. In other words, the most direct way of programming the greedy algorithm is to compute the path during preprocessing, and to unroll it at each call to turn.

Now, let’s imagine you have an opponent in the game. If you follow such a predetermined path without considering that your opponent may eat some pieces of cheese in the meantime, you are definitely going to lose. Indeed, you are going to try to go for pieces of cheese that have disappeared, while the opponent will focus on remaining ones.

1.4.1 — A first improvement

Let’s rethink how the greedy algorithm works:

  • Consider you are on a particular cell in the maze, and you run your greedy algorithm.
  • The output should be a list of pieces of cheese to eat, given in the order in which to get them.
  • Once you have that list, you perform one action in the direction of that cheese.

A greedy algorithm approximates a global optimum with a series of local optima. Using the description above, you should observe that you do not need to compute the whole path at once to take that decision: determining the next piece of cheese to go to is enough to determine the next action.

Therefore, selecting the next piece of cheese only when you reach your target will allow you to ignore those that have been grabbed by the opponent in the meantime.
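
This idea can be illustrated with the following toy sketch on a one-dimensional maze. All names here are hypothetical: the real player would rely on Dijkstra’s distances and routing tables rather than absolute differences, and on the actual PyRat turn interface.

```python
class GreedyEachCheese:
    """Toy 1-D illustration of the 'retarget only on arrival' idea."""

    def __init__(self):
        self.current_target = None

    def closest_cheese(self, position, cheese_left):
        # On this toy line-shaped maze, distance is the absolute difference
        return min(cheese_left, key=lambda c: abs(c - position))

    def move_towards(self, position, target):
        # One step left (-1) or right (+1) towards the target
        return 1 if target > position else -1

    def turn(self, position, cheese_left):
        # Pick a new target only if the current one is gone, i.e., we just
        # ate it, or the opponent grabbed it in the meantime
        if self.current_target not in cheese_left:
            self.current_target = self.closest_cheese(position, cheese_left)
        return self.move_towards(position, self.current_target)
```

Note that the target is stored in an attribute, so the greedy choice is only re-run when the target disappears from the set of remaining cheese.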

You should now create a new player GreedyEachCheese.py, that realizes this improvement. Also, create a visualize_GreedyEachCheese.ipynb script in the games subdirectory to run a game with it. Do not forget to add additional unit tests in a dedicated file if needed.

1.4.2 — Even better

The GreedyEachCheese player still has a drawback: if the opponent grabs the piece of cheese we are currently targeting while we are moving in its direction, we will still finish our trajectory.

In order to make it even more reactive, let’s rethink our GreedyEachCheese:

  • Consider you are on a particular cell in the maze, and you run your greedy algorithm.
  • The output should be the next piece of cheese to grab (let’s call it c).
  • Once you have determined c, you perform one action in its direction.

Well, if we go one step in the direction of c, we can distinguish the two following scenarios:

  • c is still here – Then, c is still the closest piece of cheese, as we reduced the number of turns needed to reach it by 1 (or more if there was mud). We should continue in its direction.
  • c has been caught by an opponent – Then, we should focus on another piece of cheese.

Note that we can satisfy both scenarios by running our greedy algorithm at each turn to determine the next action (and not the next piece of cheese, as GreedyEachCheese does). By doing that, we would find the same path to follow if the cheese is still there, and would move elsewhere if needed.
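
Continuing the toy one-dimensional sketch from above (hypothetical names, absolute differences standing in for Dijkstra’s distances), the fully reactive variant becomes stateless: the greedy choice is recomputed from scratch at every turn.

```python
class GreedyEachTurn:
    """Toy 1-D illustration: no stored target, so the player reacts
    immediately if the targeted cheese disappears."""

    def turn(self, position, cheese_left):
        # Recompute the greedy choice at every turn
        target = min(cheese_left, key=lambda c: abs(c - position))
        # One step left (-1) or right (+1) towards the chosen cheese
        return 1 if target > position else -1
```

As long as the targeted cheese remains, this produces the same moves as GreedyEachCheese; when it disappears, the very next call picks a new target.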

You should now create a new player GreedyEachTurn.py, that realizes this improvement. Also, create a visualize_GreedyEachTurn.ipynb script in the games subdirectory to run a game with it. Do not forget to add additional unit tests in a dedicated file if needed.

1.4.3 — Compare programs

You now have three greedy algorithms. First, create a two-team game script match_Greedy_GreedyEachTurn.ipynb to have Greedy challenge GreedyEachTurn. To do so, draw inspiration from the scripts sample_game.ipynb and tutorial.ipynb in subdirectory pyrat_workspace/games.

Information

You should make the players start at random locations using code similar to:

game.add_player(player_1, team="Team Ratz", location=StartingLocation.RANDOM)
game.add_player(player_2, team="Team Pythonz", location=StartingLocation.RANDOM)

Otherwise, both players will start at the middle of the maze, and will follow the same directions.

Then, answer the following questions:

  • On average over 100 games, which player wins?
  • By how many games?
  • Optional – Is the difference significant? To check this, you can use a one-sample t-test to verify whether the list of score differences has a mean of 0.
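
The optional significance check can be sketched with the standard library only; `scipy.stats.ttest_1samp(differences, 0)` computes the same statistic together with the associated p-value. The score differences below are made-up toy data.

```python
from statistics import mean, stdev
from math import sqrt

def t_statistic(differences):
    """One-sample t statistic for H0: the mean of `differences` is 0."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n))

# Hypothetical score differences (GreedyEachTurn - Greedy) over a few games
differences = [2, 3, -1, 4, 2, 5, 1, 3]
# Compare the statistic with the critical value of the t distribution
# with len(differences) - 1 degrees of freedom
print(t_statistic(differences))
```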

Also, create another game script match_GreedyEachCheese_GreedyEachTurn.ipynb to have GreedyEachCheese challenge GreedyEachTurn, and answer the following questions:

  • Is it interesting to run the greedy algorithm at each turn compared to at each piece of cheese (from a results point of view)?
  • Optional – If there is an improvement, is it significant?

To go further

Important

The content of this section is optional. It contains additional material for you to consolidate your understanding of the current topic.

2 — 2-opt

There are many other algorithms you can use to approximate the solution to the TSP. We can distinguish two categories:

  • Single solution – A single solution to the TSP is found, and then possibly improved to get a better solution. The greedy algorithm falls in this category.
  • Multiple solutions – Many solutions are found and mixed to produce better solutions. Genetic algorithms fall in this category.

The greedy algorithm builds a single solution and then uses it. It does not perform any improvement of that solution, although some easy ones could be made.

A simple heuristic to build upon the greedy result is 2-opt. The algorithm will start from a particular solution to the TSP (either random or built with a greedy algorithm), and will improve it gradually by removing crossing routes.

You can try to code this heuristic following indications on the Wikipedia page on 2-opt, or other resources.
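
A possible sketch of 2-opt is given below. As before, `distance(a, b)` is an assumed shortest-path function (in PyRat, it would come from Dijkstra’s algorithm), and index 0 of the tour (the starting cell) is kept fixed.

```python
def tour_length(tour, distance):
    """Total length of a tour that visits the vertices of `tour` in order."""
    return sum(distance(tour[i], tour[i + 1]) for i in range(len(tour) - 1))

def two_opt(tour, distance):
    """Iteratively reverse sub-segments of the tour as long as doing so
    shortens it; reversing a segment is what removes two crossing routes."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment best[i..j] and reconnect its ends
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, distance) < tour_length(best, distance):
                    best, improved = candidate, True
    return best

# Toy usage: on a line-shaped maze, [0, 2, 1, 3] contains a "crossing"
print(two_opt([0, 2, 1, 3], lambda a, b: abs(a - b)))  # [0, 1, 2, 3]
```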

3 — Factorize codes

If you have developed other heuristics, you can now try to factorize your code, so that you have an abstract class GreedyHeuristic (in the utils subdirectory), and child classes that encode the heuristics. Finally, create a player GreedyPlayer.py that exploits the greedy heuristic you want.

Here is a small UML diagram that represents a possible factorization. Check session 2 for a guide on how to read it.
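
One possible factorization is sketched below; the class and method names are hypothetical suggestions, not an imposed design or the actual PyRat API.

```python
from abc import ABC, abstractmethod

class GreedyHeuristic(ABC):
    """Abstract parent: each child class encodes one way of choosing
    the next piece of cheese to target."""

    @abstractmethod
    def choose_next_cheese(self, position, cheese_left, distance):
        """Return the next piece of cheese to go for."""

class ClosestCheeseHeuristic(GreedyHeuristic):
    """The heuristic used so far: favor the closest piece of cheese."""

    def choose_next_cheese(self, position, cheese_left, distance):
        return min(cheese_left, key=lambda c: distance(position, c))
```

A GreedyPlayer could then receive a GreedyHeuristic instance at construction time, so that switching heuristics never requires touching the player’s own code.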

4 — Another heuristic

Up to now, your heuristic was to favor the closest piece of cheese. Here are two possible heuristics you can implement:

  • Best $k$ pieces of cheese – Favor the piece of cheese c such that the sequence of $k$ pieces of cheese starting with c is the shortest. To do so, you can solve a TSP up to depth $k$.

  • High density areas – Favor pieces of cheese that are located in areas where there is a lot of cheese. To do so, you need to compute a measure of density for each cheese.
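
The high-density heuristic could be sketched as follows. Everything here is an illustrative choice: the density measure (count of other cheese within a radius), the radius value, and the tie-breaking rule are all assumptions, not a prescribed design.

```python
def density_score(cheese, cheese_left, distance, radius=5):
    """Toy density measure: number of other pieces of cheese within
    `radius` of this one (the radius value is illustrative)."""
    return sum(1 for other in cheese_left
               if other != cheese and distance(cheese, other) <= radius)

def choose_dense_cheese(position, cheese_left, distance):
    # Favor dense areas, break ties by proximity to the player
    return max(cheese_left,
               key=lambda c: (density_score(c, cheese_left, distance),
                              -distance(position, c)))
```

On a toy line-shaped maze with cheese at {0, 1, 2, 20} and the player at 10, this picks cheese 2: the cluster {0, 1, 2} all score the same density, and 2 is the closest of the three.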

5 — Reuse traversals

You may have already noticed that you run Dijkstra’s algorithm multiple times during a game. Sometimes, you even run the algorithm multiple times from the same place. This can happen, for instance, when running the algorithm from a position you already visited earlier in the game, or when computing a heuristic from a piece of cheese.

Since this is a costly operation, it may be a good idea to store the results in an attribute to avoid recomputing the routing tables if you already did. A quick solution is to create an attribute and check its contents before running the costly code. Here is an example of how to do it:

def __init__(self, ...):
    # Cache mapping a source vertex to the results of Dijkstra's algorithm
    self.dijkstras = {}

def my_method(self, ...):
    # Run the costly traversal only if we never did it from this source
    if source not in self.dijkstras:
        self.dijkstras[source] = ...
    make_some_computations(self.dijkstras[source])

To go beyond

Important

The content of this section is very optional. We suggest you directions to explore if you wish to go deeper in the current topic.

6 — Local search

Local search is a meta-heuristic that explores the space of solutions like a traversal of a graph. The idea is to build a graph of solutions dynamically, and to explore it until we are stuck in a local optimum. In more detail, here is what you can do to implement a local search:

  1. Start from a given solution $s$.
  2. Identify all neighboring solutions of $s$. To do so, you need to define an operation that, when applied to $s$, will generate new solutions. For instance, you could permute two vertices in the solution. This will generate a neighboring solution for each pair of vertices.
  3. Evaluate neighbors, and your new $s$ becomes the best solution among neighbors.
  4. Loop until no neighbor is better than $s$.

Try different strategies to generate neighbors. You can also try different exploration strategies of neighbors, inspired by algorithms such as BFS or DFS.
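
The steps above can be sketched as follows, with the neighbor-generating operation chosen here being a swap of two positions (an illustrative choice among many). `length` is an assumed cost function to minimize, such as the tour length.

```python
def local_search(solution, length):
    """Hill climbing over the space of solutions: the neighbors of s are
    all solutions obtained by swapping two positions of s; move to the
    best neighbor until none is strictly better (a local optimum)."""
    current = list(solution)
    while True:
        # Generate one neighbor per pair of positions
        neighbors = []
        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                neighbor = list(current)
                neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
                neighbors.append(neighbor)
        best = min(neighbors, key=length)
        if length(best) >= length(current):
            return current  # stuck in a local optimum
        current = best
```

Note that this greedy descent can stop in a local optimum that is far from the global one, which is precisely what simulated annealing (next section) tries to mitigate.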

7 — Simulated annealing

Simulated annealing is an efficient method for approximating a solution to the TSP. As with local search, you need a method to generate neighbors. You can then apply simulated annealing as follows:

  1. Start from an initial solution $s$.
  2. Initialize the simulated annealing:
    • Define the initial temperature $T$.
    • Choose a temperature decay strategy, for example exponential decay $T \leftarrow \alpha T$, where $\alpha$ is slightly less than 1 (e.g., 0.99).
    • Set a stopping criterion, such as a maximum number of iterations or a minimum temperature.
  3. For each iteration:
    • Generate a new solution $s'$ by slightly modifying $s$.
    • Calculate the cost difference $\delta$ between $s'$ and $s$ (so $\delta > 0$ when $s'$ is worse).
    • If $s'$ is better, accept it.
    • If $s'$ is worse, accept it anyway with a probability depending on the current temperature and the cost difference, calculated as $P = \exp\left(-\frac{\delta}{T}\right)$.
    • Reduce the temperature according to the decay strategy.
  4. Stop when the temperature is sufficiently low or after a predefined number of iterations.
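
The steps above can be sketched as follows. The neighbor operation (a random swap that keeps the starting cell fixed), the default temperatures, and the decay factor are all illustrative choices; `length` is an assumed cost function to minimize.

```python
import math
import random

def simulated_annealing(initial, length, neighbor,
                        t0=100.0, alpha=0.99, t_min=1e-3):
    """Minimize `length` starting from `initial`; `neighbor(s)` must
    return a slightly modified copy of the solution s."""
    current, best = list(initial), list(initial)
    t = t0
    while t > t_min:
        candidate = neighbor(current)
        delta = length(candidate) - length(current)
        # Always accept improvements; accept degradations with
        # probability exp(-delta / t), which shrinks as t cools down
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
        if length(current) < length(best):
            best = list(current)
        t *= alpha  # exponential temperature decay
    return best

def swap_neighbor(s):
    # Neighbor: swap two random positions (keep the start fixed at index 0)
    i, j = random.sample(range(1, len(s)), 2)
    s = list(s)
    s[i], s[j] = s[j], s[i]
    return s
```

Because the best solution seen so far is kept aside, the returned solution is never worse than the initial one, even if the final accepted state drifted away.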