AI Solver: Weighted Sokoban Puzzle (A*)
The aim of this project was to design and implement an intelligent planning agent for a weighted variant of the Sokoban problem puzzle. The solution uses two search algorithms to find an optimal solution. A breadth first search algorithm to find all the cells reachable by the player character, and an A* graph search, used to determine the optimal path for the Sokoban puzzle using a heuristic.
If you would like to play the game and test out the code, you can get in touch with me through the contact page :))
The Game
Sokoban is a one-player game in which the player’s objective is to push all the boxes onto designated storage locations. The player can move up, down, left, or right, given there is no obstacle in the way. Although, you may not move through walls or push more than one box at a time; boxes can only be pushed, not pulled.
In this variant of the game, each box is assigned an individual pushing cost. Thus, the weight of the box must be considered when computing the total cost of a path. The cost of an action is the weight of the box being pushed, this may be zero if there is no box or the box weight is zero, plus one for the worker’s movement.
As mentioned earlier, the solution uses two search algorithms to find an optimal solution, a breadth first search algorithm -to find all the cells reachable by the player character-, and an A* graph search -used to determine the optimal path for the Sokoban puzzle using a heuristic. For the A* graph search solution to be optimal, it must be consistent, and thus, admissible.
Solution and Features
Interpretation of the problem instance and state representation is important in optimising the computational time required by the agent for each node. It would have been inefficient to use the entire warehouse object as the state representation due to the storage required to keep it in memory. Thus, it is simpler, and computationally more effective, to represent the state as the objects that are dynamic (moving), i.e., the worker and boxes, in the state representation, and the target squares, walls, and taboo squares, as static (not moving), which remain in the problem instance and referred to when required. The use of heuristics within search problems is highly advantageous as successive iterations are interdependent, with each level deciding which avenues to choose and discard based on their proximity to the desired solution. The A* algorithm combines the logic described by a uniform-cost search algorithm, which orders priority by path cost (or backwards cost) and a greedy search algorithm, which orders priority by goal proximity (or forward cost).
The heuristic uses a ‘Manhattan distance’ formula due to the movement limitations of the game and as the position coordinated are given as vector coordinates. The ‘Manhattan distance is defined as the sum of the absolute differences between the two vectors. The heuristic used was both admissible and consistent, where it found, for each box that was not already on a target, the distance between it and each target, multiplied by the weight of the box, adding the distance between the worker and the box, and then adding together the minimum of the values found, divided by the number of boxes not on a target.
Weighted Boxes
The affect of weighted boxes is demonstrated below. Where, even though the most cost affective solution may seem to be to push the boxes to the holes closest to them. In instances where the there is a great difference in the weight of the boxes compared to the distances to the whole, it may be more cost affective to swap them, as shown below.
Performance and Limitations
Although the search agent provides an optimal solution, using an admissible and consistent heuristic, there are still many opportunities for the algorithm to improve. In particular, the A* search algorithm is potentially exponential in its use of space, thus, the use of an iterative-deepening A* search could be used to trade off space requirements for time, particularly on solutions with deeper solutions.
Another improvement could have been made in the calculation of the heuristic. As this value can lie anywhere between zero and the true cost to the solution, a heuristic that is closer to the true cost will in turn result in a more efficient algorithm. One solution could have been to calculate all permutations of boxes to targets with calculated distances and costs, although, as the number of boxes increases, the length of these calculations increases factorially. Thus, the decision must be made whether the trade-off between quality of estimate and work per node is worth it.
A limitation of the solution was found when the warehouse walls were not completely closed off. This was due to the nature of the breadth first search, where, to find the coordinates of the cells inside the warehouse, each node is expanded until there are no further nodes to expand. Thus, when the problem doesn’t have closed walls, the algorithm continues searching with no end.