In 2016, Google's AlphaGo neural network beat the Go world champion, and there was a lot of fuss about it. Excellent new open-source machine learning libraries became available, using home GPUs to speed up the training process.

I decided to try the same with Abak, but when I first started, I barely knew what a neural network was. I had toyed with the idea in college (1996), but lacking data, a GPU, and a goal, it was hard to tackle. Besides, math-wise, neural networks are a heavy subject, and you need a lot of inspiration.
After a lot of intense and obsessive research, I was able to build a model and make it learn.
Standing on the shoulders of giants, my first intent was to learn from Dr. Gerald Tesauro, who developed TD-Gammon in the '90s. If I recall correctly, this was the first practical implementation of a reinforcement learning algorithm.
But since Abak is a two-dimensional game, I couldn't use the proposed model and had to develop my own. I decided to use the output of some of my existing algorithms as part of the model, as expert information, so the NN would have an easier time learning. Doing that felt like cheating.
Here is a comparison of the two models, Backgammon's and Abak's.
Backgammon's widely used TD model describes columns: each column of the board is described by 4 inputs.
- No checkers: [0,0,0,0]
- One checker: [1,0,0,0]
- Two checkers: [1,1,0,0]
- Three checkers: [1,1,1,0]
- More than three checkers: [1,1,1,1]
There are 28 of these sets (one per column of the board) for each team, so 28*2*4 inputs in total, plus a few to count the checkers on the bar. Unfortunately, I'm writing this from memory, which might not be accurate.
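Roughly, the thresholded encoding described above could be implemented like the sketch below. This is only an illustration built from the table above, not TD-Gammon's exact scheme.

```python
def encode_column(n_checkers):
    # Four thresholded inputs per column: unit k fires if the column
    # holds at least k checkers (k = 1..4), as in the table above.
    return [1.0 if n_checkers >= k else 0.0 for k in range(1, 5)]

# encode_column(0) -> [0, 0, 0, 0]
# encode_column(2) -> [1, 1, 0, 0]
# encode_column(5) -> [1, 1, 1, 1]
```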
Abak's model is very different: it describes checkers, and each of them has a set of features. In newer versions, I added a simplified description of the board to complement it, with results that weren't spectacular, but the checker features are still there.
The game description model has four parts [477 inputs]:
- Checkers Description (14x30): The description includes stats for the checker: distance, height, etc. It does not include the class, which is implied by its position in the model.
- Team Status (4x2): Counts of a few checker and status features.
- Map of strength (24x2).
- Game status: who rolls next! (1).
You can check out a complete description of the model below.
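As a rough sketch, here is how those four parts could be stacked into the 477-input vector. The function and variable names are placeholders; only the sizes come from the list above.

```python
import numpy as np

def build_input(position, team_on_roll):
    # Placeholder arrays; the real feature extraction fills them from the position.
    checkers    = np.zeros((30, 14))   # 14 features for each of the 30 checkers
    team_status = np.zeros((2, 4))     # 4 team-level features per team
    strength    = np.zeros((2, 24))    # strength map: one value per position per team
    turn        = np.array([float(team_on_roll)])  # who rolls next

    x = np.concatenate([checkers.ravel(), team_status.ravel(),
                        strength.ravel(), turn])
    assert x.shape == (477,)           # 420 + 8 + 48 + 1
    return x
```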
Versions:
1.- Python + CuPy (TD-V1).
The first version of this AI was trained in pure Python using CuPy, a NumPy-like library that runs on the GPU. I wanted to learn from scratch, so even though I played for a while with TensorFlow, I decided to go nude (well, equipped with the fantastic Python and CuPy combo).
Features:
- One network that estimates the chances of team 0 winning the game.
- Unaware of points.
- Didn't have the "who rolls next" flag.
- Didn't have the strength map.
- One hidden layer, with sigmoid activations (see the sketch after this list).
- Took 45,000 games to beat my previous AI, written with expert information (GOFAI).
- Trained for 4,500,000 games and won 75% of its games against GOFAI.
- It was in production for 2 years.
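A minimal sketch of what that kind of network looks like in CuPy, forward pass only. The input and hidden sizes are assumptions (TD-V1's input was smaller than the 477 described above, since it lacked the strength map and the who-rolls flag), and the actual TD training loop is omitted.

```python
import cupy as cp

def sigmoid(z):
    return 1.0 / (1.0 + cp.exp(-z))

n_inputs, n_hidden = 428, 80          # illustrative sizes, not the real ones
W1 = cp.random.randn(n_hidden, n_inputs) * 0.1
b1 = cp.zeros(n_hidden)
W2 = cp.random.randn(1, n_hidden) * 0.1
b2 = cp.zeros(1)

def predict_win_probability(x):
    h = sigmoid(W1 @ x + b1)          # single hidden layer, sigmoid activation
    return sigmoid(W2 @ h + b2)       # estimated chance of team 0 winning
```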
2.- TensorFlow (TD-V2).
My pain with Version 1 was that it wasn't good at calculating the chances of winning, because it was trained without the "who rolls next" flag. Somehow, the NN's results were good enough to select a strong move, but math-wise, they were not consistent. Version 2 fixed that problem and added a new network to calculate the chances of the game ending in 1, 2, or 3 points.
I chose TensorFlow this time to train the new model because I wanted to learn a framework, and I found an excellent example to start from.
Features:
- One network to calculate the chances of team 0 winning the game.
- One network to calculate the probability of the game ending in 1, 2, or 3 points.
- Includes the new strength map.
- Two hidden layers, with different activations: Leaky ReLU for the hidden layers and sigmoid in the output layer (see the sketch after this list).
- Took 12,000 games to win 50% of the time against TD-V1.
- Trained for 350,000 games and reached an 80% win rate against TD-V1.
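Here is a sketch of what the win-probability network could look like in TensorFlow with Keras layers. The hidden sizes and everything about training are my assumptions; only the layer count and activations come from the list above.

```python
import tensorflow as tf

win_net = tf.keras.Sequential([
    tf.keras.Input(shape=(477,)),                    # the 477-input description above
    tf.keras.layers.Dense(128),
    tf.keras.layers.LeakyReLU(),                     # first hidden layer
    tf.keras.layers.Dense(64),
    tf.keras.layers.LeakyReLU(),                     # second hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # chance of team 0 winning
])
# A second network with the same body estimates the chances of the game
# ending in 1, 2, or 3 points.
```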
3.- Version 3: On Hold.
As the final mile of the decision-making, a simple algorithm selects the move with the best equity [%W*%p1 + %W*%p2 + %W*%p3].
Depending on the goal of the match, the number of points needed for each player to win, and the cube value, it weights the networks' output differently.
I would like to build a NN that handles that. That will be V3. It is not in development right now.
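A hypothetical sketch of that last-mile selection, taking the formula above literally. How the weights change with the match score and the cube value is not specified here, so the weight vector is just a placeholder.

```python
def equity(win_prob, p1, p2, p3, weights=(1.0, 1.0, 1.0)):
    # Literal reading of %W*%p1 + %W*%p2 + %W*%p3; the match score and cube
    # value would adjust the weights (placeholder values here).
    return win_prob * (weights[0] * p1 + weights[1] * p2 + weights[2] * p3)

def best_move(candidate_moves, evaluate):
    # evaluate(move) -> (win_prob, p1, p2, p3), e.g. the two networks' outputs
    return max(candidate_moves, key=lambda move: equity(*evaluate(move)))
```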
Abak's Neural Network Model:
Checker Description (14 inputs x 30 checkers):
- Distance to home x/25
- Amount of checkers up x/4
- Amount of checkers down x/4
- Is in the bar [0,1]
- Is at home [0,1]
- Is safe (at distance = 0). [0,1]
- Is doing a block with another checker [0,1]
- Is trappable by the druid [0,1]
- Is trapped by the druid [0,1]
- Is trapping (this input is only for each druid) [0,1]
- Risk of getting hit in the near zone (6 positions ahead) [0..1]
- Risk of getting hit in the long zone (12 positions ahead) [0..1]
- Risk of getting trapped by druid [0..1]
- Chances of movement [0..1]
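To make that list concrete, here is a hypothetical per-checker feature vector. The field names are mine; only the 14 features and their normalizations come from the list above.

```python
from dataclasses import dataclass

@dataclass
class CheckerFeatures:
    distance_to_home: int   # 0..25
    checkers_up: int        # 0..4
    checkers_down: int      # 0..4
    in_bar: bool
    at_home: bool
    is_safe: bool
    in_block: bool
    trappable_by_druid: bool
    trapped_by_druid: bool
    is_trapping: bool       # only meaningful for the druid itself
    risk_near: float        # 0..1, next 6 positions
    risk_long: float        # 0..1, next 12 positions
    risk_trap: float        # 0..1
    mobility: float         # 0..1, chances of movement

    def to_inputs(self):
        # The 14 inputs for one checker, normalized as described above.
        return [
            self.distance_to_home / 25,
            self.checkers_up / 4,
            self.checkers_down / 4,
            float(self.in_bar),
            float(self.at_home),
            float(self.is_safe),
            float(self.in_block),
            float(self.trappable_by_druid),
            float(self.trapped_by_druid),
            float(self.is_trapping),
            self.risk_near,
            self.risk_long,
            self.risk_trap,
            self.mobility,
        ]
```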
Game-related input for each team (4x2):
- Amount of checkers in the bar. x/15
- Amount of Safe checkers. x/15
- All checkers are at home. [0,1]
- Amount of safe checkers is 0. [0,1]
Strength Map (24x2):
Lastly, a map of strength, similar to what Dr. Tesauro proposed for Backgammon, but simplified. For each position and each team, it has a number between 0 and 1, representing how strong that block is:
- 0.0 if empty.
- 0.1 if there is one checker.
- 0.5 if that checker is a guard.
- 1.0 if two or more checkers are there.
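That mapping is simple enough to sketch directly; the checker objects and their is_guard attribute are placeholders.

```python
def strength(checkers_at_position):
    # Strength of one position for one team, following the scale above.
    if len(checkers_at_position) == 0:
        return 0.0
    if len(checkers_at_position) >= 2:
        return 1.0
    return 0.5 if checkers_at_position[0].is_guard else 0.1
```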