The technology behind backgammon

The die is cast as artificial intelligence takes online backgammon to a new level

It is able to look 18 moves ahead, which is six more than IBM’s Deep Blue. In contrast, the top backgammon programs, which are based on neural net techniques, will run happily on a bog-standard Pentium under Windows 95.

Learning by doing

Backgammon-playing programs are entirely self-taught, using a technique called reinforcement learning, which is one of the most promising avenues of research into artificial intelligence.

In essence, it’s no different from the way we train dogs, dishing out a reward when the animal does what we want it to and withholding the reward when it doesn’t.

In AI, the trainee, or agent, is a neural network, and the agent’s aim is to maximise the total amount of reward it receives. An agent won’t respond at all to a biscuit or a pat on the head, so the reward is a numerical one based on the agent’s most recent action.

What the agent must try to do is maximise the cumulative reward it receives for succeeding in its goal and not get hung up on the immediate reward for making one good decision.

The goal in any game is, of course, to win, and it’s easy to score the outcome by awarding +1 point for a win, -1 point for a loss and 0 points for a draw or uncompleted game.
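As a rough illustration of that scoring scheme, here is a minimal Python sketch. The function names and the outcome labels ("win", "loss", "draw", "unfinished") are made up for this example and do not come from any particular backgammon program.

```python
# Hypothetical outcome labels; only the +1 / -1 / 0 scores come from the text.
REWARDS = {"win": 1, "loss": -1, "draw": 0, "unfinished": 0}


def game_reward(outcome):
    """Return the numerical reward for one finished (or abandoned) game."""
    return REWARDS[outcome]


def total_reward(outcomes):
    """Add up the rewards over a series of games: the figure the agent tries to maximise."""
    return sum(game_reward(outcome) for outcome in outcomes)


print(total_reward(["win", "loss", "win", "unfinished"]))
```

Three decisive games and one abandoned game are scored here, so the agent is judged on the running total rather than on any single result.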

An agent needs to understand the environment in which it operates and must know when it has achieved its goal, but in the case of backgammon this is incredibly simple.

A backgammon board is basically a one-dimensional race track split into 24 segments, with opponents racing in opposite directions and all checkers moving identically. A draw is impossible and the goal has been achieved when the agent gets all its checkers round the track before its opponent.
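To show how little is needed to describe that race track, here is a simplified Python sketch. It assumes the standard starting position and deliberately ignores the bar, hitting and the doubling cube; the names (starting_board, pip_count, has_finished) are invented for the example.

```python
NUM_POINTS = 24  # the 24 segments of the track


def starting_board():
    """Return the standard opening position as a list of 24 checker counts.

    Positive numbers belong to player +1, negative numbers to player -1.
    """
    board = [0] * NUM_POINTS
    # Player +1 travels from point 24 down towards point 1.
    for point, count in ((24, 2), (13, 5), (8, 3), (6, 5)):
        board[point - 1] = count
    # Player -1 travels in the opposite direction.
    for point, count in ((1, 2), (12, 5), (17, 3), (19, 5)):
        board[point - 1] = -count
    return board


def pip_count(board, player):
    """Total distance the player's checkers still have to travel."""
    total = 0
    for index, count in enumerate(board):
        if count * player > 0:  # this point holds the player's checkers
            point = index + 1
            distance = point if player == 1 else NUM_POINTS + 1 - point
            total += distance * abs(count)
    return total


def has_finished(board, player):
    """In this simplified model, the race is over when nothing is left to move."""
    return pip_count(board, player) == 0


board = starting_board()
print(pip_count(board, 1), pip_count(board, -1))
```

Both players start the same distance from home, which is why a single list of signed numbers is enough to capture the whole position in this sketch.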

Timing is the key

The mode of reinforcement learning that has been so successful in teaching computers to play backgammon is called temporal difference (TD) learning, which is based on the differences between temporally successive predictions.

Each move by each notional player in the game (the computer plays both sides) is regarded as a time step, and there is a heuristic reward signal sent to the agent after each step and at the end of each game.

The agent learns to predict the best move by adjusting the prediction at each time step to make it more closely match the prediction at the next time step. The difference between successive predictions is the only measure of error, and the program is never explicitly told what the best move is.
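The idea of nudging each prediction towards the next one can be sketched in a few lines of Python. This is a toy version under stated assumptions: it keeps its predictions in a lookup table rather than a neural network, the position names and the step size alpha are invented, and the only reward is the final outcome of the game.

```python
from collections import defaultdict


def td_update(values, state, target, alpha=0.1):
    """Move the prediction for `state` a small step towards `target`.

    The gap between the two is the temporal difference error.
    """
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


def learn_from_game(values, positions, final_reward, alpha=0.1):
    """Replay one game, pulling each prediction towards the one that followed it."""
    for current, following in zip(positions, positions[1:]):
        td_update(values, current, values[following], alpha)
    # The last position has no successor, so it is pulled towards the result.
    td_update(values, positions[-1], final_reward, alpha)


values = defaultdict(float)  # every prediction starts at 0
for _ in range(200):
    learn_from_game(values, ["start", "middle", "end"], final_reward=1.0)
print(dict(values))
```

After repeated games that all end in a win, the prediction for the final position rises first and the earlier positions follow, even though none of them was ever told directly that it was a good position to be in.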

Gerald Tesauro, an IBM researcher, is responsible for pioneering TD techniques with backgammon. His program, TD-Gammon, was developed after he abandoned experiments with a supervised learning program called Neurogammon, in which examples of good and bad moves were supplied by hand.

Neurogammon never reached an expert level of play, whereas TD-Gammon went on improving for 1,500,000 games and became a world-class player. Readers with long memories may recall that a version of TD-Gammon was included in the 1996 Family Funpak for OS/2 Warp.

The next commercially available neural net program came in 1998, in the form of Fredrik Dahl’s Jellyfish, and this was soon followed by Olivier Egger’s Snowie. The current version of Snowie is regarded as the state of the art in terms of its playing skills and analysis tools, and it is priced accordingly.

However, there is a free alternative in the form of GNU Backgammon. This was the brainchild of Gary Wong, who by 1999 had drawn on the work of Tesauro and others to produce a neural net backgammon player called Costello.

He donated his code to the GNU Project, and GNU Backgammon (as it became known) is still under development. It plays an extremely strong game and has not stopped learning.

A version of it plays on the First Internet Backgammon Server (FIBS), where it ranks in the top 20 of over 6,000 players.
