A Reinforcement Learning Model Based on Temporal Difference Algorithm

Document Type

Article

Publication Date

1-1-2019

Abstract

In some sense, computer games can serve as a test bed for artificial intelligence and a vehicle for developing intelligent algorithms. This paper proposes such a method: a reinforcement learning model based on the temporal difference (TD) algorithm. The method is then used to improve the playing strength of a computer program for a special kind of chess. JIU chess, also called Tibetan Go, is played mainly in regions where Tibetan communities gather. Its play is divided into two sequential stages: preparation and battle. The layout formed during the preparation stage is vital for the subsequent battle stage, and even for the final result. Previous studies on Tibetan JIU chess have focused on Bayesian-network-based pattern extraction and chess-shape-based strategies, which do not perform well. To address the low playing strength of JIU programs from the perspective of artificial intelligence, we developed a reinforcement learning model based on the temporal difference (TD) algorithm for the preparation stage of JIU. First, the search range was limited to a 6 × 6 area at the center of the board, and the TD learning architecture was combined with chess shapes to construct an intelligent environmental feedback system. Second, optimal state transition strategies were obtained through self-play. In addition, the results of the reinforcement learning model were output as SGF files, which act as a pattern library for the battle stage. The experimental results demonstrate that this reinforcement learning model can effectively improve the playing strength of the JIU program and outperform the other methods.
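To make the core idea concrete, the following is a minimal sketch of tabular TD(0) learning with self-play on a toy chain environment. It is not the paper's implementation: the states, rewards, learning rate, and discount factor here are illustrative assumptions standing in for JIU board positions, chess-shape feedback, and the 6 × 6 search area.

```python
import random

# Illustrative hyperparameters (assumptions, not the paper's values).
ALPHA, GAMMA = 0.1, 0.9

def td0_update(V, s, r, s_next):
    """One temporal-difference update: V(s) += alpha * (r + gamma*V(s') - V(s))."""
    V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])

def self_play_episode(V, n_states=5, rng=random):
    """Random walk on a chain of states; reaching the last state yields reward 1.
    This toy stands in for self-play over JIU preparation-stage positions."""
    s = 0
    while s < n_states - 1:
        s_next = s + 1 if rng.random() < 0.5 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        td0_update(V, s, r, s_next)
        s = s_next

V = [0.0] * 5          # value table, one entry per state
random.seed(0)
for _ in range(2000):  # repeated self-play episodes refine the value table
    self_play_episode(V)
# States nearer the rewarded terminal state learn higher values.
```

In the paper's setting, the value table would instead be indexed by board states in the central 6 × 6 area, with rewards shaped by the chess-shape feedback system, and the learned strategies serialized to SGF files for the battle stage.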

Identifier

85085491035 (Scopus)

Publication Title

IEEE Access

External Full Text Location

https://doi.org/10.1109/ACCESS.2019.2938240

e-ISSN

2169-3536

First Page

121922

Last Page

121930

Volume

7

Grant

61602539

Fund Ref

National Natural Science Foundation of China