An adaptive wordpiece language model for learning Chinese word embeddings

Document Type

Conference Proceeding

Publication Date

8-1-2019

Abstract

Word representations are crucial for many natural language processing tasks. Most existing approaches learn contextual information by assigning a distinct vector to each word and pay little attention to morphology, which makes it difficult for them to handle large vocabularies and rare words. In this paper, we propose an Adaptive Wordpiece Language Model (AWLM) for learning Chinese word embeddings, inspired by the previous observation that subword units are important for learning Chinese word representations. Specifically, we establish a novel approach called BPE+ that adaptively generates variable-length grams, overcoming the fixed-length limitation of stroke n-grams. Semantic information is extracted in three carefully designed stages: extraction of morphological information, reinforcement of fine-grained information, and extraction of semantic information. Empirical results on word similarity, word analogy, text classification and question answering show that our method significantly outperforms several state-of-the-art methods.
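The abstract does not say how BPE+ differs from ordinary byte-pair encoding, so the sketch below illustrates only the baseline idea it builds on. It contrasts fixed-window stroke n-grams with standard BPE merges learned over stroke sequences, where frequent adjacent units are merged into longer ones so that unit length adapts to the data. Strokes are encoded as digit strings; the toy sequences, the function names (`stroke_ngrams`, `learn_bpe`, `segment`) and their parameters are hypothetical and not taken from the paper.

```python
from collections import Counter


def stroke_ngrams(strokes: str, n_min: int = 3, n_max: int = 12) -> list[str]:
    """Fixed-window baseline: every substring with length in [n_min, n_max].
    (The default window sizes are illustrative assumptions.)"""
    return [strokes[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(strokes) - n + 1)]


def _merge_pair(symbols: list[str], a: str, b: str) -> list[str]:
    """Replace each adjacent (a, b) with the single symbol a + b, left to right."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Standard (not BPE+) byte-pair encoding over stroke sequences.

    Each word starts as single strokes; each step merges the most frequent
    adjacent pair, so the learned units have variable length.
    """
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # ties: first-counted pair wins
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[tuple(_merge_pair(list(symbols), *best))] += freq
        corpus = new_corpus
    return merges


def segment(strokes: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a new stroke sequence by replaying the learned merges in order."""
    symbols = list(strokes)
    for a, b in merges:
        symbols = _merge_pair(symbols, a, b)
    return symbols


if __name__ == "__main__":
    toy = ["1213", "1213", "124"]  # made-up stroke sequences, not real characters
    merges = learn_bpe(toy, num_merges=2)
    print(merges)                       # [('1', '2'), ('12', '1')]
    print(segment("12134", merges))     # ['121', '3', '4']
    print(stroke_ngrams("1234", 3, 4))  # ['123', '234', '1234']
```

The fixed-window extractor emits every substring in its length range regardless of frequency, whereas BPE keeps only units that recur often enough to be merged. This is the sense in which a BPE-style scheme generates variable-length grams; the specific adaptive behaviour of BPE+ is defined in the full paper, not here.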

Identifier

85072967510 (Scopus)

ISBN

9781728103556

Publication Title

IEEE International Conference on Automation Science and Engineering

External Full Text Location

https://doi.org/10.1109/COASE.2019.8843151

e-ISSN

2161-8089

ISSN

2161-8070

First Page

812

Last Page

817

Volume

2019-August

Grant

51775385

Fund Ref

National Natural Science Foundation of China

