Faculty Publications

Labeling News Article’s Subject Using Uncertainty Based Active Learning

Meet Parekh, CBInsights
Yash Patel, New Jersey Institute of Technology

Document Type

Conference Proceeding

Publication Date

1-1-2021

Abstract

In Natural Language Processing, labeling a text corpus is often an expensive task that requires considerable human effort and cost, whereas unlabeled text corpora in varying domains are readily available. For the past couple of decades, research has concentrated on algorithms that can label a corpus automatically, thus minimizing the number of articles that must be labeled manually. Semi-Supervised Learning and Active Learning have shown great promise for labeling articles with a trained model, and both families of algorithms have strong theoretical guarantees. This study aims to tag 1183 articles from The New York Times and The Wall Street Journal with their subject (i.e., the primary organization related to each news article) using an Active Learning algorithm that combines Random Sampling with Uncertainty Based Querying. This Active Learning approach is used to train a Naïve Bayes classifier on Bag of Words features. The classifier tagged the 1183 articles with only 167 requiring manual review, a reduction of 85.89% in manual labeling, with 78.18% accuracy. To verify the quality of the labeled corpus, an SVM classifier using the same features was trained on it and achieved 74.45% accuracy on test data.
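The abstract does not spell out the labeling loop, so the following is only a minimal Python sketch under stated assumptions: Random Sampling picks the initial seed set, the uncertainty measure is least confidence (the lowest top posterior is queried first), Bag of Words tokenization is lowercase whitespace splitting, and `oracle` is a hypothetical callable standing in for the human annotator. The names `bag_of_words`, `MultinomialNB`, and `active_learning` are illustrative and are not the authors' code.

```python
import math
import random
from collections import Counter


def bag_of_words(text):
    """Bag of Words features: lowercase whitespace tokens mapped to counts."""
    return Counter(text.lower().split())


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            bow = bag_of_words(doc)
            self.word_counts[label].update(bow)
            self.vocab.update(bow)
        class_sizes = Counter(labels)
        self.log_prior = {c: math.log(class_sizes[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        """Posterior P(class | doc), normalized with the log-sum-exp trick."""
        bow = bag_of_words(doc)
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word, count in bow.items():
                if word in self.vocab:  # words never seen in training are skipped
                    p = (self.word_counts[c][word] + 1) / (self.totals[c] + v)
                    score += count * math.log(p)
            scores[c] = score
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


def active_learning(pool, oracle, n_seed=4, batch=2, rounds=3, seed=0):
    """Random seed set, then least-confidence queries; auto-tag the rest.

    `oracle(i)` is a hypothetical stand-in for a human labeling pool[i].
    Requires n_seed >= 1 so the first model has training data.
    """
    rng = random.Random(seed)
    unlabeled = list(range(len(pool)))
    rng.shuffle(unlabeled)
    # Step 1: random sampling chooses the initial articles to label by hand.
    manual = {i: oracle(i) for i in unlabeled[:n_seed]}
    unlabeled = unlabeled[n_seed:]
    for _ in range(rounds):
        if not unlabeled:
            break
        model = MultinomialNB().fit([pool[i] for i in manual], list(manual.values()))
        # Step 2: least confident first (lowest top posterior = most uncertain).
        unlabeled.sort(key=lambda i: max(model.predict_proba(pool[i]).values()))
        query, unlabeled = unlabeled[:batch], unlabeled[batch:]
        for i in query:
            manual[i] = oracle(i)
    # Step 3: retrain on all manual labels and tag the remaining articles.
    model = MultinomialNB().fit([pool[i] for i in manual], list(manual.values()))
    auto = {i: model.predict(pool[i]) for i in unlabeled}
    return model, manual, auto
```

In this sketch, `n_seed + rounds * batch` articles are labeled manually (as long as the pool is large enough) and every remaining article is tagged by the final model; how the study decided which of its 1183 articles still needed review is not described here, so a confidence threshold on the final posteriors would be one plausible variation.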

Identifier

85111083932 (Scopus)

ISBN

9783030760625

Publication Title

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST)

External Full Text Location

https://doi.org/10.1007/978-3-030-76063-2_15

e-ISSN

1867-822X

ISSN

1867-8211

First Page

200

Last Page

208

Volume

372

Recommended Citation

Parekh, Meet and Patel, Yash, "Labeling News Article’s Subject Using Uncertainty Based Active Learning" (2021). Faculty Publications. 4509.
https://digitalcommons.njit.edu/fac_pubs/4509

This document is currently not available here.

DOI

10.1007/978-3-030-76063-2_15