Faculty Publications

K-means-based feature learning for protein sequence classification

Paul Melman, New Jersey Institute of Technology
Usman W. Roshan, New Jersey Institute of Technology

Document Type

Conference Proceeding

Publication Date

1-1-2018

Abstract

Protein sequence classification has long been a major challenge in bioinformatics and related fields, and it remains one today. Because of the complexity and volume of protein data, algorithmic techniques such as sequence alignment are often impractical under time and memory constraints. Heuristic methods based on machine learning are the dominant approach for classifying large sets of protein data. In recent years, unsupervised deep learning techniques have attracted significant attention across many classification tasks, especially for image data. In this study, we adapt a k-means-based deep learning approach originally developed for image classification to classify protein sequence data. We use this unsupervised learning method to preprocess the data and create new feature vectors, which are then classified by a traditional supervised learning algorithm such as a support vector machine (SVM). We find that this technique outperforms the spectrum kernel and the empirical kernel map, and performs comparably to slower distance matrix-based approaches.
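The abstract does not spell out the feature-learning pipeline. The sketch below shows one plausible way a k-means feature learner from image classification can be carried over to protein sequences. It is an illustrative sketch, not the authors' implementation. Every choice in it is an assumption:

* fixed-length sliding windows ("patches") over each sequence, one-hot encoded over the 20 standard amino acids;
* plain Lloyd's k-means on those patches;
* the "triangle" activation used in image feature learning, sum-pooled over the sequence.

The names `one_hot_window`, `patches`, `kmeans`, `encode`, and the parameters `w` and `k` are hypothetical. The resulting vectors `X` would then be passed to a supervised classifier such as an SVM; that step is omitted here.

```python
import math
import random

# Assumption: 20 standard residues; anything else (X, B, Z, ...) encodes as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_window(window):
    """Flatten a window of residues into a one-hot vector of length 20 * len(window)."""
    vec = [0.0] * (len(AMINO_ACIDS) * len(window))
    for pos, aa in enumerate(window):
        if aa in INDEX:
            vec[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1.0
    return vec


def patches(seq, w):
    """All length-w sliding windows of seq, one-hot encoded (hypothetical patch scheme)."""
    return [one_hot_window(seq[i:i + w]) for i in range(len(seq) - w + 1)]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over patch vectors; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode(seq, centroids, w):
    """Triangle activation max(0, mean(z) - z_j) per window, sum-pooled -> k features."""
    k = len(centroids)
    feats = [0.0] * k
    for p in patches(seq, w):
        z = [dist(p, c) for c in centroids]
        mu = sum(z) / k
        for j in range(k):
            feats[j] += max(0.0, mu - z[j])
    return feats


# Usage on toy sequences (illustrative data, not from the paper).
train = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSGGSGGSG", "GGSGGAGGSG"]
w, k = 3, 4
all_patches = [p for s in train for p in patches(s, w)]
centroids = kmeans(all_patches, k)
X = [encode(s, centroids, w) for s in train]  # fixed-length vectors for a classifier
```

Sum pooling turns variable-length sequences into vectors of fixed length `k`. That is what lets a standard classifier such as an SVM consume them. Max pooling, or several window lengths, would be equally reasonable variants under the same assumptions.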

Identifier

85048538587 (Scopus)

ISBN

9781943436118

Publication Title

Proceedings of the 10th International Conference on Bioinformatics and Computational Biology, BICOB 2018

Volume

2018-March

Recommended Citation

Melman, Paul and Roshan, Usman W., "K-means-based feature learning for protein sequence classification" (2018). Faculty Publications. 8952.
https://digitalcommons.njit.edu/fac_pubs/8952

This document is currently not available here.
