Document Type

Thesis

Date of Award

Spring 5-31-2016

Degree Name

Master of Science in Computer Science - (M.S.)

Department

Computer Science

First Advisor

Jason T. L. Wang

Second Advisor

Xiaoning Ding

Third Advisor

Chase Qishi Wu

Abstract

MapReduce derives from a traditional problem-solving method: dividing a large problem into small parts and solving each part. To process larger datasets more efficiently and cheaply, this method is implemented as a programming model for handling massive quantities of data. Users write a map function that abstracts the dataset into logical key/value pairs, and a reduce function that groups all values sharing the same key. Under this model, a job is automatically spread across clusters built from many ordinary computers. MapReduce programs are easy to implement and can be much more efficient than traditional computing programs. This paper studies the model through several sample programs and one GRN detection program.
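The map and reduce steps described above can be illustrated with a minimal, single-machine sketch. It is not taken from the thesis and uses neither Spark nor Hadoop; the names `map_reduce`, `word_mapper`, and `sum_reducer` and the word-count task are assumptions chosen only for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a toy, in-memory MapReduce over an iterable of records."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle step: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: combine the values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in a line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    # Hypothetical reducer: add up the counts for one word.
    return sum(values)


lines = ["spark and hadoop", "hadoop and mapreduce"]
counts = map_reduce(lines, word_mapper, sum_reducer)
print(counts)
```

On a real cluster the framework, rather than a local loop, would distribute the map calls and the grouped reduce calls across machines; the user-supplied functions would keep roughly this shape.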

Detecting gene regulatory networks (GRNs), the regulatory molecular connections among various genes, is one of the main subjects in understanding gene biology. Although algorithms have been developed for this purpose, the growing size and complexity of gene data make processing increasingly difficult and slow. The MapReduce model, with its parallel computing, is one way to overcome these problems. This paper presents a well-defined framework for parallelizing a mutual information algorithm. The experiments and performance results show the improvement gained by using the parallel MapReduce model.
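The abstract does not describe the framework itself, so the following is only a rough sketch of how mutual information scoring of gene pairs could fit the map step. It assumes discretized expression profiles; the gene names, the `profiles` data, the `score_pair` helper, and the 0.5-bit threshold are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Mutual information (in bits) of two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum over (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


# Hypothetical discretized expression profiles (one list per gene).
profiles = {
    "g1": [0, 0, 1, 1],
    "g2": [0, 0, 1, 1],
    "g3": [0, 1, 0, 1],
}


def score_pair(pair):
    # Map step: each gene pair is scored independently of the others,
    # which is what would make the work easy to spread over a cluster.
    a, b = pair
    return (a, b), mutual_information(profiles[a], profiles[b])


scores = dict(map(score_pair, combinations(sorted(profiles), 2)))
# Keep pairs whose score reaches an assumed threshold as candidate edges.
edges = [pair for pair, mi in scores.items() if mi >= 0.5]
print(edges)
```

Here the built-in `map` stands in for a distributed map; because no pair depends on another, the same per-pair function could in principle be handed to a cluster framework unchanged.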

Recommended Citation

Du, Zongxuan, "Data analytics with mapreduce in apache spark and hadoop systems" (2016). Theses. 269.
https://digitalcommons.njit.edu/theses/269

Included in

Computer Sciences Commons

Data analytics with mapreduce in apache spark and hadoop systems
