Document Type
Thesis
Date of Award
Spring 5-31-2016
Degree Name
Master of Science in Computer Science - (M.S.)
Department
Computer Science
First Advisor
Jason T. L. Wang
Second Advisor
Xiaoning Ding
Third Advisor
Chase Qishi Wu
Abstract
MapReduce comes from a traditional problem-solving method: split a big problem into small parts and solve each part separately. To process larger datasets more efficiently and more cheaply, this idea has been implemented as a programming model for handling massive quantities of data. The user supplies a map function that transforms the input dataset into logical key/value pairs, and a reduce function that groups and combines all values that share the same key. With this model, the framework can automatically spread a job across clusters built from many ordinary computers. MapReduce programs are easy to implement and, on large datasets, can be considerably more efficient than traditional single-machine programs. This thesis studies the model through several sample programs and one GRN detection algorithm program.
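The map, group-by-key, and reduce phases described above can be illustrated with the classic word-count example. The following is a minimal single-process sketch in plain Python, not code from the thesis and not the Hadoop or Spark API; the function names map_fn, shuffle, reduce_fn, and run_mapreduce are hypothetical. A real framework would run many map_fn and reduce_fn calls in parallel on different machines and perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit a (word, 1) key/value pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle phase: group all intermediate values by key,
    # which the framework does automatically between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce phase: combine all values that share one key into a single result.
    return key, sum(values)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    # Hypothetical driver that chains the three phases on one machine.
    intermediate = (pair for line in lines for pair in map_fn(line))
    grouped = shuffle(intermediate)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_mapreduce(docs))
```

Because each map_fn call depends only on its own input line and each reduce_fn call depends only on one key's values, both phases can be distributed across a cluster without changing the functions themselves.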
Detecting gene regulatory networks (GRNs), the regulatory connections among genes formed through regulatory molecules, is one of the main subjects in understanding gene biology. Although algorithms have been developed for this task, the growing number of genes and their complexity make processing increasingly slow. The MapReduce model with parallel computing is one way to overcome this problem. This thesis presents a well-defined framework for parallelizing a mutual information algorithm. The experimental results show the performance improvement gained by using the parallel MapReduce model.
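To show why a mutual information (MI) approach to GRN detection suits the map/reduce pattern, here is a minimal sketch under stated assumptions; it is not the framework from the thesis. It assumes each gene's expression profile has already been discretized into equally long lists of levels, estimates MI in bits from empirical frequencies, and treats every gene pair as an independent map task. The final step keeps pairs whose MI exceeds a threshold as candidate edges; the names map_pair, collect_edges, infer_network, and the default threshold of 0.5 bits are hypothetical. The built-in map runs sequentially here; on Spark the same per-pair function could be applied with an RDD map.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Empirical MI in bits between two equally long discretized profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), rewritten with raw counts.
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi


def map_pair(item) -> Tuple[Tuple[str, str], float]:
    # Map step (hypothetical): score one gene pair independently of all others.
    (g1, x), (g2, y) = item
    return (g1, g2), mutual_information(x, y)


def collect_edges(scored: Iterable[Tuple[Tuple[str, str], float]],
                  threshold: float) -> List[Tuple[str, str, float]]:
    # Final step (hypothetical): keep pairs whose MI exceeds the threshold.
    return sorted((g1, g2, mi) for (g1, g2), mi in scored if mi > threshold)


def infer_network(expr: Dict[str, List[int]],
                  threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    pairs = combinations(sorted(expr.items()), 2)  # every unordered gene pair
    scored = map(map_pair, pairs)                  # parallelizable across workers
    return collect_edges(scored, threshold)


if __name__ == "__main__":
    expr = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
    print(infer_network(expr))
```

The number of gene pairs grows quadratically with the number of genes, which is why distributing the per-pair MI computation across a cluster is attractive as gene sets get larger.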
Recommended Citation
Du, Zongxuan, "Data analytics with mapreduce in apache spark and hadoop systems" (2016). Theses. 269.
https://digitalcommons.njit.edu/theses/269