Page Rank Algorithm Implementation in the Apache Spark Cloud Computing System: Senior Research Project Proposal

Statement of Purpose:

In this project, I want to determine how scheduling can improve the performance of cloud computing systems. I also hope to implement a scheduling algorithm in the Spark Cloud Computing System.

BASIS Advisor: Patricia Pearson, Ph.D.

Arizona State University Advisor: Lei Ying, Ph.D.

Background:

I am interested in combining scheduling with cloud computing because, as technology becomes a larger part of our everyday lives, more data needs to be stored and processed. Advances in cloud computing will decrease the time it takes to process large amounts of data. I have taken courses in the Java programming language and have researched the “Sparrow” randomized scheduling algorithm.

Significance:

The Spark system is relatively new and is reported to run up to one hundred times faster than its predecessor, Hadoop MapReduce, on some workloads. Although Spark is technically advanced, relatively little work has been done so far on scheduling within it. Implementing scheduling algorithms in Spark could help to increase its speed and, therefore, decrease its cost.

Research Methodology:

For this project, I will primarily use websites, scholarly research, and library research. During my internship, I will work closely with a professor at ASU and some of his PhD students, who will help guide my project and explain difficult concepts. I will consult scholarly research on randomized scheduling algorithms and cloud computing methods, and I will collect and compile the relevant information from these readings. I will also consult the source code of Spark, which is open source and available online.

Anticipated Problems:

I anticipate that I will have problems understanding the source code for Spark and the scheduling and randomized scheduling algorithms. In addition, I may have trouble building up the necessary knowledge of math (statistics and probability are the primary forms of mathematics used in this area), the concepts behind Spark, and the language in which Spark is written. For issues in understanding Spark and other concepts, I will consult with Professor Ying and some of his PhD students. To overcome obstacles in learning statistics, I will use online resources for examples and lessons. Spark is written primarily in Scala, a language that runs on the Java Virtual Machine and interoperates with Java. Although I have experience with Java, I may have some trouble with some of the specific classes and keywords used in Spark’s source code. To resolve these issues, I will use websites that answer specific programming questions, such as Stack Exchange.

Bibliography:

1. Ousterhout, K., Wendell, P., Zaharia, M., & Stoica, I. (2013, September 24). Sparrow: Distributed, Low Latency Scheduling. Retrieved from http://www.eecs.berkeley.edu/~keo/publications/sosp13-final17.pdf

2. Over 800 contributors (2015). Apache Spark (Version 1.5.2) [Software]. Available from http://spark.apache.org/downloads.html

3. Leskovec, Jure, Anand Rajaraman, Jeffrey Ullman. Mining of Massive Datasets. Palo Alto: Cambridge University Press, 2014. Print.

4. Flanagan, David. Java in a Nutshell, Second Edition. Sebastopol: O’Reilly & Associates, Inc., 1997. Print.

5. Lippman, Stanley B. C++ Primer. 2nd ed. Reading: Addison-Wesley, 1997. Print.

6. Wang, Weina, Kai Zhu, Lei Ying, Jian Tan, Li Zhang. Map Task Scheduling in MapReduce with Data Locality: Throughput and Heavy-Traffic Optimality. PDF.

7. Zaharia, Matei, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. PDF.

8. Purohit, Abhijeet, Md. Abdul Waheed, Asma Parveen. “Load Balancing in Public Cloud by Division of Cloud Based on the Geographical Location.” International Journal of Research in Engineering and Technology 3.3 (2014): 316-320. Print.

9. Begum, Suriya, Dr. Prashanth. “Review of Load Balancing in Cloud Computing.” International Journal of Research in Engineering and Technology 10.1 (2013): 343-352. Print.

Friday, February 12, 2016
