Statement of Purpose:
In this project, I want to determine how scheduling can improve cloud computing systems. I also hope to implement a scheduling algorithm in the Spark cloud computing system.
BASIS Advisor: Patricia Pearson, Ph.D.
Arizona State University Advisor: Lei Ying, Ph.D.
Background:
I am interested in combining scheduling with cloud computing because, as technology becomes a larger part of our everyday lives, more data needs to be stored. Advances in cloud computing will decrease the time it takes to process large amounts of data. I have taken courses in the Java programming language and have researched the “Sparrow” randomized scheduling algorithm.
Significance:
The Spark system is relatively new and is reported to run up to one hundred times faster than its predecessor, Hadoop MapReduce. Because it is new, Spark is advanced in its software and in its use of hardware, but relatively little work has yet been done on its scheduling. Implementing scheduling algorithms in Spark can help to increase speed and, therefore, decrease expense, since faster jobs occupy cluster resources for less time.
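As a rough illustration of how a scheduling policy can affect performance, the sketch below compares purely random task placement with the "power of two choices" sampling idea that the Sparrow paper builds on. It is a simplified, hypothetical model written in Python, not code from Spark or Sparrow: tasks never finish, and the function name place_tasks and all parameter values are assumptions chosen only for illustration.

```python
import random


def place_tasks(num_tasks, num_workers, probes, rng):
    """Place tasks one at a time on simulated workers.

    Each task samples `probes` workers at random and joins the sampled
    worker with the shortest queue. With probes=1 this is purely random
    placement; with probes=2 it is the "power of two choices" policy.
    Tasks never complete in this toy model (an assumption made to keep
    the sketch short), so queue lengths only grow.
    """
    queues = [0] * num_workers
    for _ in range(num_tasks):
        sampled = [rng.randrange(num_workers) for _ in range(probes)]
        target = min(sampled, key=lambda worker: queues[worker])
        queues[target] += 1
    return queues


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    for probes in (1, 2):
        queues = place_tasks(10_000, 100, probes, rng)
        print(f"probes={probes}: longest queue = {max(queues)}")
```

In a model like this, the longest queue is typically much shorter when each task probes two workers instead of one, which suggests why even a simple scheduling policy can shorten the time the slowest task waits.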
Research Methodology:
For this project, I will primarily use websites, scholarly research, and library research. In my internship, I will be working closely with a professor at ASU and some of the PhD students working with him. They will help guide me in my project and explain difficult concepts. I will consult scholarly research on randomized scheduling algorithms and cloud computing methods, and I will collect and compile the relevant information from these readings. I will also consult the source code of the Spark program, which is open source and available online.
Anticipated Problems:
I anticipate that I will have problems understanding the source code for Spark and scheduling algorithms, particularly randomized ones. In addition, I may have trouble building up knowledge in math (statistics and probability are the primary forms of mathematics used in this area), in the concepts behind Spark, and in the language in which Spark is written. For issues in understanding Spark and other concepts, I will consult with Professor Ying and some of his PhD students. To overcome obstacles in learning statistics, I will use online resources for examples and lessons. Spark is written primarily in Scala, a language that runs on the Java Virtual Machine and interoperates with Java. Although I have experience with Java, I may have trouble with some of the specific classes and keywords used in Spark’s source code. To resolve these issues, I will use websites that provide answers to specific programming questions, such as Stack Exchange.
Bibliography:
1. Ousterhout, K., Wendell, P., Zaharia, M., & Stoica, I. (2013, September 24). Sparrow: Distributed, Low Latency Scheduling. Retrieved from http://www.eecs.berkeley.edu/~keo/publications/sosp13-final17.pdf
2. Over 800 contributors (2015). Apache Spark (Version 1.5.2) [Software]. Available from http://spark.apache.org/downloads.html
3. Leskovec, Jure, Anand Rajaraman, and Jeffrey Ullman. Mining of Massive Datasets. Palo Alto: Cambridge University Press, 2014. Print.
4. Flanagan, David. Java in a Nutshell, Second Edition. Sebastopol: O’Reilly & Associates, Inc., 1997. Print.
5. Lippman, Stanley B. C++ Primer. 2nd ed. Reading: Addison-Wesley, 1997. Print.
6. Wang, Weina, Kai Zhu, Lei Ying, Jian Tan, and Li Zhang. Map Task Scheduling in MapReduce with Data Locality: Throughput and Heavy-Traffic Optimality. PDF.
7. Zaharia, Matei, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, and Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. PDF.
8. Purohit, Abhijeet, Md. Abdul Waheed, and Asma Parveen. “Load Balancing in Public Cloud by Division of Cloud Based on the Geographical Location.” International Journal of Research in Engineering and Technology 3.3 (2014): 316-320. Print.
9. Begum, Suriya, and Dr. Prashanth. “Review of Load Balancing in Cloud Computing.” International Journal of Research in Engineering and Technology 10.1 (2013): 343-352. Print.