Friday, February 12, 2016

Senior Research Project Syllabus

Student: Farhan Khan

Project Title: Scheduling in the Spark Cloud Computing System

Location: ASU Goldwater Center

BASIS Advisor: Dr. Pearson

On-site Advisor: Dr. Lei Ying

On-site Advisor Contact Information:
Office: 436 Goldwater Center
Phone (o): 480-965-7003
Fax: 480-965-3837
Email: lei.ying.2@asu.edu
Mail: Arizona State University
PO Box 875706
Tempe, AZ 85287-5706

Mode of Daily Contact: Blog

Course Goals: Scheduling in the Spark Cloud Computing System has three main objectives: first, I will become familiar with the Spark system and its scheduling algorithm; second, I will answer the question of how scheduling can improve cloud computing systems; and third, I will attempt to implement scheduling in the Spark Cloud Computing System. In pursuit of these goals, I will first answer the question: what is networking for big data? Second, I will familiarize myself with the programming language, its syntax, and the different types of computations performed within data centers. Third, I will learn the Spark system, understand its architecture, and learn to use Spark to process big data. I will learn all of this from books, scholarly articles, and Dr. Lei Ying and his students.

Course Texts:
Leskovec, Jure, Anand Rajaraman, Jeffrey Ullman. Mining of Massive Datasets. Palo Alto: Cambridge University Press, 2014. Print.

Ousterhout, K., Wendell, P., Zaharia, M., & Stoica, I. (2013, September 24). Sparrow: Distributed, Low Latency Scheduling. Retrieved from http://www.eecs.berkeley.edu/~keo/publications/sosp13-final17.pdf

Flanagan, David. Java in a Nutshell, Second Edition. Sebastopol: O’Reilly & Associates, Inc., 1997. Print.

Lippman, Stanley B. C++ Primer. 2nd ed. Reading: Addison-Wesley, 1997. Print.

Wang, Weina, Kai Zhu, Lei Ying, Jian Tan, Li Zhang. Map Task Scheduling in MapReduce with Data Locality: Throughput and Heavy-Traffic Optimality. PDF.

Zaharia, Matei, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. PDF.

Purohit, Abhijeet, Md. Abdul Waheed, Asma Parveen. “Load Balancing in Public Cloud by Division of Cloud Based on the Geographical Location.” International Journal of Research in Engineering and Technology 3.3 (2014): 316 – 320. Print.

Begum, Suriya, Dr. Prashanth. “Review of Load Balancing in Cloud Computing.” International Journal of Research in Engineering and Technology 10.1 (2013): 343 – 352. Print.

Project Product Description:
I will write code for a program that conveys my understanding of the programming language, scheduling, and load balancing. The final product will be a working program that has been thoroughly tested.
