Hello Readers,
I believe that I have figured out the problem in the program. After more testing with the edited algorithm from the USC paper and with a different, smaller dataset, I think that the outputted ranks align better with what should be coming up. Now, the greatest rank is around 0.003, and the total of all the ranks is about 0.9. The smaller dataset has only about 10,000 nodes, just an eighth of the roughly 80,000 in the dataset that I was testing with before. I switched to the smaller one because all of its nodes load into the program, while the larger dataset was being truncated. The smaller one performed well when run through the program, and it looks as though all nodes were loaded and ranked.
I also reduced the number of iterations. It appears that as you increase the number of iterations, the total rank decreases. I wanted to see whether increasing the number of iterations would have any effect on the outputted ranks for the larger dataset. However, I ran into an issue. The most iterations that I had run so far was 100, so I tried running 300. The program ran for about 20 minutes, then hit a stack overflow error.
A stack overflow happens when the call stack pointer exceeds the stack bound. Essentially, the call stack is a limited, predefined amount of space, and if a program tries to use memory beyond the call stack's bounds, it crashes. This is usually due to infinite recursion, where a function keeps calling itself until the space needed to store its variables and bookkeeping information exceeds the stack limit. However, since my program has worked successfully with lower iteration counts, it was not infinite recursion. More likely it was very deep recursion, which is a recursive function that will, in theory, terminate, but needs more stack space than is available to run to completion. For the sake of being thorough, I will mention the third major cause of stack overflow errors: very large stack variables. This is an attempt to allocate more memory than will fit on the stack. For example, in a language like C or C++, where local arrays live on the stack, declaring a local array with some huge size may cause a stack overflow. (In Java, arrays are allocated on the heap, so this cause is less relevant to my program.)
Anyway, back to stack overflow for my purposes. I tried again with 250 iterations, then 200, and once more with 150, but calculating the page rank for 80,000 nodes 150 times requires a lot of space. In any case, I think I solved my problem, but if I want to try it out on the large dataset, I will need to give the JVM (Java Virtual Machine) more room. Since the error is a stack overflow rather than an out-of-memory error, the setting to raise is most likely the thread stack size (the -Xss option) rather than the heap size (-Xmx).
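To make the difference between deep recursion and a plain loop concrete, here is a minimal Java sketch. It is not the actual PageRank program: the `update` method is a made-up stand-in for one ranking pass, and the class name, method names, and step counts are all hypothetical. It only illustrates that a recursive version uses one stack frame per iteration, while a loop doing the same arithmetic keeps the stack depth constant.

```java
class RecursionDepthDemo {
    // Hypothetical stand-in for one ranking pass; NOT the real PageRank update.
    static double update(double rank) {
        return 0.15 + 0.85 * rank * 0.5;
    }

    // Recursive form: every iteration adds a stack frame, so the
    // stack depth grows with the number of iterations.
    static double iterateRecursively(double rank, int iterationsLeft) {
        if (iterationsLeft == 0) {
            return rank;
        }
        return iterateRecursively(update(rank), iterationsLeft - 1);
    }

    // Loop form: same arithmetic, but the stack depth stays constant.
    static double iterateWithLoop(double rank, int iterations) {
        for (int i = 0; i < iterations; i++) {
            rank = update(rank);
        }
        return rank;
    }

    // Returns true if the recursive form runs out of stack space.
    static boolean recursionOverflows(int iterations) {
        try {
            iterateRecursively(1.0, iterations);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int small = 100;
        int large = 10_000_000; // deliberately far deeper than a typical default stack allows

        System.out.println("recursive, " + small + " steps: " + iterateRecursively(1.0, small));
        System.out.println("loop, " + small + " steps: " + iterateWithLoop(1.0, small));
        System.out.println("recursion overflows at " + large + " steps: " + recursionOverflows(large));
        System.out.println("loop, " + large + " steps: " + iterateWithLoop(1.0, large));
    }
}
```

If the recursion in a program like this cannot easily be turned into a loop, the other option is to start the JVM with a larger thread stack, for example `java -Xss64m RecursionDepthDemo` (the 64 MB figure is just an illustrative value, not a recommendation).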
As always, if you have any questions or comments please feel free to leave a comment below and I will try to get back to you as soon as possible. Same goes for any clarification of topics or concepts.
Thanks for reading and have a great day!