VLDB Test Of Time Reflections
My collaborator, Prof. , discusses in his post here the story of the paper, how it came about, and some lessons for other PhD students.
The VLDB Test of Time award came as a massive surprise. I read the email with slightly blurry eyes while holding my required morning brew. My first thought was "this is most excellent spam". Then, after perusing the email thread more carefully and observing that a few of my fellow co-authors had replied, the reality finally sank in. My next thought was “I am getting old!”
The paper that received the award is “Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud”, published in VLDB 2012. It was a paper that we worked on for over 1.5 years, that was rejected from numerous venues, and that we almost gave up on. This was the hardest paper of my PhD career, and this was the paper that got the Test of Time award.
What I had to do next was re-read the paper and get a fresh perspective on it. And with the re-read, I realized that it was actually a pretty good paper. The paper is remarkably dense. Having been on the other side of the paper, I know why that is the case: the original version was almost twice the length, and much condensing (and many LaTeX tricks) was needed to bring the paper down to the target page limit.
One of the big lessons for me is that when you are knee-deep in a big task, everything can seem wrong, whether you are finishing up a big project, cutting a “v1” release of a product, or launching a website. Because you are the one who has been working on all the details, you also see all the flaws. It is important to take a step back from time to time, look at the work from a fresh perspective, and see what went well and what was actually achieved. This lesson has served me well through much of my career. For instance, a related piece of common advice for startups is “the first version of your product should be embarrassing”. This VLDB Test of Time Award is yet another affirmation of that perspective.
To make a prediction about how the ideas in the paper may still apply in the future, I offer the following beliefs:
Scale-out is inevitable: Our work focused on scale-out (distributed systems) as opposed to scale-up (making single machines faster). GPUs have enabled an incredible amount of scale-up, but we will eventually hit power limits, data and models will grow faster than the machine, and we will have to scale out.
Sparsity is inevitable: There are three cases (illustrated with a small sketch after this list):
Sparsity in memory: sparse vectors, sparse matrices, dictionaries, etc.
Sparsity in communication: Not every machine should need to talk to every other machine all the time.
Sparsity in computation: Focus compute on the harder problems.
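To make the memory and computation cases concrete, here is a minimal sketch of my own (not code from the paper): a dictionary-backed sparse vector whose dot product only touches non-zero entries, so both storage and work scale with the number of non-zeros rather than the dimension.

```python
# Illustrative sketch of sparsity in memory and computation (hypothetical
# example, not from the GraphLab paper): store only non-zero entries, and
# make the dot product cost proportional to the overlap, not the dimension.

def sparse_dot(u, v):
    """Dot product of two sparse vectors represented as {index: value} dicts."""
    # Iterate over the smaller vector so work scales with its non-zeros.
    if len(u) > len(v):
        u, v = v, u
    return sum(value * v[i] for i, value in u.items() if i in v)

# Two vectors in a million-dimensional space, each with a handful of entries.
u = {3: 1.0, 200_000: 2.0, 999_999: 0.5}
v = {3: 4.0, 17: -1.0, 999_999: 2.0}
print(sparse_dot(u, v))  # 1.0*4.0 + 0.5*2.0 = 5.0
```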
Simply put, not every word sent to an LLM should involve the entire model, and not every ML task is equally difficult. These trends are already starting with the rise of sparse parameters and mixture-of-experts models, and we will soon need new systems and methods to handle them efficiently.
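As a rough illustration of that kind of sparse computation, here is a toy top-k mixture-of-experts router (an assumed, simplified setup of my own, not any particular model's implementation): each token activates only a couple of experts, so most parameters stay idle for any given input.

```python
import numpy as np

# Toy mixture-of-experts layer: a router scores experts per token and only the
# top-k experts run, so the compute per token is sparse in the parameters.
# Simplified illustration only, not a production MoE implementation.

rng = np.random.default_rng(0)
d_model, num_experts, top_k = 16, 8, 2

router = rng.normal(size=(d_model, num_experts))            # routing weights
experts = rng.normal(size=(num_experts, d_model, d_model))  # one matrix per expert

def moe_forward(x):
    """x: (d_model,) token embedding -> (d_model,) output using top-k experts."""
    scores = x @ router                              # score every expert
    chosen = np.argsort(scores)[-top_k:]             # indices of the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                         # softmax over the chosen experts
    # Only the chosen experts' parameters are touched for this token.
    return sum(w * (experts[e] @ x) for w, e in zip(weights, chosen))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (16,)
```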
While I do not know if the GraphLab/PowerGraph model is the solution, I believe the basic design principles we focused on, sparse dependencies and non-uniform computation, will persist and return in the future.
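To show what I mean by those two principles, here is a toy sketch of a vertex-program loop in that spirit (my own illustration, not GraphLab's actual API): each update reads only a vertex's neighbors (sparse dependencies), and a vertex is rescheduled only if its value changed enough to matter (non-uniform computation), in the style of dynamically scheduled PageRank.

```python
from collections import deque

# Toy vertex-program loop with dynamic scheduling, in the spirit of the
# GraphLab model but not its actual API. Each update reads only a vertex's
# in-neighbors (sparse dependencies), and dependents are rescheduled only
# when the change is significant (non-uniform computation).

def dynamic_pagerank(out_edges, damping=0.85, tol=1e-4):
    vertices = list(out_edges)
    in_edges = {v: [] for v in vertices}
    for u, nbrs in out_edges.items():
        for v in nbrs:
            in_edges[v].append(u)

    rank = {v: 1.0 / len(vertices) for v in vertices}
    active = deque(vertices)          # scheduler: vertices pending an update
    queued = set(vertices)

    while active:
        v = active.popleft()
        queued.discard(v)
        # Gather: read only v's neighbors, never the whole graph.
        incoming = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
        new_rank = (1 - damping) / len(vertices) + damping * incoming
        change = abs(new_rank - rank[v])
        rank[v] = new_rank
        # Scatter: reschedule dependents only if the change was significant.
        if change > tol:
            for w in out_edges[v]:
                if w not in queued:
                    active.append(w)
                    queued.add(w)
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(dynamic_pagerank(graph))
```

The point of the sketch is the scheduling discipline rather than PageRank itself: work follows the sparse structure of the data, and effort concentrates on the vertices that still need it.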