ACCU: Pregel: A System for Large-Scale Graph Processing with Matt Austern

Matt Austern from Google gave a talk on “Pregel: a system for large-scale graph processing” at ACCU tonite.

Pregel is a Google-internal C/STL API for parallel graph algorithm programming and operations. It was named after the Pregel river running under The Seven Bridges of Königsberg.

The Seven Bridges of Königsberg
The Seven Bridges of Königsberg in the time of Euler

Pregel is in use for dozens of batch applications at Google. (Possibly 80% of Google calculations are done by MapReduce, up to 20% by Pregel according to 1 online source.)

It does calculations in supersets and checkpoints calculations globally to Google File System (GFS) every so many supersets.

According to Matt, Google has been able to implement any graph algorithm know to be parallelizable using Pregel.

Matt showed code to implement the original published PageRank and bipartite matching in 1 slide each. 🙂

He showed some benchmarks for running a 80 billion vertex calculation on 300 machines/800 worker processes in about 10 minutes, also a 120 billion vertex problem.

Some of the issues with Pregel are that it’s speed is limited by the slowest worker node, and it cannot handle Hamiltonian cycles.

A general problem with any graph calculation software is that while breadth-first search is not a problem, depth-first may never finish.

He has observed that when implementing newer, supposedly more optimized algorithms from the literature that there is little perceived improvement in performance these days for their applications.

Matt seemed to be pretty happy with Pregel. Google uses large numbers of commodity PCs, so using a very large single computer with a different computing model is not an option.

Pregel is not Open Source, and likely never will be, as it is integrated into Google. However, there is a paper on it from SIGMOD10. The closest Open Source utility at this point is The Parallel Boost Graph Library.

Matt’s an articulate speaker on a pretty dry subject (graph algorithms) and has an interesting bio:

“Matt Austern is the author of “Generic Programming and the STL”. He’s the former chair of the C++ standardization committee’s library working group, and is a moderator of comp.std.c++. Before working at Google, Matt was at Apple and before that at SGI. Matt received his PhD in physics from Berkeley in 1994, and participated in the discovery of the top quark.”

Thanks to Symantec for hosting the meeting once again.

Google Research: Large-scale Graph Processing at Google (2009) Pregel: Google’s other data-processing infrastructure
Why the days are numbered for Hadoop as we know it
Pregel: A System for Large-Scale Graph Processing (2010)

This entry was posted in Linux, Open Source, Tech, User Groups. Bookmark the permalink.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.