Pioneering Petascale Computing in Biological Sciences

Supercomputers help scientists build virtual worlds to explore blood flow for stroke prevention, design new proteins for life-saving drugs, and diagnose brain disorders. But even with today’s largest machines, researchers are only beginning to capture the key features of many complex problems.

To take supercomputers to the next level of power and realism, today's frontier goal is "petascale" computing, announced by the National Science Foundation (NSF) in its solicitation "Leadership-Class System Acquisition - Creating a Petascale Computing Environment for Science and Engineering." The initiative calls for building a supercomputer by around 2011 that is capable of sustained performance of one petaflop on important scientific applications: 10^15, or one thousand trillion, floating point operations per second. If all 6.5 billion people in the world worked together on a problem, each using a calculator and doing one calculation per second, it would take them about 150,000 times longer than a petascale supercomputer.
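A quick back-of-the-envelope check of that comparison, as a sketch using the article's own round numbers (the variable names are purely illustrative):

```python
# Back-of-the-envelope check of the "150,000 times longer" figure.
PETAFLOPS = 1e15          # floating point operations per second
WORLD_POPULATION = 6.5e9  # people, circa 2006
CALCS_PER_PERSON = 1.0    # calculations per person per second

# Aggregate rate of everyone on Earth computing in parallel.
human_rate = WORLD_POPULATION * CALCS_PER_PERSON

# How many times longer the world's population would take.
print(f"{PETAFLOPS / human_rate:,.0f}x")  # ~153,846, i.e. roughly 150,000
```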

But will today’s application codes in the biological sciences, the geosciences, and other disciplines – the programs that simulate problems and analyze data – be able to scale up to take full advantage of the next generation of supercomputers?

Hand in hand with the effort to develop petascale hardware architectures, applications must be prepared to take advantage of these extraordinarily powerful machines. That preparation will require a concerted effort in extreme application scaling: targeting and developing petascale-capable codes that can run efficiently on parallel supercomputers with hundreds of thousands of processors.
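One reason extreme scaling is so demanding is Amdahl's law: whatever fraction of a code is inherently serial puts a hard ceiling on its parallel speedup, no matter how many processors are available. A minimal sketch of the effect (the serial fractions below are hypothetical, chosen only to illustrate the point):

```python
# Amdahl's law: speedup on p processors when a fraction s of the
# work is inherently serial is 1 / (s + (1 - s) / p).
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# Even tiny serial fractions cap speedup on 100,000 processors.
for s in (0.01, 0.001, 0.0001):
    print(f"serial fraction {s:.2%}: "
          f"speedup on 100,000 CPUs = {amdahl_speedup(s, 100_000):,.0f}x")
# 1.00% serial:   ~100x -- far short of the machine's potential
# 0.10% serial:   ~990x
# 0.01% serial: ~9,091x
```

In other words, a code that merely runs at the petascale is not the same as one that sustains petaflop performance; the serial and communication bottlenecks themselves must be engineered away.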

To help with this effort, the NSF Biological Sciences Directorate sponsored a workshop entitled "Petascale Computing in the Biological Sciences," in which biological and computer scientists teamed up to identify candidate applications and to plan the development efforts needed to reach petascale-capable biology codes. Organized by Allan Snavely of the San Diego Supercomputer Center (SDSC), David Bader of the Georgia Institute of Technology, and Gwen Jacobs of Montana State University, the workshop was held at NSF headquarters in Washington, D.C., August 29-30, 2006. The final workshop report is now available online in PDF format at http://www.sdsc.edu/PMaC/BioScience_Workshop/Publications/PetascaleBIOworkshopreport.pdf.

“A key workshop goal was to examine the new opportunities for progress in the biological sciences that will be made possible by having a petascale computational capability,” said Snavely, Director of SDSC’s Performance Modeling and Characterization (PMaC) laboratory. “We also worked together to identify the steps for a smooth path for the biology community to take advantage of these amazing resources when they come on line.”

The primary finding of the workshop, says Snavely, is that it is vital for collaborating teams of biologists and computer scientists to work closely together, starting now, in order to enable petascale computations of scientific importance to begin running on the petascale facility in just a few years.

In addition to the final workshop report, workshop presentations are also available at http://www.sdsc.edu/PMaC/BioScience_Workshop/biosciences.html.

SDSC hosted a related workshop, also organized by Snavely, "PetaScale Computation for the Geosciences Workshop," on April 4-5, 2006. Presentations for this workshop are available at http://www.sdsc.edu/PMaC/GeoScience_Workshop/geosciences.html. The second part of this two-part workshop will be held in January 2007 at the National Center for Atmospheric Research (NCAR), and a final workshop report will then be issued.
