Skywatchers Look to Cloud for Storing 'Tsunami' of Data
The digital archives where astronomers keep their earthly records of the heavens are running out of room. Blame better telescope cameras and the extensive sky surveys that have become possible over the past few years, said astronomer Bruce Berriman, a program manager at NASA's Infrared Processing and Analysis Center in Pasadena, Calif. The amount of data in the NASA/IPAC Infrared Science Archive, for example, has increased more than eightfold since 2008.
Worldwide, archives now store about a petabyte (1 million gigabytes) of astronomical information.
"By 2020, it could be as much as 100 petabytes," Berriman said. "We need to develop a new model of computing in astronomy."
Huge data sets stored in astronomy archives have led to the measurement of distant quasars and the discovery of brown dwarfs, the coldest known type of stars. Further discoveries will depend on keeping mission data safely stored and readily available to astronomers. Astronomy archives also will need greater processing chops to help researchers automatically analyze these growing data sets, which are quickly getting too big for analysis by any one astronomer's computer alone, Berriman said.
Continually buying new storage would not be practical for many universities and organizations, so astronomers and their data keepers need to explore other options, Berriman argued in a report published in October in ACM Queue, a publication of the Association for Computing Machinery.
One possible solution for dealing with the oncoming tsunami of astronomical data, Berriman said, is for astronomers to mirror the current trend in consumer computing and move their data online to the cloud. Alternatively, astronomers might want to look into adapting graphics processing units (the processors that power the graphics cards in computers) for data crunching, or writing new software to process the increasingly complex queries astronomers are entering in databases.
For the Canadian Astronomy Data Center, the cloud is the answer. In cloud-based storage, data is kept remotely, in distant servers that can be accessed over the Internet from multiple devices and workstations, instead of in the archive's own limited machines. Astronomical data someday could be saved in much the same way your emails are saved in free online services such as Gmail or Yahoo Mail.
"Around the end of 2011, we're going to stop taking data," said David Schade, the Canadian center's group leader. The center, based in Victoria, British Columbia, is moving all its data onto a nonprofit, government-funded network called the Canadian Advanced Network for Astronomical Research. The network will act like a library card catalog, Schade explained, receiving researchers' queries on its website and automatically directing them to where the right information is stored.
The center has already moved 500 terabytes, or 500,000 gigabytes, of its archives onto servers in Saskatoon, nearly 800 miles (1,200 kilometers) to the east. The transition hasn't been perfectly smooth. Center staff have had to rewrite some software to deal with the remote storage, and the network doesn't always work well over long distances. "It's trickier," Schade said. But the astronomers who use the center's data find it a worthwhile tradeoff, he said, because they have access to more computing power to run calculations on the data they're interested in.
A combination of the cloud and the other techniques will help the world's great astronomy archives face the oncoming data deluge, NASA's Berriman said.
"All of these practices will be used somewhere," he said.