World Repository of Human Genetics Now Hosted by Amazon
Amazon's cloud service is now hosting the world's largest database of human genetics.
CREDIT: Zlatko Guzmic | Shutterstock.com
The U.S. National Institutes of Health announced Friday (March 30) that it'll be hosting data from its 1000 Genomes Project for free on Amazon's cloud service. The 1000 Genomes Project is the world's largest database of human genetics. It was created to act as a "reference population," including people of different ethnicities around the world, and it captures all the major ways in which humankind varies genetically. Now that the 1000 Genomes data are hosted on Amazon's servers, they will be easier and cheaper for scientists to obtain and analyze.
"[The Amazon hosting] makes the data available to researchers in a way that is more useful and that avoids the researcher having to spend lots of money on storing the data themselves, on their local systems," Eric Schadt, director of the genomics institute at the Mount Sinai School of Medicine in New York, wrote to InnovationNewsDaily in an email. "This is definitely cool."
In spite of its name, the project actually holds genetic information from 1,700 anonymous people, with 900 more to come this year. The main difficulty with the database is its size: 200 terabytes, an amount that would fill 30,000 DVDs. The information in the database has always been freely available at 1000genomes.org, but before the Amazon hosting deal, scientists had to pay for the Internet bandwidth and storage space to download the data, Schadt explained. People who did not have access to the powerful computers needed to store the 1000 Genomes data couldn't read the data at all.
Amazon Web Services also offers its superpowered computing resources to researchers who want to do calculations on the enormous genetics database. For that, Amazon will charge. The company charged one pharmaceutical client $1,279 an hour to run very large calculations, the New York Times' Bits blog reported. Yet researchers may still find it to be worth the price. "Many will be willing to bear this cost because it is far less expensive than buying 500 terabytes of disk storage and a modest-sized computer cluster to analyze those data locally," Schadt wrote.
By making this genomics data more accessible and affordable to researchers, the Amazon deal may ultimately help scientists predict diseases more reliably, based on a person's genetics, Schadt wrote.
The deal is part of a new initiative from the Obama administration that will invest $200 million in researching better ways to store, analyze and find interesting points in extremely large datasets such as 1000 Genomes.