
Fast transfers improve research data accessibility for life sciences community

Technology Category
  • Application Infrastructure & Middleware - Data Exchange & Integration
Applicable Industries
  • Life Sciences
Applicable Functions
  • Product Research & Development
Services
  • Data Science Services
The Challenge
GigaScience, an online open-access, open-data life sciences journal, publishes 'big-data' articles covering a wide range of biological and biomedical sciences. The journal hosts the complete data sets associated with each published article in a comprehensive public database, GigaDB. These data sets can reach multiple terabytes in size. GigaScience found FTP unsuitable for moving files this large: transfers were often exceedingly slow, and if a user encountered a network problem, the transfer had to be restarted from the beginning. Transfers over long distances were particularly time-consuming and unreliable because of high network latency. On one occasion, GigaScience needed to receive a 15 TB liver-cancer data set. With FTP ruled out, the data had to be loaded onto eight hard drives and physically transported from the submitter to the journal, a costly and time-intensive process.
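To illustrate why latency matters so much for a TCP-based protocol such as FTP, the sketch below applies the common rule of thumb that a single TCP stream can send at most one window of data per round trip. The 64 KiB window and the round-trip times are illustrative assumptions, not figures from GigaScience or IBM; only the 15 TB data set size comes from the case study. Real transfers may use larger windows or parallel streams, so treat the results as rough estimates.

```python
def tcp_throughput_ceiling_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-stream TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def transfer_time_hours(size_bytes: float, throughput_bps: float) -> float:
    """Hours needed to move size_bytes at a sustained throughput in bits/s."""
    return size_bytes * 8 / throughput_bps / 3600


DATA_SET_BYTES = 15 * 10**12   # 15 TB (decimal), the size cited in the case study
WINDOW_BYTES = 64 * 1024       # assumption: 64 KiB window, no window scaling
ASSUMED_RTTS_MS = (10, 100, 200)  # assumption: short, long and very long paths


def estimate_rows():
    """Return (rtt_ms, ceiling_mbps, days) for each assumed round-trip time."""
    rows = []
    for rtt_ms in ASSUMED_RTTS_MS:
        ceiling = tcp_throughput_ceiling_bps(WINDOW_BYTES, rtt_ms / 1000)
        days = transfer_time_hours(DATA_SET_BYTES, ceiling) / 24
        rows.append((rtt_ms, ceiling / 1e6, days))
    return rows


if __name__ == "__main__":
    for rtt_ms, mbps, days in estimate_rows():
        print(f"RTT {rtt_ms:>3} ms: ceiling {mbps:7.2f} Mbit/s, about {days:,.0f} days")
```

Under these assumptions the estimated transfer time grows in direct proportion to the round-trip time, which is consistent with the long-distance slowdowns described above.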
About The Customer
GigaScience is an online open-access, open-data life sciences journal, co-published by BGI and BioMed Central. The journal publishes 'big-data' articles covering the full spectrum of biological and biomedical sciences, including fields based on difficult-to-access data such as imaging studies, neuroscience, and systems biology. Every manuscript accepted and published in the journal focuses on the use or analysis of large-scale data sets, or on tool development for them. GigaScience set out to address the problem of reproducibility in data-heavy scientific studies. With the goal of making research reproducible and reusable, research articles transparent, and large-scale data easily accessible and citable, GigaScience hosts the complete data sets associated with each published article in a comprehensive public database, GigaDB. It also assigns each data set a digital object identifier (DOI), which makes the files easier to locate and lets people cite the data directly when reusing or reproducing research.
The Solution
To handle data sets of this size, GigaScience adopted a suite of IBM Aspera software products that give authors, reviewers, and other users the tools to upload and download the large data sets that accompany manuscripts at high speed. GigaScience selected IBM® Aspera® Connect Server to transfer the data sets accompanying submitted manuscripts to the GigaScience database, and IBM® Aspera® Console to manage and monitor the end-to-end transfer process. Authors use the free, downloadable Aspera Connect plug-in to submit manuscript-associated data sets to a private data storage site at GigaScience. Staff reviewers then access the files, using the browser plug-in to download and upload them at high speed. If a paper is accepted for publication, the data is transferred via Aspera to the journal's public database, GigaDB, where journal readers can view and download it, again using the Aspera Connect plug-in.
Operational Impact
  • Fast transfers: Using Aspera Connect Server, uploads and downloads to GigaDB are accomplished at maximum speed, regardless of file size, transfer distance, or network conditions.
  • Ease of use: With an intuitive web-based interface and the self-installing Aspera Connect plug-in, GigaDB is accessible to every user, regardless of computational expertise.
  • Reliability: With automatic resume and retry for partial or failed transfers, GigaScience and its users are confident their transfers will complete dependably.
Quantitative Benefit
  • Large data sets are uploaded in hours instead of days.
  • With high-speed transfers, GigaScience can review, accept, and publish manuscripts more quickly and return its decisions to submitters within its target of two weeks.
