The National Institutes of Health (NIH) today announced the addition of Amazon Web Services (AWS) to its Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative. Launched in July of this year, STRIDES aims to partner with commercial cloud providers and harness the power of cloud computing for NIH biomedical researchers, making high-value data and technology-intensive research more accessible to them.
The partnership is expected to speed up discoveries in biomedical research, according to Teresa Carlson, vice president of Worldwide Public Sector at AWS. “We’re committed to providing those researchers participating in the STRIDES Initiative with access to high-value NIH datasets, enabling them to further their research to study, treat and prevent the most devastating diseases.”
Amazon Web Services is a subsidiary of Amazon.com that provides on-demand cloud computing platforms, on a paid subscription basis, to governments, companies and individuals. The service gives subscribers access to a virtual cluster of computers, available at all times, through the Internet.
Administrators of STRIDES hope that data made available through partnerships with commercial cloud service providers (CSPs) like AWS will incorporate standards endorsed by the biomedical research community to make data Findable, Accessible, Interoperable, and Reusable (FAIR). They also want these partners to work directly with the NIH and its funded investigators to develop and test new ways to make large data sets and associated computational tools available and accessible to wider audiences. The CSPs and investigators of the NIH Data Commons Pilot Phase will set up cloud storage and services for three test case data sets that will be used to develop principles, policies and processes. Services are expected to become available to the NIH-supported community after a series of these pilot programs refine policies and procedures for the program.
The three NIH-funded test case data sets were chosen based on their value to users in the biomedical research community, the diversity of the data they contain, and their coverage of both basic and clinical research, the NIH said. According to the NIH, Data Commons efforts will expand to include other data resources once the pilot phase has achieved its primary objectives. For now, the three data sets are the following:
The Genotype-Tissue Expression (GTEx) program explores how human genes are expressed and regulated in different tissues, and the role that genomic variation plays in changing gene expression. GTEx has collected multiple human tissues from over 900 deceased donors whose DNA and RNA were sequenced to assess variation within their genomes, its effects on gene expression, and which tissues contribute to predisposition to disease. GTEx data and biospecimens are a research community resource.
The Model Organism Databases (MODs) provide in-depth biological data for intensively studied model organisms. Six MODs are working as a consortium with the Gene Ontology Consortium to create an integrated resource known as the Alliance of Genome Resources (AGR). The goal of the AGR is to streamline and standardize data models and interfaces, and to provide a web-based resource where data from all Alliance groups are integrated and searchable in a single place. The six participating MODs include: Saccharomyces Genome Database, WormBase, FlyBase, Zebrafish Information Network, Mouse Genome Database and Rat Genome Database.
The Trans-Omics for Precision Medicine (TOPMed) program collects and pairs whole-genome sequencing (WGS) and other large-scale data with molecular, behavioral, imaging, environmental and clinical data from studies focused on heart, lung, blood and sleep (HLBS) disorders. TOPMed aims to collect WGS data from 120,000 individuals.
The NIH says the agreement with AWS will help its own researchers, as well as scientists and investigators at more than 2,500 academic institutions across the United States receiving NIH support, make use of AWS’s wide range of technologies.
“Teaming with Amazon Web Services will give NIH researchers powerful cloud-based resources to more efficiently collaborate and analyze data,” Andrea T. Norris, Director of NIH’s Center for Information Technology and NIH Chief Information Officer, said in a statement. “Expanding our cloud service provider network will allow us to provide the research community access to the tools they need to advance science. AWS’s longstanding leadership in the cloud space will help bolster the innovative research being conducted through NIH support.”
The STRIDES Initiative is part of the NIH Common Fund’s New Models of Data Stewardship Program (NMDS), which was designed to enhance biomedical discovery and improve efficiency through new digital data management strategies or, in short, to develop a modern “biomedical data ecosystem,” as described in the NIH Strategic Plan for Data Science.
“The way the biomedical research community interacts with data is changing,” according to the NIH. “Advances in storage, communications, and processing have led to new research methods and tools not possible a decade ago. Data-related innovations like machine learning and artificial intelligence may yield transformative changes for biomedical research.”
The NIH holds that applying these data-related innovations to biomedical data will “drive new discoveries that enable more accurate disease risk prediction, tailored diagnostics and prevention and treatment strategies.”
And yet, while there is immense potential here to advance human health, the NIH cautions that obstacles remain: “We stand at an inflection point, but there are still considerable challenges to realizing that potential.”
Launched in 2017, the New Models of Data Stewardship program will run through fiscal year 2020. The NIH says the current utility of large data sets is limited because the data are:
- difficult for users to find and access
- expensive to generate, store, download and compute on
Ultimately, participants hope the NIH Data Commons will hasten new biomedical and life-saving discoveries by developing and testing a cloud-based platform where researchers can store, share, access and interact with digital objects such as data and software generated from biomedical and behavioral research. Connecting these digital objects and making them accessible will, according to the NIH, allow the Data Commons to promote new scientific research “that was not possible before, including hypothesis generation, discovery and validation.”