DataStaR, a Data Staging Repository for Digital Research Data
Abstract
Advances in computational capacity, the accelerating accumulation of data in many disciplines, and the resulting opportunities for new discoveries based on data sharing and re-use demand a robust infrastructure for sharing and archiving digital research data. In an effort to serve Cornell researchers and their collaborators, we are developing an institutionally based data staging repository (DataStaR). DataStaR is intended to facilitate the documentation and transmission of research data sets from a variety of disciplines to domain-specific repositories and/or institutional repositories. The data staging model leverages the ability of a researcher's local institution to provide accessible support and services related to research data early in the research process, and it promotes the deposition of data in domain-specific repositories, thus making data available to the larger research community.
For many researchers, preparing data for archiving, creating metadata, and sharing data openly with others are new and unfamiliar activities. Many researchers will therefore benefit from personal assistance, regardless of the capabilities of the system or software they use to complete these tasks. Accordingly, a key component of DataStaR's activities is recruiting willing data owners to participate by documenting and publishing their data to established data and/or institutional repositories, with the assistance of librarians. Current research group partners include the Upper Susquehanna River Basin Agricultural Ecology Program, the Cornell Biological Field Station, the Cayuga Lake Watershed Network, and the Cornell Language Acquisition Lab, as well as individual researchers.
Key features of the DataStaR platform, currently in development, include the ability for researchers to create preliminary metadata for research data sets; share preliminary data publicly, or only with selected colleagues; complete a more detailed metadata record using a form-based editor and optionally upload completed data sets to the staging repository; export metadata in any number of domain-specific formats; and re-use elements of existing metadata records in the creation of new metadata records.
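The metadata workflow described above (create a record, export it in more than one format, and re-use its elements in a new record) can be illustrated with a minimal sketch. This is not DataStaR's implementation: the `MetadataRecord` class, its field names, the two export formats, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical metadata record; field names are assumptions."""
    title: str
    creators: tuple
    keywords: tuple = ()
    shared_with: tuple = ()  # in this sketch, an empty tuple means "public"


def to_simple_xml(record):
    """Export the record as a small, made-up XML format."""
    root = ET.Element("dataset")
    ET.SubElement(root, "title").text = record.title
    for name in record.creators:
        ET.SubElement(root, "creator").text = name
    for keyword in record.keywords:
        ET.SubElement(root, "keyword").text = keyword
    return ET.tostring(root, encoding="unicode")


def to_key_value(record):
    """Export the record as a flat dictionary of strings."""
    return {
        "title": record.title,
        "creators": "; ".join(record.creators),
        "keywords": "; ".join(record.keywords),
    }


# Registry of exporters; a real system would register one per target format.
EXPORTERS = {"simple-xml": to_simple_xml, "key-value": to_key_value}


def export(record, fmt):
    """Look up the exporter for `fmt` and apply it to the record."""
    return EXPORTERS[fmt](record)


# Create a preliminary record (sample values are invented).
first = MetadataRecord(
    title="Lake temperature profiles 2008",
    creators=("A. Researcher",),
    keywords=("limnology",),
)

# Re-use every element of the existing record except the title.
second = replace(first, title="Lake temperature profiles 2009")

print(export(first, "simple-xml"))
print(export(second, "key-value"))
```

Keeping the record immutable and routing every export through one registry means a new target format only needs one additional function, and a new record can borrow elements from an old one without altering it.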
An additional area of research for the DataStaR group is the evaluation and application of best practices for digital preservation repositories to the "staging" repository environment. While DataStaR is not intended to be a preservation repository, preservation of digital research data is the desired outcome. Accordingly, the group is exploring the utility of tools such as the Trustworthy Repositories Audit & Certification (TRAC) checklist and the Open Archival Information System (OAIS) reference model for informing DataStaR's system design, policies, and procedures.