Search and Cite your Data

By Shelley Littin, CyVerse

Citations for scientific and technical publications are like credentials behind a name. Having your work cited by another publication is not only rewarding, it also adds credibility to your research. But before authors can cite your work, they need to find it.

Digital Object Identifiers, or DOIs, make data easy to access and cite properly: clicking on a DOI leads to the dataset, and the DOI always points to the dataset even if it is moved to another repository. Conversely, DOIs ease the process of finding and citing data created by others that you use in your research. DOIs are used by the main scientific repositories in the U.S. and internationally.
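To make the "clicking on the DOI" step concrete, here is a minimal sketch of how a DOI is typically turned into a clickable link through the public resolver at https://doi.org/. The function name doi_to_url and the DOI used in the usage line are hypothetical and for illustration only; they are not part of any CyVerse tool.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this address redirects to the dataset's current location.
DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations (assumed list, not exhaustive).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI written as '10.xxxx/suffix', 'doi:10.xxxx/suffix', or a full URL."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # A DOI has a registrant part starting with "10." and a suffix, separated by the first slash.
    registrant, slash, suffix = cleaned.partition("/")
    if not (registrant.startswith("10.") and slash and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters in the suffix that are not safe in a URL path.
    return DOI_RESOLVER + registrant + "/" + quote(suffix, safe="/")


# Hypothetical DOI, for illustration only.
print(doi_to_url("doi:10.1234/example-dataset"))
```

Because the link goes through the resolver rather than to a repository address, the citation stays valid when the dataset moves.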

Now, DOI submission tools within CyVerse allow you to quickly and easily generate a DOI for a dataset that you create and make publicly available in the CyVerse Data Commons Repository. As a central part of your research, the data that you create, organize and describe for publication in CyVerse can be reused and cited by others. This greatly expands the potential for curating, indexing, and cross-referencing data stored in CyVerse.

Terabytes of data are being generated through analyses done in CyVerse, but there is often no repository to house that data upon publication. The CyVerse Data Commons provides a home for datasets that community members want to make publicly available for use within CyVerse infrastructure. The Data Commons Repository houses permanent, stable datasets that are of high value to the CyVerse user community.

CyVerse subscribes to EZID to create and manage unique identifiers. When users request a DOI, EZID automatically generates one. However, the process still relies on users to organize their data logically and add metadata manually, and it involves some manual data curation. Bulk upload is available to help with metadata management, and the Center for Expanded Data Annotation and Retrieval is useful for auto-curation.

CyVerse follows DataCite guidelines for collecting metadata for datasets published in CyVerse. DataCite metadata is focused on publication, so it is generally not sufficient for data reuse. Therefore, users are asked to supply documentation on how their data was collected and how it can be used, generally in the form of a readme file.
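As a rough illustration of what checking a dataset before a DOI request might look like, the sketch below tests a metadata record for the properties that the DataCite schema marks as mandatory (identifier, creator, title, publisher, publication year, resource type) and looks for a readme among the files. The field names, function names, and sample record are assumptions made for this example; CyVerse's own metadata template may ask for more or name things differently.

```python
# Keys standing in for the DataCite mandatory properties (key names are assumed for this sketch).
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record, in schema order."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def has_readme(filenames: list) -> bool:
    """Return True if any file name starts with 'readme', ignoring case."""
    return any(name.lower().startswith("readme") for name in filenames)


# Hypothetical record and file list, for illustration only.
record = {
    "identifier": "10.1234/example-dataset",
    "creators": ["Example, Author"],
    "title": "Example dataset",
    "publisher": "",  # left empty on purpose
    "publication_year": 2017,
    "resource_type": "Dataset",
}
files = ["README.txt", "measurements.csv"]

print(missing_fields(record))
print(has_readme(files))
```

A check like this covers only the citation metadata; the readme still has to explain, in plain language, how the data was collected and how it can be reused.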

If a canonical repository for a data type already exists, such as the National Center for Biotechnology Information for sequence data, then users are directed to store their data in that repository rather than request a DOI from CyVerse. CyVerse will issue DOIs for datasets that use sequence data if the sequence data is in the appropriate repository.

Once published, datasets with DOIs are available anonymously in CyVerse and via the Discovery Environment under community_data > commons_repo > curated. Datasets are also available without authentication. For general questions about DOIs or identifiers, users should see the Permanent Identifier FAQ; for other questions, they should use CyVerse Ask.

CyVerse can also issue Archival Resource Keys (ARKs) upon request for datasets that are less stable. ARKs are identifiers that also point to a URL, but ARKs can be deleted by the curator. They can be used to keep track of data and easily share it with others before deciding that the data will be cited. When a user is ready to release a publication citing a dataset, the ARK can be transitioned into a DOI. Users should contact datacommons@cyverse.org to request ARKs or to discuss special needs for a DOI, such as a complex dataset.