Step-by-step guide to organize and submit SPARC datasets with SODA for SPARC
Prepare and submit SPARC datasets with SODA
The typical process for submitting your SPARC dataset consists of organizing your data according to the SPARC Data Structure (SDS), adding metadata files, uploading everything to the Pennsieve data platform, where more metadata needs to be added, and finally sharing the dataset with the SPARC Curation Team, who will review it for compliance. Once the Curation Team approves your dataset, you will have to share it as an embargoed dataset, and it will become accessible to all members of the SPARC Consortium through Pennsieve. Once the embargo period is over (one year after initial upload or after publication of the related manuscript(s), whichever comes first), you will have to publish your dataset, and it will then become publicly accessible through the SPARC Data Portal.
We describe below the suggested workflow for preparing and submitting your SPARC datasets with SODA. All these steps are mandatory (unless marked otherwise) if you wish to satisfy the SPARC requirements.
A. Preliminary Steps
These steps only need to be completed once.
- Download and install SODA
- All SPARC datasets must be uploaded to the Pennsieve data platform. Get access to Pennsieve as well as the SPARC Consortium organization on Pennsieve by filling out this form. We also suggest requesting access to the SPARC Airtable sheet through the same form, as it will come in handy when you prepare your SPARC metadata files.
- Download and install the Pennsieve agent required to upload files through SODA
- Watch our quick video to familiarize yourself with the user interface of SODA (note: optional but recommended)
- Read about the SPARC requirements for organizing and sharing datasets to familiarize yourself with the process (note: optional but recommended)
B. Prepare Dataset on Pennsieve
The SPARC guidelines require each dataset to have specific metadata on Pennsieve. We recommend starting with this step so that everything is set on Pennsieve when you are ready to upload your data and metadata files (Step D). This metadata can be easily added to Pennsieve through SODA.
- Connect your Pennsieve account with SODA. This is only required the first time you use SODA.
- Create a new Pennsieve dataset
- Make the PI of the SPARC award the owner of the dataset.
- If others need to contribute to your dataset, give other members/teams access to it
- Add a subtitle
- Add a description
- Upload a banner image
- Assign a license
- Add/edit tags
C. Prepare SPARC Metadata Files
The SPARC guidelines require each dataset to have specific metadata files, as described by the SPARC Data Standards (SDS). These metadata files can be conveniently prepared through SODA.
- Prepare your protocol on protocols.io following the instructions provided here. This is not supported through SODA since protocols.io already provides an intuitive interface for preparing protocols.
- Prepare the submission file
- Prepare the dataset description file
- Prepare the README file
- If your study includes subjects, prepare the subjects file
- If your study includes samples, prepare the samples file
- If your study includes a computational model, prepare the code metadata files with help from the O2S2PARC team (email support@osparc.io)
- If you are publishing a new version of a dataset, prepare the CHANGES file
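The conditional items in the list above can be summarized as a small checklist helper. This is only an illustrative sketch: the function name `required_metadata` and its parameters are hypothetical and are not part of SODA, and the labels it returns are the descriptive names used in this guide rather than actual file names. The protocol is left out because it is prepared on protocols.io rather than as a metadata file.

```python
def required_metadata(has_subjects, has_samples, has_model, is_new_version):
    """Return the metadata items from Step C that apply to a dataset.

    Hypothetical helper for illustration only; labels mirror the wording
    of this guide and are not file names.
    """
    # These three items are listed without any condition in Step C.
    items = ["submission", "dataset description", "README"]
    # The remaining items depend on what the study includes.
    if has_subjects:
        items.append("subjects")
    if has_samples:
        items.append("samples")
    if has_model:
        items.append("code metadata")
    if is_new_version:
        items.append("CHANGES")
    return items


if __name__ == "__main__":
    # Example: a study with subjects but no samples, first version.
    for item in required_metadata(True, False, False, False):
        print(item)
```

Running the example prints the three unconditional items followed by the subjects entry, which can serve as a quick reminder of what still needs to be prepared in SODA.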
D. Organize Dataset According to the SPARC Data Structure
All SPARC datasets must be organized according to the structure described by the SPARC Data Standards (SDS). Briefly, all data must be placed into one of the following six high-level folders: primary, source, derivative, code, protocol, and docs. Each of these folders must have a manifest metadata file that summarizes the content of the folder. Additionally, all the metadata files created during Step C must be located at the highest level of the dataset, alongside the high-level folders. SODA provides an intuitive interface for organizing your dataset according to the SDS and uploading it to Pennsieve with automatically generated manifest files.
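If you want to sanity-check a local folder before or after organizing it, the layout rules described above can be sketched in a few lines of Python. This is not how SODA works internally; the function name `check_layout` is hypothetical, and the sketch assumes that a manifest is any file whose name without extension is `manifest` and that any file sitting at the top level is a metadata file.

```python
from pathlib import Path

# The six high-level folders named in this guide.
SDS_FOLDERS = ("primary", "source", "derivative", "code", "protocol", "docs")


def check_layout(dataset_root):
    """Return a list of human-readable problems found under dataset_root.

    Hypothetical helper for illustration only.
    """
    root = Path(dataset_root)
    problems = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            # Assumption: top-level files are the metadata files from Step C.
            continue
        if entry.name not in SDS_FOLDERS:
            problems.append(f"unexpected high-level folder: {entry.name}")
            continue
        # Assumption: a manifest is any file whose stem is "manifest".
        has_manifest = any(
            p.is_file() and p.stem == "manifest" for p in entry.iterdir()
        )
        if not has_manifest:
            problems.append(f"missing manifest in: {entry.name}")
    return problems
```

An empty list means the folder names and manifests match the rules above. The sketch does not require all six folders to be present, since a dataset only needs the folders that apply to its data.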
E. Submit Dataset to the Curation Team for Review
Once all the previous steps have been completed, it is time to share your dataset with the SPARC Data Curation Team for review.
F. Post-Curation Steps
These steps must be completed ONLY after your dataset is approved by the Curation Team.