Guide to preparing data for sharing
When you share research data, you make it possible for others to review the research results and reuse the data in further studies. For this to be possible, the data must be organized and presented in a self-explanatory way.
Certified repositories (such as SND) apply a review process to ensure that data and documentation meet specific requirements before being published and shared. Reviewers, either at SND or your organization's research data support, will work with you to ensure that the published dataset is reusable. It is important that you are reachable and actively participate during the review process.
Here are some useful recommendations:
Personal data
- Data containing personal information may be described and shared via DORIS only if your organization and SND have an agreement in place. You can find more information about which organizations offer their researchers this opportunity, and how it works in practice, on the page Data with personal information in DORIS.
- Data containing personal information should generally not be shared openly; they need to be shared with restricted access.
- If you belong to an organization that does not offer the possibility of sharing data with personal information via DORIS, you must ensure the data are anonymized. You can read more about anonymization here. You may also contact SND at snd@snd.se to explore possible solutions.
Data files
- Data files should have a file format that is widely used, open, and non-proprietary. If possible, share data files in multiple formats so that they are both convenient to use and well suited to long-term storage. You can read more about choosing a file format on the File formats pages on Researchdata.se.
- File and folder names should be meaningful and consistent. File names with sequential numbers or codes should be explained, for example in a README file.
- Datasets that consist of multiple files should be structured in a way that is intuitive to other users. The structure and file relations can, if necessary, be explained in a README file.
- When the data material consists of many files, it is often best to pack them into one or several ZIP archives for easier downloading, which can also help reduce file sizes. You may also consider splitting the data material and publishing it as several separate datasets, which can then be linked to each other as related datasets in the Researchdata.se catalogue.
- Files should be cleared of irrelevant information. This includes variables that are not described, calculated variables that can be reconstructed from the primary data, variables of an administrative nature, or colours for text and formulas.
- If the file format supports variable-level metadata, include relevant metadata in the data files (e.g., variable names and codes for variable values for tabular data, information about coding standards, or the meaning of different formatting for textual data). The important thing is that the information is saved with the data files; the exact format is secondary.
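As an illustration of the recommendation to pack many files into a ZIP archive, the following is a minimal Python sketch using only the standard library. The function name `pack_dataset` and the folder and file names in the usage note are hypothetical; SND does not prescribe any particular tool for creating archives.

```python
from pathlib import Path
import zipfile


def pack_dataset(source_dir, archive_path):
    """Pack every file under source_dir into one ZIP archive.

    Paths inside the archive are stored relative to source_dir, so the
    folder structure is preserved when the archive is unpacked.
    Returns the list of relative paths that were written.
    """
    source_dir = Path(source_dir)
    archive_path = Path(archive_path)
    stored = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            # Skip folders, and skip the archive itself if it lives inside source_dir.
            if not path.is_file() or path.resolve() == archive_path.resolve():
                continue
            relative = path.relative_to(source_dir).as_posix()
            archive.write(path, arcname=relative)
            stored.append(relative)
    return stored
```

For example, `pack_dataset("survey_2021", "survey_2021.zip")` would write every file under the hypothetical folder `survey_2021` into a single archive. Placing the README file at the top level of the folder keeps it easy to find after unpacking.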
Metadata
Metadata is structured information used to describe and categorize digital information. Metadata makes it easier for users to search, find, and understand research material.
- You create structured metadata by describing the dataset in DORIS.
- Mandatory fields in DORIS specify the minimum level of metadata required by SND. However, the more information you provide, the easier it is for others to find the dataset and understand the contents of the files.
- Metadata should be as precise as possible. If the data are from field work in Colombia and Peru, enter Colombia and Peru as the geographic coverage areas, rather than just South America.
- Reference and link to articles or other publications that describe or are based on the dataset. You can also link to other related resources.
- If the data are shared due to a specific article or publication, the dataset title should be “Data for: [publication title]”, unless you find that a descriptive title for the dataset is more appropriate.
Documentation
Relevant documentation must be attached to the data description so that other researchers can understand and reuse the data. Give careful thought to what type of documentation is needed to understand the data.
This may include:
- Variable lists with explanations of the content of each variable
- Questionnaires or surveys
- Interview forms, including interview guides
- Code lists and codebooks
- A list of the data material
- Links to articles or other publications
- Method descriptions or technical reports
- Information on how the data have been processed
- Syntax for derived variables
- Final reports
- Instructions for how to manage the data in custom-developed software
- Fieldwork diaries or log books.
How documentation is designed and what it is called varies across research fields and disciplines. SND does not require a specific format for the documentation; the content of the documents is what is most important. If no pre-existing documentation is available, relevant information can be summarized in a README file. An example of a README file can be found in this template developed by Cornell University.
Simply citing a published article or report associated with the research data is rarely considered sufficient documentation. Even if there is an open-access article describing how the data were collected or created, you should include a README file explaining how the content of the data files relates to what is described in the article. A typical README file for a tabular dataset, for example, will list and describe all the columns in the data file, specify the units of variables or values for categorical variables, explain quality codes for missing values, and so on.
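One way to get started on such a README for a tabular dataset is to generate a skeleton from the column headers of the data file and then fill in the descriptions by hand. The sketch below assumes a UTF-8 encoded CSV file with a header row; the function name `readme_column_skeleton` and the layout of the generated text are illustrative only and are not a format required by SND.

```python
import csv
from pathlib import Path


def readme_column_skeleton(csv_path, missing_codes=None):
    """Return README text listing each CSV column with fields left to fill in.

    missing_codes is an optional dict that maps a code used for missing
    values (for example "-999") to its meaning.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Read only the header row; an empty file yields no columns.
        header = next(csv.reader(handle), [])
    lines = [f"File: {Path(csv_path).name}", "", "Columns:"]
    for name in header:
        lines.append(f"- {name}")
        lines.append("    Description: TODO")
        lines.append("    Unit or allowed values: TODO")
    lines.append("")
    lines.append("Codes for missing values:")
    if missing_codes:
        for code, meaning in missing_codes.items():
            lines.append(f"- {code}: {meaning}")
    else:
        lines.append("- TODO")
    return "\n".join(lines) + "\n"
```

Calling `readme_column_skeleton("measurements.csv", {"-999": "not measured"})` on a hypothetical file returns text that can be saved as a README file and completed manually. The TODO entries are placeholders: the descriptions, units, and category values still have to be written by someone who knows the data.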
Keep in mind that the person who wants to reuse the research data may come from a different research discipline, so it is helpful if the documentation is understandable to other audiences.
If you are unsure about what documentation is required, feel free to contact SND or your organization's local research data support.
Relevant links: