Data errors

An important function of good data management is to prevent errors from finding their way into the research data during the course of a project. Data errors can require extra work to correct, but worse still, errors that go unnoticed can lead to a faulty analysis.

The worst errors are those that lead to loss of data. To protect the project data against this, make frequent backups. Most higher education institutions (HEIs) have systems for automatic backup, but to be certain which systems are in use, we recommend that you consult your local IT Services. You may also find that your project requires more than the general standard in your organisation.

Technical issues can also corrupt individual values in a dataset. Over time, storage media are prone to “bit rot”, or data degradation: the material of the storage medium decays, so that some 1s are read as 0s and vice versa. As a result, a file may become partly or entirely unreadable, or values in it may change. The risk grows the longer material is stored, so it’s important to use high-quality storage media and to copy files to new media at regular intervals. Avoid relying on USB drives, CDs, and external hard drives unless they are regularly backed up. If you are unsure which storage medium to choose, please consult your IT Services. You may also want to use checksums to detect data errors in files: a checksum recorded when a file is stored will no longer match if the file’s contents have since changed.
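As an illustration of how checksums can be used in practice, here is a minimal Python sketch (not an SND tool or a prescribed procedure) that records a SHA-256 checksum for every file in a folder and later reports files whose contents have changed or gone missing. The function names and the manifest file name `checksums.sha256` are hypothetical; the manifest uses a "digest, two spaces, path" layout similar to the one written by GNU `sha256sum`.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="checksums.sha256"):
    """Record a checksum for every file under `folder` (hypothetical manifest name)."""
    folder = Path(folder)
    files = sorted(p for p in folder.rglob("*") if p.is_file() and p.name != manifest_name)
    lines = [f"{sha256_of(p)}  {p.relative_to(folder).as_posix()}" for p in files]
    (folder / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(folder, manifest_name="checksums.sha256"):
    """Return the relative paths that are missing or whose checksum no longer matches."""
    folder = Path(folder)
    problems = []
    for line in (folder / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel = line.split("  ", 1)
        path = folder / rel
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel)
    return problems
```

One possible workflow is to call `write_manifest` when data are archived, then call `verify_manifest` after copying the files to new storage media or at regular intervals; an empty list means every file still matches its recorded checksum. SHA-256 is an assumption here; any well-established hash function will detect accidental corruption.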

The most common source of errors in research data is the human factor. Because research data vary so widely in type, source, and origin, there is no single set of measures that will prevent data corruption or data errors. There are, however, some general recommendations that can minimise the risk of errors:

  • Plan. Use clear, written instructions and checklists to minimise errors arising from ad hoc solutions, for instance in coding, thematisation, and the naming of files and folders.
  • Don’t rely on memory. Document all changes to the data, so that you can trace and correct possible errors. Documentation is important in all projects, but particularly so in projects where several people work with the same data.
  • Keep things in order. Avoid errors that arise when you, or another project member, select the wrong file in a processing or analysis step. Create an intuitive and logical folder structure, file naming conventions, and a carefully constructed file versioning system that all project members can follow.
  • Write a data management plan (DMP). A DMP gives you one place to collect all information about what to document and how, which rules apply to file versioning and naming, which folder structure you are using, and anything else about how you plan your data management. Don’t forget to keep the DMP up to date! As a resource, you can use the SND DMP checklist.
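File naming and versioning rules are easier to follow when they can be checked automatically. The guidance above does not prescribe a particular convention, so the Python sketch below assumes a hypothetical one, `<project>_<description>_v<NN>.<ext>` (for example `survey2023_cleaned_v02.csv`). It lists files that break the convention and picks the highest version of each file, which helps avoid selecting an outdated file in a processing or analysis step. The pattern and all function names are illustrative only.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention: <project>_<description>_v<NN>.<ext>, using lower-case
# letters, digits and hyphens in each part, e.g. "survey2023_cleaned-data_v03.csv".
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9-]+)_(?P<description>[a-z0-9-]+)"
    r"_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


class DataFile(NamedTuple):
    project: str
    description: str
    version: int
    ext: str


def parse_name(filename: str) -> Optional[DataFile]:
    """Split a conforming file name into its parts, or return None if it does not conform."""
    m = NAME_PATTERN.match(filename)
    if m is None:
        return None
    return DataFile(m["project"], m["description"], int(m["version"]), m["ext"])


def nonconforming(filenames):
    """Return the file names that break the (hypothetical) convention."""
    return [name for name in filenames if parse_name(name) is None]


def latest_versions(filenames):
    """Map each (project, description, ext) to the file name with the highest version number."""
    best = {}
    for name in filenames:
        parsed = parse_name(name)
        if parsed is None:
            continue  # ignore files that do not follow the convention
        key = (parsed.project, parsed.description, parsed.ext)
        if key not in best or parsed.version > best[key][0]:
            best[key] = (parsed.version, name)
    return {key: name for key, (_, name) in best.items()}
```

For example, given `["survey2023_raw_v01.csv", "survey2023_cleaned_v01.csv", "survey2023_cleaned_v02.csv", "Survey final.xlsx"]`, `nonconforming` flags `"Survey final.xlsx"`, and `latest_versions` selects `survey2023_cleaned_v02.csv` as the current cleaned file. Whatever convention a project actually adopts would be recorded in the DMP, with the pattern adjusted to match.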

If you follow these recommendations, errors will be easier to avoid. Should an error still appear in the data, you can trace it and go back to a previous file version to correct it.