Data collection

Best practices for data collection

Collecting or creating data is the first step in the data management life cycle. In many cases the data already exists but must be found, cleaned (checked for accuracy and consistency), organized for a specific purpose, saved, shared, and updated. Collect data with a clear understanding of which data is of interest, to whom, and how those users will apply it; otherwise the data collected is likely to be irrelevant or incomplete.

There is a hierarchy of sources for data collection:

During data collection, identify who is responsible for each part and keep a system for tracking who worked on what, so that later questions can be directed to the right person. Checking data after the fact can take more time than setting high, consistently applied standards for collecting it in the first place. If a single source gives conflicting information, or information is missing, make a clear note of the issue so that it can be reviewed later.
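One way to track responsibility and open issues is a simple collection log. The sketch below is only an illustration: the `CollectionEntry` fields, the `needs_review` helper, and the sample names and sources are all hypothetical, and a spreadsheet with the same columns would serve equally well.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CollectionEntry:
    """One collected item, who collected it, and any issues noted at the time."""
    item: str            # what was collected (hypothetical label)
    source: str          # where it came from
    collected_by: str    # person responsible for this part
    collected_on: date
    issues: list[str] = field(default_factory=list)  # conflicts or gaps to review later


def needs_review(entries):
    """Return the entries that carry at least one noted issue."""
    return [entry for entry in entries if entry.issues]


# Sample log with made-up entries, for illustration only.
log = [
    CollectionEntry("population_2020", "national census report", "A. Rivera", date(2024, 3, 1)),
    CollectionEntry(
        "population_2021",
        "national census report",
        "B. Chen",
        date(2024, 3, 2),
        issues=["main table and appendix give different totals"],
    ),
]

for entry in needs_review(log):
    print(entry.item, entry.collected_by, entry.issues)
```

Recording the issue next to the name of the person who collected the item means a later reviewer knows both what to check and whom to ask.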

Citations

Always note the source of data for future reference. Without proper citations, others cannot verify the information. Each citation should include basic information: the name of the individual or organization that created the information, the year of production, the title of the document, the publisher (if different from the creator), and a link to the information.
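The citation fields listed above can be captured in a small record so that every source is noted the same way. This is a minimal sketch, not a prescribed citation style: the `Citation` class, its `format` layout, and the example source are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """Basic citation fields: creator, year, title, optional publisher, link."""
    creator: str
    year: int
    title: str
    link: str
    publisher: Optional[str] = None  # only needed if different from the creator

    def format(self) -> str:
        """Render the citation as one line (layout is an assumed, simple style)."""
        parts = [f"{self.creator} ({self.year}).", f"{self.title}."]
        if self.publisher and self.publisher != self.creator:
            parts.append(f"{self.publisher}.")
        parts.append(self.link)
        return " ".join(parts)


# Made-up source, for illustration only.
example = Citation(
    creator="Example Statistics Office",
    year=2020,
    title="Annual Population Report",
    link="https://example.org/report",
)
print(example.format())
```

Because the creator, year, title, and link are required fields, an incomplete citation fails when the record is created rather than being discovered later.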

Save sources of data in case they become unavailable in the future. To avoid losing information when links to online sources break, save screenshots of webpages or submit them to archive.org for preservation. This is especially important for official government documents that may not remain online.
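Alongside screenshots or an archive.org submission, a local copy of a source can be stored with a record of where and when it was obtained. The sketch below assumes the page content has already been downloaded; it does not fetch anything or contact archive.org. The `save_snapshot` function, the file naming scheme, and the example URL are hypothetical.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(url: str, content: bytes, archive_dir: Path) -> Path:
    """Store a local copy of a source plus a JSON record of its origin."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    # The SHA-256 digest names the file and lets later readers detect changes.
    digest = hashlib.sha256(content).hexdigest()
    snapshot_path = archive_dir / f"{digest[:16]}.bin"
    snapshot_path.write_bytes(content)
    metadata = {
        "url": url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
        "file": snapshot_path.name,
    }
    (archive_dir / f"{digest[:16]}.json").write_text(json.dumps(metadata, indent=2))
    return snapshot_path


# Usage with a temporary folder and placeholder content, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_snapshot("https://example.org/report", b"<html>example page</html>", Path(tmp))
    saved_name = saved.name
    sidecar = json.loads((Path(tmp) / f"{saved.stem}.json").read_text())
    print(saved_name, sidecar["retrieved_at"])
```

Keeping the retrieval date and original link next to the saved copy means the citation can still be checked after the online version disappears.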