Collection or creation of data is the first step in the life cycle of data management. In many cases, data already exists but must be found and then cleaned (checked for accuracy and consistency), organized for a specific purpose, saved, shared, and updated. Data collection should be conducted with an awareness of what data is of interest, to whom, and how they will use it. Otherwise, irrelevant or incomplete data may be collected.
There is a hierarchy of sources for data collection:
During data collection, also identify who is responsible for each part and have a system for tracking who worked on what, so that any later questions can be appropriately directed. Checking data can be more time-consuming than setting high, consistently applied standards for collecting it in the first place. If there is conflicting information from the same source, or missing information, be sure to make a clear note about the issue so that further review can be done later.
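One way to keep track of who worked on what is a simple collection log. The sketch below is only an illustration: the `CollectionLogEntry` class, its field names, and the sample entries are hypothetical and not part of any existing system described here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CollectionLogEntry:
    """One row in a hypothetical data collection log."""
    dataset: str        # what was collected
    collector: str      # who is responsible for this part
    source: str         # where the data came from
    collected_on: date  # when it was collected
    issues: list[str] = field(default_factory=list)  # conflicts or gaps noted for later review

    def flag(self, note: str) -> None:
        """Record conflicting or missing information so it can be reviewed later."""
        self.issues.append(note)


def entries_needing_review(log: list[CollectionLogEntry]) -> list[CollectionLogEntry]:
    """Return only the entries that have at least one noted issue."""
    return [entry for entry in log if entry.issues]


# Made-up sample entries, for illustration only.
log = [
    CollectionLogEntry("Concession areas", "Analyst A", "Ministry report", date(2011, 6, 21)),
    CollectionLogEntry("Company names", "Analyst B", "Ministry website", date(2011, 6, 21)),
]
log[1].flag("Two different spellings of the company name on the same page")

for entry in entries_needing_review(log):
    print(entry.collector, entry.issues)
```

Keeping the issue notes next to the name of the collector means later questions can be sent directly to the person who handled that part of the data.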
The source of data should always be noted for future reference. Without proper citations, information cannot be verified by others. A citation should include basic information such as the name of the individual or organization that created the information, the year of production, the title of the document, the publisher (if different from the creator), and the link to the information. A standard guide is the Chicago Manual of Style citation guide (http://www.chicagomanualofstyle.org/tools_citationguide.html).
Web sources of data should be saved in case they become unavailable in the future. To avoid losing information through broken links to online sources, save screenshots of webpages or submit them to archive.org (the preferred option) for preservation. This is especially important for official government documents that may not remain online.
The Internet Archive (http://archive.org/web/) should be used to capture a web page as it appears at the time of access, for use as a trusted citation in the future.
All sources of information should be archived on this site if the source is cited in any way as evidence for information obtained. The link generated by the web archive can be added to the resource record on CKAN as a URL and/or to the reference section of the landing pages or topic pages. The document type "Archive web content" should be selected when entering the library record.
All government and civil society group websites should be archived, as these sites have the highest probability of being altered or shut down.
Publication place - should list the exact URL at which the site is located.
Publication date - should be the date the site or page was last updated.
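The archiving step described above can be partly scripted. The following is a minimal sketch that only builds the relevant links and makes no network requests. It assumes two Internet Archive URL patterns that are not stated in this guide and may change: the "Save Page Now" prefix `https://web.archive.org/save/` and the availability lookup at `https://archive.org/wayback/available`. The function names and the example address are hypothetical.

```python
from urllib.parse import urlencode

# Assumed Internet Archive endpoints (not taken from this guide; verify before use).
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"
WAYBACK_AVAILABILITY_API = "https://archive.org/wayback/available"


def save_page_url(page_url: str) -> str:
    """Build the link that, when opened, asks the Wayback Machine to capture page_url."""
    return WAYBACK_SAVE_PREFIX + page_url


def availability_query_url(page_url: str) -> str:
    """Build the link that checks whether an archived copy of page_url already exists."""
    return WAYBACK_AVAILABILITY_API + "?" + urlencode({"url": page_url})


# Placeholder address, for illustration only.
source = "http://example.org/report.pdf"
print(save_page_url(source))
print(availability_query_url(source))
```

Opening the first link in a browser requests a capture; the archived link it produces is the one to record on CKAN or in the reference section.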
Screenshots should be cited as follows:
[Source name], [Title of page], screenshot from [Source name] website on [date], [insert URL]
Example: Ministry of Agriculture, Forestry and Fisheries, Economic Land Concession Profile: (Cambodia) Research Mining and Development, screenshot from MAFF website on 21 June 2011, [insert URL].
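To keep screenshot citations consistent, the template can be filled in programmatically. This is a sketch only: the function name `format_screenshot_citation` and its parameters are hypothetical, and the URL is left as the same placeholder used in the example above.

```python
from datetime import date


def format_screenshot_citation(source_name, page_title, access_date, url, short_name=None):
    """Fill in the screenshot citation template.

    short_name is an optional abbreviation of the source (for example "MAFF");
    when it is omitted, the full source name is repeated.
    """
    site = short_name or source_name
    # Day without leading zero, full month name, four-digit year: "21 June 2011".
    # The month name assumes the default (English) locale.
    when = f"{access_date.day} {access_date.strftime('%B')} {access_date.year}"
    return f"{source_name}, {page_title}, screenshot from {site} website on {when}, {url}"


citation = format_screenshot_citation(
    source_name="Ministry of Agriculture, Forestry and Fisheries",
    page_title="Economic Land Concession Profile: (Cambodia) Research Mining and Development",
    access_date=date(2011, 6, 21),
    url="[insert URL]",  # placeholder kept from the example; replace with the real link
    short_name="MAFF",
)
print(citation)
```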