Data collection

Best practices for data collection

Collection or creation of data is the first step in the life cycle of data management. In many cases, data already exists but must be found and then cleaned (checked for accuracy and consistency), organized for a specific purpose, saved, shared, and updated. Data collection should be conducted with an awareness of what data is of interest, to whom, and how they will use it. Otherwise, irrelevant or incomplete data may be collected.

There is a hierarchy of sources for data collection:

  • Primary source: the entity that is directly responsible for creating the original version of the information (not translated or otherwise modified). For example, if the government is understood as the top authority for issuing a contract to a company, then a signed and stamped government document describing the contract is the primary (and preferred) source. If available, primary sources in their original language should always be the preferred type for data collection.
  • Secondary source: an entity that may not be a main actor or have complete authority but is still involved in documentation. For example, a newspaper article about a contract. A company-issued press release about the contract could also be considered a secondary source, depending on the focus of interest (some might define the company as a main actor, making it a primary source). Consider the reputability of a secondary source before collecting data from it.

During data collection, also identify who is responsible for each part and have a system for tracking who worked on what, so that any later questions can be appropriately directed. Checking data after the fact can be more time-consuming than setting high, consistently applied standards for collecting it in the first place. If there is conflicting information from the same source, or missing information, be sure to make a clear note about the issue so that further review can be done later.
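A tracking system of this kind can be as simple as a shared log with one row per collected item. The following is a minimal sketch in Python; the field names (item, collector, source_type, issue) are illustrative assumptions, not a prescribed schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CollectionLogEntry:
    item: str          # what was collected (hypothetical field)
    collector: str     # who worked on it, so questions can be directed later
    source_type: str   # "primary" or "secondary"
    issue: str = ""    # note on conflicting or missing information, if any


def write_log(entries):
    """Return the log as CSV text with one row per entry."""
    buf = io.StringIO()
    names = [f.name for f in fields(CollectionLogEntry)]
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()


def open_issues(entries):
    """Return the entries that carry a note and still need review."""
    return [entry for entry in entries if entry.issue]


log = [
    CollectionLogEntry("Contract document", "Collector A", "primary"),
    CollectionLogEntry("News article", "Collector B", "secondary",
                       issue="Contract date conflicts within the article"),
]
```

Calling open_issues(log) returns only the second entry, which gives a reviewer a short list of items to revisit.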


The source of data should always be noted for future reference. Without proper citations, information cannot be verified by others. A citation should include basic information such as the name of the individual or organization who created the information, the year of production, the title of the document, the publisher (if different from the creator), and the link to the information. A standard guide is the Chicago Manual of Style.
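The basic citation fields listed above can be held in a small record so that none is forgotten. This is a sketch only: the class name, the field names, and the output layout are assumptions made for illustration, and the layout is not an exact rendering of any particular citation style.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    creator: str                     # individual or organization
    year: int                        # year of production
    title: str                       # title of the document
    url: str                         # link to the information
    publisher: Optional[str] = None  # only needed if different from creator

    def format(self) -> str:
        """Join the fields into one line (illustrative layout)."""
        parts = [f"{self.creator}. {self.year}. {self.title}."]
        if self.publisher and self.publisher != self.creator:
            parts.append(f"{self.publisher}.")
        parts.append(self.url)
        return " ".join(parts)


# Hypothetical values, used only to show the shape of the output.
example = Citation(
    creator="Example Ministry",
    year=2011,
    title="Example Report",
    url="https://example.org/report",
)
```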

Web sources of data should be saved in case they become unavailable in the future. To prevent information from being lost through broken links to online sources, save screenshots of webpages or submit them to the Internet Archive (preferred option) for preservation. This is especially important for official government documents that may not remain online.

Internet Archive

The Internet Archive should be used to capture a web page as it appears at the time of access, for use as a trusted citation in the future.
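Capturing a page and retrieving its archived link can also be scripted. The sketch below assumes the Internet Archive's public "Save Page Now" URL prefix and its Wayback availability endpoint; both endpoints and the response shape should be checked against the Internet Archive's current documentation before use. The code only builds URLs and parses a response, it makes no network request, and the sample response is made up for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoints; verify against current Internet Archive documentation.
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_request_url(page_url: str) -> str:
    """URL that, when visited, asks the archive to capture page_url."""
    return SAVE_ENDPOINT + page_url


def availability_query_url(page_url: str) -> str:
    """URL that asks whether an archived snapshot of page_url exists."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(response_text: str):
    """Return the archived link from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Made-up sample response, shaped like the assumed availability output.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20110621000000/https://example.org/page",
            "timestamp": "20110621000000",
            "status": "200",
        }
    }
})
```

The link returned by closest_snapshot(sample) is the kind of archived URL that would be recorded alongside the original source.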

All sources of information should be archived on this site if the source is cited in any way as evidence for information obtained. The link generated by the web archive can be added to the resource record on CKAN as a URL and/or to the reference section of the landing pages or topic pages. The document type "Archive web content" should be selected when entering the library record.

All government and civil society group websites should be archived, as these sites have the highest probability of being altered or shut down.

Publication place - should list the exact URL at which the site is located.

Publication date - should be the date the site or page was last updated.


Screenshots should be cited as follows:

[Source name], [Title of page], screenshot from [Source name] website on [date], [insert URL]

Example: Ministry of Agriculture, Forestry and Fisheries, Economic Land Concession Profile: (Cambodia) Research Mining and Development, screenshot from MAFF website on 21 June 2011, [insert URL].
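The screenshot citation template can be filled in programmatically to keep citations consistent. This is a minimal sketch: the function name and its parameters are hypothetical, the optional short_name reflects the use of "MAFF" in the example above, and the URL below is a placeholder, not the real address of the cited page.

```python
from datetime import date
from typing import Optional


def screenshot_citation(source_name: str, title: str, taken_on: date,
                        url: str, short_name: Optional[str] = None) -> str:
    """Fill in the screenshot citation template."""
    site = short_name or source_name
    # Build the date by hand so single-digit days have no leading zero.
    when = f"{taken_on.day} {taken_on:%B} {taken_on.year}"
    return (f"{source_name}, {title}, "
            f"screenshot from {site} website on {when}, {url}")


citation = screenshot_citation(
    source_name="Ministry of Agriculture, Forestry and Fisheries",
    title=("Economic Land Concession Profile: (Cambodia) "
           "Research Mining and Development"),
    taken_on=date(2011, 6, 21),
    url="https://example.org/page",  # placeholder URL
    short_name="MAFF",
)
```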

public/data_collection.txt · Last modified: 2020/06/23 15:04 (external edit)