
This is an old revision of the document!


Data collection

Best practices for data collection

Collecting or creating data is the first step in the data management life cycle. In many cases the data already exists, but it must be found, cleaned (checked for accuracy and consistency), organized for a specific purpose, saved, shared, and updated. Data collection should be guided by an awareness of which data is of interest, to whom, and how they will use it. Otherwise, irrelevant or incomplete data may be collected.

There is a hierarchy of sources for data collection:

  • Primary source: the entity that is directly responsible for creating the original version of the information (not translated or otherwise modified). For example, if the government is understood as the top authority for issuing a contract to a company, then a signed and stamped government document describing the contract is the primary (and preferred) source. When available, primary sources in their original language should always be preferred for data collection.
  • Secondary source: an entity that may not be a main actor or have complete authority but is still involved in documentation. For example, a newspaper article about a contract. A company-issued press release about the contract could also be considered a secondary source, depending on the focus of interest (some might define the company as a main actor, making it a primary source). Consider the reputability of a secondary source before collecting data from it.

During data collection, identify who is responsible for each part and keep a system for tracking who worked on what, so that later questions can be directed to the right person. Checking data after the fact can be more time-consuming than setting high, consistently applied standards for collecting it in the first place. If there is conflicting information from the same source, or information is missing, make a clear note of the issue so that it can be reviewed later.
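One possible way to track who collected what, and to flag conflicting or missing information for later review, is a simple record per collected item. The sketch below is only an illustration: the class name CollectionRecord, its fields, and the helper functions are hypothetical and not part of any prescribed system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class CollectionRecord:
    """One collected item, with enough context to direct later questions."""
    item: str                 # what was collected, e.g. "contract area"
    source_name: str          # who produced the information
    source_type: str          # "primary" or "secondary"
    collected_by: str         # person responsible for this item
    collected_on: date
    notes: List[str] = field(default_factory=list)  # conflicts, gaps

    def flag(self, issue: str) -> None:
        """Record a conflict or gap so it can be reviewed later."""
        self.notes.append(issue)

    @property
    def needs_review(self) -> bool:
        return bool(self.notes)


def records_needing_review(records: List[CollectionRecord]) -> List[CollectionRecord]:
    """Return only the records that carry at least one note."""
    return [r for r in records if r.needs_review]


def primary_first(records: List[CollectionRecord]) -> List[CollectionRecord]:
    """Order records so primary sources come before secondary ones."""
    return sorted(records, key=lambda r: r.source_type != "primary")


# Hypothetical usage with made-up collector names.
article = CollectionRecord("contract area", "Newspaper article", "secondary",
                           "collector A", date(2011, 6, 21))
document = CollectionRecord("contract area", "Signed government document",
                            "primary", "collector B", date(2011, 6, 21))
article.flag("Area stated in the article differs between headline and body.")
```

Keeping the notes on the record itself means a reviewer can later filter for flagged items and see who to ask about each one.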

Citations

The source of data should always be noted for future reference. Without proper citations, others cannot verify the information. A citation should include basic information such as the name of the individual or organization that created the information, the year of production, the title of the document, the publisher (if different from the creator), and a link to the information. A standard guide is the Chicago Manual of Style citation guide: http://www.chicagomanualofstyle.org/tools_citationguide.html.

Web sources of data should be saved in case they become unavailable in the future. To avoid losing information to broken links, save screenshots of webpages or submit the pages to archive.org for preservation. This is especially important for official government documents that may not remain online.

Screenshots should be cited as follows:

[Source name], [Title of page], screenshot from [Source name] website on [date], [insert URL]

Example: Ministry of Agriculture, Forestry and Fisheries, Economic Land Concession Profile: (Cambodia) Research Mining and Development, screenshot from MAFF website on 21 June 2011, [insert URL]
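If many screenshots need to be cited, the format above could be generated consistently with a small helper. This is a minimal sketch, assuming the date style shown in the example (day, month name, year); the function name screenshot_citation and its parameters are hypothetical, and the URL is left as a placeholder because the original example does not give one.

```python
from datetime import date

# English month names are spelled out to avoid locale-dependent formatting.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def screenshot_citation(source_name: str, page_title: str, site_name: str,
                        taken_on: date, url: str) -> str:
    """Build a citation in the form:
    [Source name], [Title of page], screenshot from [Source name] website
    on [date], [insert URL]
    """
    when = f"{taken_on.day} {MONTHS[taken_on.month - 1]} {taken_on.year}"
    return (f"{source_name}, {page_title}, "
            f"screenshot from {site_name} website on {when}, {url}")


citation = screenshot_citation(
    "Ministry of Agriculture, Forestry and Fisheries",
    "Economic Land Concession Profile: (Cambodia) Research Mining and Development",
    "MAFF",
    date(2011, 6, 21),
    "[insert URL]",  # placeholder, as in the example above
)
print(citation)
```

The site name is passed separately from the source name so that an abbreviation such as MAFF can be used in the second position, as the example does.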

public/data_collection.1456690923.txt.gz · Last modified: 2020/06/23 15:03 (external edit)