Metadata

What is metadata, and why is it important?

(adapted from Introduction to Metadata)

Metadata, literally “data about data,” is everything that can be said about an information object or asset at any level of aggregation (item, folder, or entire system). An information asset is a discrete entity that can be addressed and manipulated by an information system or user, such as an electronic file. In general, all information assets have three features that can and should be described through metadata—content, context, and structure.

Metadata helps ensure that information systems are accessible, interoperable, and scalable. It provides important context about an information asset’s source and manner of creation, as well as the applications or environments in which the asset is relevant. Metadata also has the following purposes:

Guidelines for Creating Metadata

Information is often imperfect, whether it is produced by members of the Open Development Network or by others. Details may be missing, badly defined, or even completely wrong. Sometimes it is possible to improve the quality of the information by contacting its source. But even then, problems may remain.

Still, we must aim for the metadata to be as correct as possible. What does that mean? How can the metadata be correct if the information it is describing is flawed?

Metadata aims to describe the extent of our knowledge about an asset. It should clearly state what is known about the asset and what is not known or problematic. Metadata changes when the asset itself or knowledge about its condition changes.

If information is missing or inconsistent, state that this is the case instead of disregarding it. Also mention any steps being taken to address these issues, along with an expected timeline.

The template below is a guide for building a catalog of information with proper metadata. It lists the information that should be included as metadata and gives instructions for each field.

CKAN's default metadata fields

These metadata fields are common to all dataset types on the CKAN platform.

Label | Field Name (API) | Definition | Guidelines | Example
Title* | title | Name given to the dataset. | Short phrase, written in plain language. Should be sufficiently descriptive to allow for search and discovery. | Aquaculture Production and Consumption in Cambodia (2011)
Description* | description | Short description explaining the content and its origins. | Description of a few sentences, written in plain language. Should provide a sufficiently comprehensive overview of the resource for anyone to understand its content, origins, and any continuing work on it. The description can be written last, since it summarizes key information from the other metadata fields. | This dataset contains attributes of aquaculture production and consumption for each of Cambodia’s provinces in 2011. The data was provided by………
Tags | tags | An array of taxonomic terms stored as tags. | Taxonomic terms | Access to education, Bamboo
License* | license_title | The license that applies to the published dataset. | All resources wholly created by Open Development Mekong are licensed as Creative Commons Attribution-ShareAlike (CC-BY-SA). Resources from other sources retain their original licenses, as does each component of a resource aggregated from multiple sources. If unclear, contact the source to determine the resource's license. |
Copyright | odm_copyright | The copyright that applies to the dataset. | Select 'yes', 'no' or 'unclear copyright' for the copyright of the dataset. If copyright of any type is present, describe it further in Access and Use Constraints. | 'All rights reserved', © or 'Copyright 2009 by Jane Smith'
Access and Use Constraints | odm_access_and_use_constraints | A few sentences describing legal constraints on the dataset, such as copyrights. | Standardized statements found on datasets that cover intellectual property and copyright. | In deference to Cambodian law, Open Development Cambodia (ODC) site users understand and agree to take full responsibility for reliance on any site information provided and to hold harmless and waive any and all liability against individuals or entities associated with its development, form and content for any loss, harm or damage suffered as a result of its use.
Organization* | organization | Organization the dataset belongs to. | See the list of organizations at http://data.opendevelopmentmekong.net/organization/ | odm-cambodia
Version* | version | Version of the dataset. | Increase manually after editing. | 1.0
Contact* | odm_contact | Contact information for the individual or organization that is responsible for or most knowledgeable about the dataset. | This could be the author of a report, the contact information for the relevant department of an organization that produced a report, or the data analyst, mapper or researcher that produced a dataset or report. | Name / Organization / Phone / Website / Address
Uploader* [H] | maintainer | Uploader of the dataset. | The person who created the dataset. Only visible to registered users of the ODI CKAN data hub. | Joe Bloggs
Uploader contact* [H] | maintainer_email | Contact details of the uploader. | The email or other contact details of the person who created the dataset. Only visible to administrators of the uploader's organisation. | joe@example.com
  • Note: Fields marked with * are mandatory.
  • Note: Fields marked with [H] are hidden and filled automatically
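
The mandatory-field rule above can be checked before a record is submitted. The following is a minimal sketch, not part of CKAN itself: the function name `missing_mandatory_fields` and the sample record are hypothetical, the license field is spelled `license_title` (CKAN's usual spelling), and the hidden [H] fields are skipped on the assumption that the platform fills them in.

```python
# Mandatory fields (marked with *) that an editor fills in by hand.
# Assumption: the hidden [H] fields are populated automatically, so they
# are listed separately and not checked here.
MANDATORY_FIELDS = (
    "title",
    "description",
    "license_title",
    "organization",
    "version",
    "odm_contact",
)
HIDDEN_FIELDS = ("maintainer", "maintainer_email")


def missing_mandatory_fields(dataset):
    """Return the mandatory field names that are absent or blank."""
    missing = []
    for name in MANDATORY_FIELDS:
        value = dataset.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical record built from the example column of the table.
example = {
    "title": "Aquaculture Production and Consumption in Cambodia (2011)",
    "description": "Attributes of aquaculture production and consumption "
                   "for each of Cambodia's provinces in 2011.",
    "tags": ["Access to education", "Bamboo"],
    "organization": "odm-cambodia",
    "version": "1.0",
}

print(missing_mandatory_fields(example))  # ['license_title', 'odm_contact']
```

A check like this only confirms that a field is present. Whether its content follows the guidelines still has to be reviewed by a person.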

Open Development Network Metadata Template

Label | Field Name (API) | Definition | Guidelines | Example
Language* | odm_language | Language(s) of the dataset, including resources within the dataset. | Lowercase list of ISO 639-1 language codes, separated by commas. Most common: en (English), km (Khmer), lo (Lao), my (Burmese), th (Thai), vi (Vietnamese) | en, km
Date Created* | odm_date_created | Date the dataset was first published by its creator. | Write as YYYY-MM-DD. If an element is unknown, write a 0 in place of each digit. | 2011-12-00
Date Uploaded* | odm_date_uploaded | Date the dataset was first uploaded to the OD CKAN database. | Write as YYYY-MM-DD. If an element is unknown, write a 0 in place of each digit. |
Date Modified | odm_date_modified | Date a new version or update of the dataset was uploaded to the OD CKAN database. | Write as YYYY-MM-DD. If an element is unknown, write a 0 in place of each digit. If no new version has been uploaded, write NA. | NA
Temporal Range | odm_temporal_range | The period of time for which the dataset is relevant (e.g. 2011-01-01:2011-12-31). | Write as YYYY-MM-DD:YYYY-MM-DD. If a date element is unknown, write 0s in its place. If the resource covers an entire calendar year, the period should begin with the first day of the year and end with the last. | 2011-01-01:2011-12-31
Spatial Range* | odm_spatial_range | The geographic area that the dataset is relevant to (e.g. Cambodia, Laos). | Provide all countries that the resource is relevant to, separated by commas. | Cambodia, Laos, Myanmar, Thailand, Vietnam (if the resource is relevant to all countries in the Mekong region)
Accuracy | odm_accuracy | Details on the level of accuracy of the dataset and any existing issues. | Accuracy can refer to the spatial resolution of a satellite image, disagreements others have expressed about the resource's contents, any other references that contain contrary information, and other issues. Should be written in plain language, with numbers when appropriate. If you are not aware of any problems, write: ‘There are no known issues with accuracy.’ If the resource or some of its content describes something that is constantly changing, add: ‘Information subject to change’ and describe the nature of the change, if possible. | Data is given at the provincial level. There are no known issues with accuracy.
Logical Consistency | odm_logical_consistency | Issues with logical consistency in the dataset and the steps, if any, being taken to validate its content. | Description in a few sentences. Can be a mix of numbers and words. If you are not aware of any problems, write: ‘There are no known issues with logical consistency.’ | Concessions A and B overlap by 50 hectares. We anticipate conducting field validation of these concessions by 2014-10-31.
Completeness | odm_completeness | Brief description of the level of completeness of the dataset's contents and the steps, if any, being taken to make the dataset more complete. | Description in a few sentences. Can be a mix of numbers and words. If you are not aware of any problems, write: ‘There are no known issues with completeness.’ | Locations are included for 30 of 50 concessions in the dataset. We are in the process of acquiring official government documentation on the remaining 20.
Process(es)* | odm_process | The steps taken to acquire, aggregate, or transform any of the resources in the dataset. | Short description, written in plain language. Include any details regarding updates to the dataset. | Available data were provided in SHP file format by The Atlas of Cambodia. The data were then exported in CSV and GEOJSON format by Open Development Cambodia using QGIS.
Source(s)* | odm_source | Ordered citations for all information sources that went into producing the dataset. | Use a standard citation style or at least include the following information: Organization Name. “Title of dataset.” File type (CSV, JSON, etc.). Accessed Month ##, year. link. | World Bank. “Annual GDP Growth (%).” XLS. Accessed November 31, 2014. http://www.worldbank.org/en/publication/global-economic-prospects/data?variable=NYGDPMKTPKDZ&region=EAP
Metadata Reference Information | odm_metadata_reference_information | Information about how up-to-date the metadata is and who is responsible for maintaining it. | Write as: Metadata last updated on YYYY-MM-DD. For inquiries, see contact. |
Attributes | odm_attributes | Details about the information content of the dataset. | List of attributes, who or which organization defined them, and what they describe (including any units of measurement in parentheses). Write as: Attribute Name / Organization : Attribute Definition | Land area (sq. km) / World Bank: Land area is a country's total area, excluding area under inland water bodies, national claims to continental shelf, and exclusive economic zones. In most cases the definition of inland water bodies includes major rivers and lakes.
  • Fields marked with * are mandatory
  • Fields whose contents are struck through are marked for removal
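
The date conventions in this template (YYYY-MM-DD, 0s standing in for unknown elements, NA for a dataset that has not been modified, and start:end for a temporal range) could be checked along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, and an element is treated as unknown only when it is written entirely in zeros.

```python
import re
from datetime import date

# YYYY-MM-DD using ASCII digits only.
DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_valid_odm_date(value, allow_na=False):
    """Check a date written as YYYY-MM-DD, where 0s mark unknown elements."""
    if allow_na and value == "NA":
        return True
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    if month > 12 or day > 31:
        return False
    # Only a fully known date can be checked against the calendar.
    if year and month and day:
        try:
            date(year, month, day)
        except ValueError:
            return False
    return True


def is_valid_temporal_range(value):
    """Check a range written as YYYY-MM-DD:YYYY-MM-DD."""
    start, separator, end = value.partition(":")
    return (
        separator == ":"
        and is_valid_odm_date(start.strip())
        and is_valid_odm_date(end.strip())
    )


print(is_valid_odm_date("2011-12-00"))                   # True (day unknown)
print(is_valid_odm_date("2011-02-30"))                   # False (no such day)
print(is_valid_odm_date("NA", allow_na=True))            # True
print(is_valid_temporal_range("2011-01-01:2011-12-31"))  # True
```

`allow_na` is meant for odm_date_modified only, since that is the one field for which the template permits NA.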

Library Publications Metadata Template

The following template could be implemented in CKAN so that librarians can continue populating the library database with new records:

MARC21 field | Field label | Field name (API) | Field description | Field type | Example
245 | Title | title | Main title | String |
520 | Summary | notes | Abstract or summary of the book or article | String |
250 | Version / Edition* | marc21_250 | Version of the publication | String | 2nd edition
020 | ISBN | marc21_020 | International Standard Book Number (ISBN), a unique numeric commercial book identifier of 10 or 13 digits. | String | 978-981-4311-87-8 or 0-1223-4023-1
022 | ISSN | marc21_022 | International Standard Serial Number (ISSN), a unique 8-digit identifier for serial publications. | String | 2049-3630
100 | Author | marc21_100 | Main Entry-Personal Name (author) | String | Barney, Keith
110 | Corporate Author | marc21_110 | Main Entry-Corporate Name (corporate author) or title of journal | String | Asian Development Bank
700 | Co-Author [M] | marc21_700 | Personal Name (co-author), used when there is more than one author. | String | Williamson, Andrew
710 | Co-Author (Corporate)* | marc21_710 | Corporate Name, used when there is more than one corporate author. | String | Cambodia. Ministry of Environment
246 | Varying Form of Title | marc21_246 | Parallel title or translation | String | Khmer title, if the publication has both
260$a | Publication Place | marc21_260a | Place of publication | String | Oxford
260$b | Publisher | marc21_260b | Name of publishing organization | String | Oxford University Press
260$c | Publication Date | marc21_260c | Date published | Date | 2012
300 | Pagination [M] | marc21_300 | Physical Description (pagination) | String | 123 p
500 | General Note [M] | marc21_500 | General Note | String | Published in English and Khmer
  • Note: Fields marked with [M] can have more than one entry; separate the entries with commas.
  • Note: The fields with Strike-through text are not currently used (hidden or marked for deprecation).
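
The ISBN and ISSN fields are stored as plain strings, so a mistyped digit would go unnoticed. As an optional sanity check that the template itself does not require, the standard check-digit rules for ISBN-13 and ISSN can be applied before a record is saved. The function names below are hypothetical, and ISBN-10 is not covered in this sketch.

```python
ASCII_DIGITS = "0123456789"


def _compact(value):
    """Drop hyphens and spaces and upper-case a trailing 'x'."""
    return value.replace("-", "").replace(" ", "").upper()


def is_valid_isbn13(value):
    """ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    chars = _compact(value)
    if len(chars) != 13 or any(c not in ASCII_DIGITS for c in chars):
        return False
    total = sum(
        int(c) * (1 if index % 2 == 0 else 3) for index, c in enumerate(chars)
    )
    return total % 10 == 0


def is_valid_issn(value):
    """ISSN: digits weighted 8 down to 1 must sum to a multiple of 11.

    The last character may be 'X', which stands for the value 10.
    """
    chars = _compact(value)
    if len(chars) != 8 or any(c not in ASCII_DIGITS for c in chars[:7]):
        return False
    if chars[7] not in ASCII_DIGITS + "X":
        return False
    values = [int(c) for c in chars[:7]]
    values.append(10 if chars[7] == "X" else int(chars[7]))
    total = sum(v * weight for v, weight in zip(values, range(8, 0, -1)))
    return total % 11 == 0


print(is_valid_isbn13("978-981-4311-87-8"))  # True
print(is_valid_issn("2049-3630"))            # True
print(is_valid_issn("2049-3631"))            # False
```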

Library records also use some of the metadata fields defined in the Open Development Network Metadata Template:

Label | Field Name (API) | Definition | Guidelines | Example
Language* | odm_language | Language(s) of the dataset, including resources within the dataset. | Lowercase list of ISO 639-1 language codes, separated by commas. Most common: en (English), km (Khmer), lo (Lao), my (Burmese), th (Thai), vi (Vietnamese) | en, km
Date Uploaded* | odm_date_uploaded | Date the dataset was first uploaded to the OD CKAN database. | Write as YYYY-MM-DD. If an element is unknown, write a 0 in place of each digit. |
Spatial Range* | odm_spatial_range | The geographic area that the dataset is relevant to (e.g. Cambodia, Laos). | |

Note: Fields marked with * are mandatory.
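
The odm_language guideline (a lowercase, comma-separated list of ISO 639-1 codes) could be parsed and checked roughly as follows. This is an illustrative sketch: the function names are hypothetical, and it only verifies that each code has the two-lowercase-letter shape of an ISO 639-1 code, not that it is registered in the standard.

```python
import re

# The codes the template lists as most common.
COMMON_LANGUAGES = {
    "en": "English",
    "km": "Khmer",
    "lo": "Lao",
    "my": "Burmese",
    "th": "Thai",
    "vi": "Vietnamese",
}

CODE_PATTERN = re.compile(r"[a-z]{2}")


def parse_language_list(value):
    """Split 'en, km' into ['en', 'km'], rejecting malformed codes."""
    codes = [part.strip() for part in value.split(",") if part.strip()]
    for code in codes:
        if CODE_PATTERN.fullmatch(code) is None:
            raise ValueError("not a lowercase two-letter code: %r" % code)
    return codes


def uncommon_codes(codes):
    """Return codes outside the 'most common' list, for manual review."""
    return [code for code in codes if code not in COMMON_LANGUAGES]


print(parse_language_list("en, km"))   # ['en', 'km']
print(uncommon_codes(["en", "fr"]))    # ['fr']
```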

Other metadata fields

Other metadata fields exposed by the CKAN API:

Label | Field Name (API) | Definition | Guidelines | Example
Type* | type | Dataset type | dataset or library_record | dataset
Resources* | resources | Array with information about resources | |
Tags* | tags | Array with information about tags | |
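
As a rough illustration of how these fields appear through the CKAN API, the sketch below builds a `package_show` request (part of CKAN's Action API) and summarizes the type, resources and tags fields of the result. The host is taken from the organization link earlier on this page and may not match a live deployment. The helper names and the sample record are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the CKAN instance lives at the host used in the
# organization link above.
BASE_URL = "http://data.opendevelopmentmekong.net"


def package_show_url(dataset_id, base_url=BASE_URL):
    """Build the Action API URL that returns one dataset's metadata."""
    query = urlencode({"id": dataset_id})
    return base_url.rstrip("/") + "/api/3/action/package_show?" + query


def fetch_dataset(dataset_id, base_url=BASE_URL):
    """Download a dataset's metadata (needs network access; not run here)."""
    with urlopen(package_show_url(dataset_id, base_url)) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an error for %r" % dataset_id)
    return payload["result"]


def summarize(dataset):
    """Pick out the type, resources and tags fields listed above."""
    tags = dataset.get("tags", [])
    return {
        "type": dataset.get("type"),
        "resource_count": len(dataset.get("resources", [])),
        # CKAN returns tags as objects with a "name" key.
        "tags": [t["name"] if isinstance(t, dict) else t for t in tags],
    }


# Hypothetical record shaped like a package_show result.
sample = {
    "type": "dataset",
    "resources": [{"format": "CSV"}, {"format": "GeoJSON"}],
    "tags": [{"name": "Bamboo"}],
}
print(summarize(sample))
```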