(adapted from Introduction to Metadata)
Metadata, literally “data about data,” is everything that can be said about an information object or asset at any level of aggregation (item, folder, or entire system). An information asset is a discrete entity that can be addressed and manipulated by an information system or user, such as an electronic file. In general, all information assets have three features that can and should be described through metadata—content, context, and structure.
Metadata helps ensure that information systems are accessible, interoperable, and scalable. It provides important context about an information asset’s source and manner of creation, as well as the applications or environments in which the asset is relevant. Metadata also has the following purposes:
Information is often imperfect, whether it is produced by members of the Open Development Network or by others. Details may be missing, badly defined, or even completely wrong. Sometimes it is possible to improve the quality of the information by contacting its source. But even then, problems may remain.
Still, we must aim for the metadata to be as correct as possible. What does that mean? How can the metadata be correct if the information it is describing is flawed?
Metadata aims to describe the extent of our knowledge about an asset. It should clearly state what is known about the asset and what is not known or problematic. Metadata changes when the asset itself or knowledge about its condition changes.
If information is missing or inconsistent, state that this is the case instead of disregarding it. Also mention any steps being taken to address these issues, along with an expected timeline.
The template below is a guide for building a catalog of information with proper metadata. It lists the information that should be included as metadata and gives instructions for each field.
These metadata fields are common to all dataset types in the CKAN platform:
Label | Field Name (API) | Definition | Guidelines | Example |
---|---|---|---|---|
Title* | title | Name given to the dataset. | Short phrase, written in plain language. Should be sufficiently descriptive to allow for search and discovery. | Aquaculture Production and Consumption in Cambodia (2011) |
Description* | description | Short description explaining the content and its origins. | Description of a few sentences, written in plain language. Should provide a sufficiently comprehensive overview of the resource for anyone to understand its content, origins, and any continuing work on it. The description can be written at the end, since it summarizes key information from the other metadata fields. | This dataset contains attributes of aquaculture production and consumption for each of Cambodia’s provinces in 2011. The data was provided by……… |
Tags | tags | An array of Taxonomic terms stored as tags | Taxonomic terms | Access to education, Bamboo |
License* | license_title | The license that applies to the published dataset. | All resources wholly created by Open Development Mekong are licensed as Creative Commons Attribution-ShareAlike (CC-BY-SA). Resources from other sources retain their original licenses, as does each component of a resource aggregated from multiple sources. If unclear, contact the source to determine the resource's license. | |
Copyright | odm_copyright | The copyright that applies to the dataset. | Select 'yes', 'no' or 'unclear copyright' about the copyright of the dataset. If copyright of any type is present, describe it further in Access and Use Constraints. | 'All rights reserved', © or 'Copyright 2009 by Jane Smith' |
Access and Use Constraints | odm_access_and_use_constraints | A few sentences describing legal constraints of dataset, such as copyrights | Standardized statements found on datasets that cover intellectual property and copyright. | In deference to Cambodian law, Open Development Cambodia (ODC) site users understand and agree to take full responsibility for reliance on any site information provided and to hold harmless and waive any and all liability against individuals or entities associated with its development, form and content for any loss, harm or damage suffered as a result of its use. |
Organization* | organization | Organization the dataset belongs to | See list of organizations on http://data.opendevelopmentmekong.net/organization/ | odm-cambodia |
Version* | version | Version of dataset | Increase manually after editing | 1.0 |
Contact* | odm_contact | Contact information for the individual or organization that is responsible for or most knowledgeable about the dataset. This could be the author of a report, the contact information for the relevant department of an organization that produced a report, or the data analyst, mapper or researcher that produced a dataset or report. | Name / Organization / Phone / Website / Address | |
Uploader* [H] | maintainer | Uploader of the dataset | The person who created the dataset. Only visible to registered users of the ODI CKAN data hub. | Joe Bloggs |
Uploader contact* [H] | maintainer_email | Contact details of uploader | The email or other contact details of the person who created the dataset. Only visible to administrators of the uploader's organisation. | joe@example.com |
Label | Field Name (API) | Definition | Guidelines | Example |
---|---|---|---|---|
Language* | odm_language | Language(s) of the dataset, including resources within dataset | Lowercase list of ISO 639-1 language name abbreviations, separated by comma. Most common: en (English), km (Khmer), lo (Lao), my (Burmese), th (Thai), vi (Vietnamese) | en, km |
Date Created* | odm_date_created | Date the dataset was first published by its creator. | Write as YYYY-MM-DD. If an element is unknown, write a 0 in place of the digit. | 2011-12-00 |
Date Uploaded* | odm_date_uploaded | Date the dataset was first uploaded to the OD CKAN database. | Write as YYYY-MM-DD. If an element is unknown, write a 0 in place of the digit. | |
Date Modified | odm_date_modified | Date a new version or update of the dataset was uploaded to the OD CKAN database. | Write as YYYY-MM-DD. If an element is unknown, write a 0 in place of the digit. If no new version has been uploaded, then write NA. | NA |
Temporal Range | odm_temporal_range | The period of time for which the dataset is relevant (e.g. 2011-01-01:2011-12-31). | Write as YYYY-MM-DD:YYYY-MM-DD. If a date element is unknown, write 0s in its place. If the resource covers an entire calendar year, the period should begin with the first day of the year and end with the last. | 2011-01-01:2011-12-31 |
Spatial Range * | odm_spatial_range | The geographic area that the dataset is relevant to (e.g. Cambodia, Laos). | Provide all countries that the resource is relevant to, separated by a comma. | Cambodia, Laos, Myanmar, Thailand, Vietnam (if the resource is relevant to all countries in the Mekong region) |
Accuracy | odm_accuracy | Details on the level of accuracy of the dataset and any existing issues. | Accuracy can refer to the spatial resolution of a satellite image, disagreements others have expressed about the resource's contents, any other references that contain contrary information, and other issues. Should be written in plain language, with numbers when appropriate. If you are not aware of any problems, write: ‘There are no known issues with accuracy.’ If the resource or some of its content describes something that is constantly changing, add: ‘Information subject to change’ and describe the nature of the change, if possible. | Data is given at the provincial level. There are no known issues with accuracy. |
Logical Consistency | odm_logical_consistency | Issues with logical consistency in the dataset and the steps, if any, being taken to validate its content. | Description in a few sentences. Can be a mix of numbers and words. If you are not aware of any problems, write: ‘There are no known issues with logical consistency.’ | Concessions A and B overlap 50 hectares. We anticipate conducting field validation of these concessions by 2014-10-31. |
Completeness | odm_completeness | Brief description of the level of completeness of the dataset's contents and the steps, if any, being taken to make the dataset more complete. | Description in a few sentences. Can be a mix of numbers and words. If you are not aware of any problems, write: ‘There are no known issues with completeness.’ | Locations are included for 30 of 50 concessions in the dataset. We are in the process of acquiring official government documentation on the remaining 20. |
Process(es)* | odm_process | The steps taken to acquire, aggregate, or transform any of the resources in the dataset. | Short description, written in plain language. Include any details regarding updates occurring to the dataset. | Available data were provided in SHP file format by The Atlas of Cambodia. The data were then exported in CSV and GEOJSON format by Open Development Cambodia using QGIS. |
Source(s)* | odm_source | Ordered citations for all information sources that went into producing the dataset. | Use a standard citation style or at least include the following information: Organization Name. “Title of dataset.” File type (CSV, JSON, etc.). Accessed Month ##, year. link. | World Bank. “Annual GDP Growth (%).” XLS. Accessed November 31, 2014. http://www.worldbank.org/en/publication/global-economic-prospects/data?variable=NYGDPMKTPKDZ&region=EAP |
Metadata Reference Information | odm_metadata_reference_information | Information about how up-to-date the metadata is and who is responsible for maintaining it. | Write as: Metadata last updated on YYYY-MM-DD. For inquiries, see contact. | |
Attributes | odm_attributes | Details about the information content of the dataset. | List of attributes, who or which organization defined them, and what they describe (including any units of measurement in parentheses). Write as: Attribute Name / Organization : Attribute Definition | Land area (sq. km) / World Bank: Land area is a country's total area, excluding area under inland water bodies, national claims to continental shelf, and exclusive economic zones. In most cases the definition of inland water bodies includes major rivers and lakes. |
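The date guidelines above (YYYY-MM-DD, with zeros standing in for unknown elements) can be checked mechanically. The following is a minimal sketch, not part of the template itself: the function names are hypothetical, and treating a day as acceptable when the month is unknown is an assumption. It does not handle the `NA` value allowed for Date Modified, and it does not check that a range starts before it ends.

```python
import re
from datetime import date

# YYYY-MM-DD, where any unknown element is written with zeros (e.g. 2011-12-00).
_DATE_RE = re.compile(r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$")


def is_valid_partial_date(value: str) -> bool:
    """Return True if value follows YYYY-MM-DD, allowing zeros for unknown parts."""
    match = _DATE_RE.match(value)
    if not match:
        return False
    year, month, day = (int(part) for part in match.groups())
    if month > 12 or day > 31:
        return False
    if month == 0 or day == 0:
        # Month or day unknown: nothing further can be checked (assumption).
        return True
    # If the year is unknown, use a leap year so that 29 February is accepted.
    check_year = year if year != 0 else 2000
    try:
        date(check_year, month, day)
    except ValueError:
        return False
    return True


def is_valid_temporal_range(value: str) -> bool:
    """Return True if value is two partial dates separated by a colon."""
    parts = [part.strip() for part in value.split(":")]
    return len(parts) == 2 and all(is_valid_partial_date(p) for p in parts)
```

For instance, `is_valid_partial_date("2011-12-00")` accepts the Date Created example, while a calendar date that does not exist, such as `2014-11-31`, is rejected.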
The following template could be implemented on CKAN so that librarians can continue populating the library database with new records:
MARC21 field | Field label | Field name (API) | Field description | Field type | Example |
---|---|---|---|---|---|
245 | Title | title | Main title | String | |
520 | Summary | notes | Abstract or Summary of book or article | String | |
250 | Version / Edition * | marc21_250 | Version of publication | String | 2nd edition |
020 | ISBN | marc21_020 | The International Standard Book Number (ISBN) is a unique numeric commercial book identifier based upon the 10 or 13-digit Standard Book Numbering (SBN). | String | 978-981-4311-87-8 or 0-1223-4023-1 |
022 | ISSN | marc21_022 | The International Standard Serial Number (ISSN) is a unique 8-digit identifier for serial publications. | String | 2049-3630 |
100 | Author | marc21_100 | Main Entry-Personal Name (author) | String | Barney, Keith |
110 | Corporate Author | marc21_110 | Main Entry-Corporate Name (corporate author) or title of journal | String | Asian Development Bank |
700 | Co-Author [M] | marc21_700 | Personal Name (co-author), more than one author. | String | Williamson, Andrew |
710 | Co-Author (Corporate)* | marc21_710 | Corporate Name (corporate co-author), used when there is more than one corporate author. | String | Cambodia. Ministry of Environment |
246 | Varying Form of Title | marc21_246 | Parallel title or translation | String | Khmer title, if the work has both an English and a Khmer title |
260$a | Publication Place | marc21_260a | Place of publisher | String | Oxford |
260$b | Publisher | marc21_260b | Name of publishing organization | String | Oxford University Press |
260$c | Publication Date | marc21_260c | Date published | Date | 2012 |
300 | Pagination [M] | marc21_300 | Physical Description (pagination) | String | 123 p |
500 | General Note [M] | marc21_500 | General Note | String | Published in English and Khmer |
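Because ISBN and ISSN values carry a check digit, a record's identifiers can be sanity-checked before upload. The sketch below is illustrative only and is not part of the template: the function names are hypothetical, it covers 13-digit ISBNs and ISSNs using their standard check-digit rules, and it leaves 10-digit ISBNs out.

```python
def _strip_separators(value: str) -> str:
    """Remove hyphens and spaces and upper-case any trailing 'x' check character."""
    return value.replace("-", "").replace(" ", "").upper()


def is_valid_isbn13(value: str) -> bool:
    """Check a 13-digit ISBN: weights alternate 1 and 3, total divisible by 10."""
    digits = _strip_separators(value)
    if len(digits) != 13 or not all(c in "0123456789" for c in digits):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_issn(value: str) -> bool:
    """Check an 8-character ISSN: first seven digits weighted 8 down to 2, modulo 11."""
    chars = _strip_separators(value)
    if len(chars) != 8 or not all(c in "0123456789" for c in chars[:7]):
        return False
    total = sum(int(c) * weight for c, weight in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    # A check value of 10 is written as the letter X.
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected
```

Both `is_valid_isbn13("978-981-4311-87-8")` and `is_valid_issn("2049-3630")` return `True` for the examples in the table.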
Library records also use some of the metadata fields defined in the Open Development Network Metadata Template:
Label | Field Name (API) | Definition | Guidelines | Example |
---|---|---|---|---|
Language* | odm_language | Language(s) of the dataset, including resources within dataset | Lowercase list of ISO 639-1 language name abbreviations, separated by comma. Most common: en (English), km (Khmer), lo (Lao), my (Burmese), th (Thai), vi (Vietnamese) | en, km |
Date Uploaded* | odm_date_uploaded | Date the dataset was first uploaded to the OD CKAN database. | Write as YYYY-MM-DD. If an element is unknown, write a 0 in place of the digit. | |
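Spatial Range * | odm_spatial_range | The geographic area that the dataset is relevant to (e.g. Cambodia, Laos). | Provide all countries that the resource is relevant to, separated by a comma. | Cambodia, Laos, Myanmar, Thailand, Vietnam (if the resource is relevant to all countries in the Mekong region) |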
Note: Fields marked with * are mandatory.
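A simple way to enforce the mandatory fields for a dataset record is to list their API names and report which ones are empty. This is a minimal sketch under stated assumptions: the record is a plain dictionary keyed by the API field names from the dataset tables, `license_title` is assumed to be the API name of the License field, and `missing_mandatory_fields` is a hypothetical helper, not a CKAN function.

```python
from typing import Any, Dict, List

# API names of the dataset fields marked with * in the tables above.
MANDATORY_DATASET_FIELDS = (
    "title",
    "description",
    "license_title",  # assumed API name for the License field
    "organization",
    "version",
    "odm_contact",
    "maintainer",
    "maintainer_email",
    "odm_language",
    "odm_date_created",
    "odm_date_uploaded",
    "odm_spatial_range",
    "odm_process",
    "odm_source",
)


def missing_mandatory_fields(record: Dict[str, Any]) -> List[str]:
    """Return the mandatory field names that are absent or blank in record."""
    return [
        name
        for name in MANDATORY_DATASET_FIELDS
        if not str(record.get(name) or "").strip()
    ]


# Usage: a partially filled record, using example values from the tables.
record = {
    "title": "Aquaculture Production and Consumption in Cambodia (2011)",
    "organization": "odm-cambodia",
    "version": "1.0",
    "odm_language": "en, km",
    "odm_date_created": "   ",  # whitespace only, so it counts as missing
}
missing = missing_mandatory_fields(record)
```

Reporting all missing fields at once, rather than stopping at the first, lets an uploader fix a record in a single pass.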
Other metadata fields exposed by the CKAN API:
Label | Field Name (API) | Definition | Guidelines | Example |
---|---|---|---|---|
Type* | type | Dataset type | dataset or library_record | dataset |
Resources* | resources | Array with information about resources | … | … |
Tags* | tags | Array with information about tags | … | … |
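These fields can be read back through CKAN's Action API, whose `package_show` action returns a JSON object with a `success` flag and a `result` dictionary. The sketch below only builds the request URL and summarizes an already decoded response, so it runs without network access. The helper names and the dataset id `example-dataset` are hypothetical, and the sample response is hand-written for illustration rather than real output.

```python
from typing import Any, Dict
from urllib.parse import urlencode, urljoin


def package_show_url(base_url: str, dataset_id: str) -> str:
    """Build the URL of CKAN's package_show action for one dataset."""
    endpoint = urljoin(base_url.rstrip("/") + "/", "api/3/action/package_show")
    return endpoint + "?" + urlencode({"id": dataset_id})


def summarize_package(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the type, resource count, and tag names from a decoded response."""
    if not response.get("success"):
        raise ValueError("CKAN reported that the request failed")
    result = response["result"]
    return {
        "type": result.get("type"),
        "resource_count": len(result.get("resources", [])),
        "tags": [tag["name"] for tag in result.get("tags", [])],
    }


# Hand-written sample response (illustrative values only).
sample_response = {
    "success": True,
    "result": {
        "type": "dataset",
        "resources": [{"format": "CSV"}, {"format": "GEOJSON"}],
        "tags": [{"name": "Access to education"}, {"name": "Bamboo"}],
    },
}
url = package_show_url("http://data.opendevelopmentmekong.net", "example-dataset")
summary = summarize_package(sample_response)
```

Depending on how the site's schema is configured, the custom `odm_*` fields may appear either at the top level of `result` or inside its `extras` list, so a real client should check both.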