Open Data

What is Open Data

(adapted/excerpted from the Open Data Handbook)

Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike. Some requirements for data to be truly open are as follows.

  • Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
  • Reuse and redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets.
  • Universal participation: everyone must be able to use, reuse and redistribute - there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.

Why is it so important to be clear about the definition of “open”? The answer: interoperability, which is the ability of diverse systems and organizations to work together (inter-operate). In this case, it is the ability to usefully combine datasets from different countries in the lower Mekong region.

Interoperability allows for different components to work together. This ability to make components and to plug them together is essential to building large, complex systems. Without interoperability this becomes nearly impossible.

The core of a “commons” of data (or code) is that one piece of “open” material can be freely intermixed with other “open” material. This interoperability is key to achieving the main practical benefits of “openness”: the enhanced ability to combine different datasets together and thereby to develop more and better products and services. This ability to combine separate pieces from different sources into larger, more sophisticated systems is the real value of the openness standard.

Why open data?

(from the Open Knowledge Foundation)

Some common reasons for supporting open data:

  • Transparency: In a well-functioning, democratic society citizens need to know what their government is doing. To do that, they must be able freely to access government data and information and to share that information with other citizens. Transparency isn’t just about access, it is also about sharing and reuse — often, to understand material it needs to be analyzed and visualized and this requires that the material be open so that it can be freely used and reused.
  • Releasing social and commercial value: In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by government. By opening up data, government can help drive the creation of innovative business and services that deliver social and commercial value.
  • Participation and engagement (participatory governance, or businesses and organizations engaging with their users and audiences): Much of the time citizens are only able to engage with their own governance sporadically, maybe just at an election every 4 or 5 years. By opening up data, citizens can be much more directly informed and involved in decision-making. This is more than transparency: it’s about making a full “read/write” society, not just about knowing what is happening in the process of governance but being able to contribute to it.

Which file formats are better for open data?

When exploring the Open Development Mekong data catalog, you are likely to find data provided in a variety of formats. The formats were chosen to best match the information the data describes: images as PNGs and TIFFs, text documents as Word Documents (DOC), Text (TXT) files, and sometimes PDFs, spreadsheets as CSVs, and geospatial information as GeoJSON, TopoJSON, KML, and ESRI Shapefile. These are all considered open formats and can be used freely in many applications and on most computer operating systems.

The information below will help you better understand the file formats used in the ODM catalog and how you can begin investigating their contents.

CSV (Comma-Separated Values)

  • What: A tabular (spreadsheet) data format, where the column values are separated by commas. CSV files are both human and machine readable. In the wild, you may see many other “delimiters” used, including tabs.
  • How: CSV files can be opened by many applications, including Microsoft Excel, OpenOffice, and Google Docs, and by text editors like Sublime Text, TextWrangler, Apple TextEdit, and Microsoft Notepad. When a CSV includes geographic coordinates, you can also open it in desktop mapping applications, such as TileMill and QGIS, and with web-mapping tools like CartoDB.
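
Because delimiters vary in the wild, a script reading a catalog file may need to guess the delimiter before parsing. The following is a minimal sketch using Python's standard csv module. The function name `read_delimited`, the candidate delimiter list, and the sample rows are all made up for illustration and are not taken from the ODM catalog.

```python
import csv
import io

def read_delimited(text, candidates=",\t;|"):
    """Parse delimited text into a list of dicts, guessing the delimiter."""
    # csv.Sniffer inspects a sample of the text and guesses which of the
    # candidate characters is used as the delimiter.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=candidates)
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    return list(reader)

# Hypothetical tab-delimited sample (illustrative values only).
sample = (
    "name\tlat\tlon\n"
    "Phnom Penh\t11.56\t104.92\n"
    "Vientiane\t17.97\t102.63\n"
)

rows = read_delimited(sample)
for row in rows:
    print(row["name"], float(row["lat"]), float(row["lon"]))
```

Sniffing is a heuristic and can fail on small or irregular files. When the delimiter is known, passing it directly to `csv.DictReader` with the `delimiter` argument is more reliable.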

DOC (Microsoft Word document)

  • What: A widespread document format developed by Microsoft for word processing.
  • How: In addition to Microsoft Word, you can open DOC files in OpenOffice, Apple Pages, LibreOffice, and other word processors.

JPEG (Joint Photographic Experts Group)

  • What: A common image format that usually produces smaller file sizes but at a loss in image quality/resolution.
  • How: Most operating systems have built-in image-viewing applications to automatically open JPEGs, such as Microsoft Paint. Adobe Photoshop is more advanced software for viewing and editing.

JSON (JavaScript Object Notation)

  • What: JSON is an open standard format, readable by both humans and machines, that represents data objects as attribute-value pairs. GeoJSON is a JSON-based format that encodes simple geographical features (points, lines and polygons) along with non-spatial attributes. TopoJSON in turn extends GeoJSON by “stitching” together shared geometries (e.g. borders). This reduces file size and also facilitates certain visualizations.
  • How: GeoJSON files can be opened by desktop mapping applications and web-mapping tools, by R (with the right extensions), and by text editors. The ogr2ogr utility (a command-line tool distributed with GDAL) supports conversion from many geospatial file formats into GeoJSON.
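
Since GeoJSON is plain JSON, any JSON parser can read it. As a rough sketch, the Python snippet below loads a small feature collection and lists each feature's geometry type and name. The sample data, the `name` property, and the function `summarize_features` are hypothetical; real catalog files will carry their own attributes.

```python
import json

# Made-up sample: GeoJSON coordinates are ordered [longitude, latitude].
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [104.92, 11.56]},
      "properties": {"name": "Example point"}
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [[104.9, 11.5], [105.0, 11.6]]
      },
      "properties": {"name": "Example line"}
    }
  ]
}
"""

def summarize_features(geojson_text):
    """Return (geometry type, name) for every feature in a FeatureCollection."""
    data = json.loads(geojson_text)
    summary = []
    for feature in data.get("features", []):
        # Both members may be null in valid GeoJSON, so fall back to {}.
        geometry = feature.get("geometry") or {}
        properties = feature.get("properties") or {}
        summary.append((geometry.get("type"), properties.get("name")))
    return summary

summary = summarize_features(SAMPLE_GEOJSON)
for geometry_type, name in summary:
    print(geometry_type, name)
```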

KML (Keyhole Markup Language)

  • What: KML is an XML notation format used to express geographic information (longitude, latitude, altitude) on two- and three-dimensional maps. These files are easily readable by humans and machines.
  • How: These files were originally developed for use with Google Earth. They can also be opened in a variety of other desktop GIS applications and web-mapping platforms.
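
Because KML is XML, it can also be read with a general-purpose XML parser. The sketch below uses Python's standard xml.etree.ElementTree module to pull point placemarks out of a small document. The sample placemark and the function `list_points` are illustrative assumptions. Only simple Point placemarks are handled.

```python
import xml.etree.ElementTree as ET

# KML 2.2 elements live in this XML namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Made-up sample document for illustration only.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example point</name>
      <Point><coordinates>104.92,11.56,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

def list_points(kml_text):
    """Return (name, longitude, latitude, altitude) for each point placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # not a point placemark (e.g. a line or polygon)
        # KML writes coordinates as "longitude,latitude[,altitude]".
        parts = [float(value) for value in coords.strip().split(",")]
        lon, lat = parts[0], parts[1]
        alt = parts[2] if len(parts) > 2 else 0.0  # altitude is optional
        points.append((name, lon, lat, alt))
    return points

points = list_points(SAMPLE_KML)
for name, lon, lat, alt in points:
    print(name, lon, lat, alt)
```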

PDF (Portable Document Format)

  • What: A widespread document format developed by Adobe for sharing text and images in a fixed, un-layered layout. Text from PDFs created with optical-character recognition technology can be copied and pasted to a more open format.
  • How: Most modern web browsers (e.g. Firefox, Google Chrome, Opera, Safari) can open PDF files without additional plugins. There are several free applications for viewing (but not necessarily editing) PDFs, including Adobe Reader, Apple Preview, and OpenOffice.

PNG (Portable Network Graphics)

  • What: PNG is a raster graphics file format that supports lossless data compression. The data catalog uses PNG format for general images.
  • How: PNG files can be opened in any image viewer and can be pasted into documents.

SHP (ESRI Shapefile)

  • What: The shapefile is one of the most common geospatial formats. Like GeoJSON, shapefiles can store both spatial geometries (points, lines and polygons) and other feature attributes. In our data catalog, you will find shapefiles zipped together with a few other files (with extension .shx, .dbf, .sbn).
  • How: In addition to ArcGIS, these files can be opened in free and open source GIS applications like QGIS. They can also be converted to many other data formats.
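
Before loading a zipped shapefile into GIS software, it can help to check that the archive contains the companion files. The sketch below uses Python's standard zipfile module and treats .shp, .shx, and .dbf as the required parts. The archive built here is an in-memory placeholder with empty files, and the names `check_shapefile_zip` and `boundaries` are hypothetical.

```python
import io
import os
import zipfile

# A shapefile needs at least these three parts; others (.sbn, .prj) are optional.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}

def check_shapefile_zip(zip_source):
    """Return (extensions found, required extensions missing) for a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        found = {os.path.splitext(name)[1].lower() for name in archive.namelist()}
    return found, REQUIRED_EXTENSIONS - found

# Build a placeholder archive in memory; the member files are empty stand-ins,
# not real shapefile data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for extension in (".shp", ".shx", ".dbf", ".sbn"):
        archive.writestr("boundaries" + extension, b"")
buffer.seek(0)

found, missing = check_shapefile_zip(buffer)
print("found:", sorted(found))
print("missing:", sorted(missing))
```

With a downloaded file, the same function can be called with a path on disk instead of the in-memory buffer.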

TIFF (Tagged Image File Format)

  • What: TIFF is a computer file format used to store raster graphic images, which are made up of usually rectangular grids of pixels. In our data catalog, TIFFs are used for geographic images (e.g. satellite imagery or heat maps). These TIFFs include georeferencing information (or metadata) that allows you to project them onto a map, and they are referred to as GeoTIFFs.
  • How: TIFFs can be opened in image viewers, like Apple Preview. In the case of GeoTIFFs, you might find it useful to use GIS software, such as QGIS.

TXT (Text)

  • What: Essentially just textual data without stylistic formatting commands. All metadata is currently stored in TXT format, though this may change when a new data management platform is adopted.
  • How: Can be opened in any text editor.
public/open_data.txt · Last modified: 2015/09/07 18:22 by acorbi