Migrating from the previous system

Who is this guide for

  • System administrators deploying ODM's system on a new server
  • Developers working on testing or improvements who want to replicate the state of the production server

What this guide teaches

  • Which contents are maintained on the current OD Mekong site and the old ODC site
  • How information is categorized according to OD Mekong's taxonomy
  • How to run Python scripts to migrate/initialize contents

Migration Planning

For the migration plan, see this document: https://docs.google.com/document/d/1ibDJNgf_HNvSOgAMCxRJcWqpmA1vZ0I716W7Gx2RefE/edit

Things to know beforehand

Inventory

This is a list of the items on the server. We can go through it to make sure that all components are accounted for and have a migration plan.

| Component | Location | Migration plan? | Target component |
|---|---|---|---|
| http://www.opendevelopmentcambodia.net/company-profiles/hydropower-sub-stations/ | /home/devnet/public_html/references | | Wordpress front-end |
| http://geoserver.opendevelopmentcambodia.net:8181/geoserver/web/ | /home/devnet/public_html/geoserver_data | yes | CKAN |
| http://library.opendevelopmentcambodia.net:8080/newgenlibctxt/ | /usr/NGL3/apache-tomcat-6.0.32/webapps/newgenlibctxt/CatalogueRecords | yes | CKAN |
| Store all files for downloading | /home/devnet/public_html/download | | Wordpress front-end |
| http://www.opendevelopmentcambodia.net/maps/downloads/ | /home/devnet/public_html/download/maps | | Wordpress front-end |
| http://www.opendevelopmentcambodia.net/laws-regulations/ | /home/devnet/public_html/download/law | | Wordpress front-end |
| Wordpress | /home/devnet/public_html/wp-content | | Wordpress front-end |

Migration scripts are shipped with the odm-internal repository

Scripts for automated migration of contents are available in the odm-internal repository. After cloning the repository on the CKAN instance, you will find a series of utility scripts in the scripts folder:

 cd odm-internal/odm-migration/CKAN/import-scripts/scripts 

The scripts come with a test suite. To run the tests:

  1. Rename the test.ini.sample file to test.ini and adapt it to the current setup (pay special attention to solr_url and who.config_file).
  2. Do not forget to activate the virtual environment first:
    . ~/.virtualenvs/ckan/bin/activate
  3. From the odm-internal/odm-migration/CKAN/import-scripts/ folder, run:
    nosetests --ckan --with-pylons=test.ini tests

A sysadmin API key is needed to run the scripts

Follow the CKAN documentation to create a sysadmin user: http://docs.ckan.org/en/latest/maintaining/getting-started.html#create-admin-user

. ~/.virtualenvs/ckan/bin/activate
paster --plugin=ckan sysadmin add <USERNAME> -c /etc/ckan/default/production.ini

Once the user is created, go to http://CKAN_URL/user/login and log in with the credentials you have just specified. The profile page of the new sysadmin user will then appear; take note of the API key shown at the bottom, as it is needed in the next step.
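The API key is what the import scripts send to CKAN's action API. As a rough illustration, the standalone Python 3 sketch below builds (but does not send) an authenticated request; it assumes the standard `/api/3/action/<action>` endpoint and the `Authorization` header that CKAN 2.x accepts for API keys. `ckan_action_request`, the `http://localhost:5000` URL and the placeholder values are hypothetical and not part of odm-internal.

```python
import json
from urllib.request import Request


def ckan_action_request(ckan_url: str, action: str, payload: dict, api_key: str) -> Request:
    """Build a POST request for a CKAN action API call (hypothetical helper)."""
    url = ckan_url.rstrip("/") + "/api/3/action/" + action
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # CKAN 2.x reads the API key from the Authorization header
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: a user_show request for the new sysadmin (built only, not sent)
req = ckan_action_request("http://localhost:5000", "user_show", {"id": "<USERNAME>"}, "<API_KEY>")
# To send it: json.load(urllib.request.urlopen(req))["result"]
```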

Script configuration

Before running the scripts, the odm_theme_config.sample.py file needs to be renamed to odm_theme_config.py:

cd config
mv odm_theme_config.sample.py odm_theme_config.py

In addition, the variables contained in this file need to be initialized:

  • DEBUG (BOOL): Show debug errors and messages
  • SKIP_N_DATASETS (NUMBER): Number of input records to skip, used to resume interrupted imports.
  • SKIP_EXISTING (BOOL): If False, existing datasets are modified; if True, they are skipped.
  • CKAN_URL (URL): URL of the CKAN instance (include port if necessary)
  • CKAN_APIKEY (KEY): API Key as described above
  • GEOSERVER_URL (URL): URL of the Geoserver (most probably http://geoserver.opendevelopmentcambodia.net:8181/geoserver/)
  • GEOSERVER_AUTH (KEY): GeoServer's Authorization header (Basic Auth)
  • GEOSERVER_MAP (DICT): Python dictionary containing ontology config. example:
    {'ontology':'*','organization':'cambodia-organization','groups':[{'name':'maps-group'},{'name':'cambodia-group'}]}
  • NGL_URL (URL): URL of the NewGenLib instance (most probably http://library.opendevelopmentcambodia.net:8080/newgenlibctxt/)
  • NGL_MAP (DICT): Python dictionary containing ontology config. example:
    {'ontology':'*','organization':'odm-library','groups':[{'name':'library-group'},{'name':'cambodia-group'}]}
  • ODC_MAP (LIST): Python list of dictionaries containing ontology config. example:
    ODC_MAP=[{'ontology':'ODC/laws','organization':'cambodia-organization','groups':[{'name':'laws-group'},{'name':'cambodia-group'}],'field_prefixes':[{'field':'file_name_kh','prefix':'http://cambodia.opendevelopmentmekong.net/wp-content/blogs.dir/2/download/law/'},{'field':'file_name_en','prefix':'http://cambodia.opendevelopmentmekong.net/wp-content/blogs.dir/2/download/law/'}]}]
  • ODM_ADMINS_PASS: The password for the default admin user of each of the organizations created by the insert_initial_odm_data.py script (See below.)
  • DELETE_MAP (DICT): A dictionary specifying the configuration for the delete_datasets_in_group script. example:
    {'group':'laws-group','limit':500,'field_filter':{'odm_contact':'ODM Importer','odm_contact_email':'info@opendevmekong.net'}}
  • CHANGE_TYPE_MAP (DICT): A dictionary specifying the configuration for the change_dataset_type_in_group script. example:
    {'type':'library_record','organization':'odm-library','state':'active','limit':500,'field_filter':{'odm_contact':'OD Mekong Importer','odm_contact_email':'info@opendevmekong.net'}}
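Mistakes in odm_theme_config.py (for example a string where a dictionary is expected) tend to surface only halfway through an import. A small check such as the Python 3 sketch below can be run against the module before starting. The expected types are taken from the list above, while `check_config` and `EXPECTED_TYPES` are hypothetical names, not part of the repository.

```python
EXPECTED_TYPES = {
    "DEBUG": bool,
    "SKIP_N_DATASETS": int,
    "SKIP_EXISTING": bool,
    "CKAN_URL": str,
    "CKAN_APIKEY": str,
    "GEOSERVER_URL": str,
    "GEOSERVER_AUTH": str,
    "GEOSERVER_MAP": dict,
    "NGL_URL": str,
    "NGL_MAP": dict,
    "ODC_MAP": list,
    "ODM_ADMINS_PASS": str,
    "DELETE_MAP": dict,
    "CHANGE_TYPE_MAP": dict,
}


def check_config(namespace: dict) -> list:
    """Return a list of problems found in the configuration namespace."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in namespace:
            problems.append(f"{name} is missing")
            continue
        value = namespace[name]
        # bool is a subclass of int, so True/False must not pass as a number
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    # every ontology mapping needs a target organization and groups
    maps = [namespace.get("GEOSERVER_MAP"), namespace.get("NGL_MAP")]
    odc_map = namespace.get("ODC_MAP")
    if isinstance(odc_map, list):
        maps.extend(odc_map)
    for mapping in maps:
        if isinstance(mapping, dict) and not {"organization", "groups"}.issubset(mapping):
            problems.append(f"a mapping lacks 'organization' or 'groups': {mapping}")
    return problems
```

Usage: `import odm_theme_config; print(check_config(vars(odm_theme_config)))` prints an empty list when every variable is present and has the expected type.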

IMPORTANT: Disable the Google Analytics extension prior to the migration

There is a known issue in the ckanext-googleanalytics extension (which we use to track usage of the CKAN instance) that causes the scripts to fail after a certain number of requests. Therefore, make sure the plugin is not included in the ckan.plugins parameter of development.ini or production.ini.
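If you prefer to script this step, the Python 3 sketch below drops the plugin from the ckan.plugins line of an ini file. It assumes the plugin is registered as `googleanalytics` (the name used in the ckanext-googleanalytics README) and that ckan.plugins is written on a single line; `remove_plugin` is a hypothetical helper. Keep a backup of the ini file and restart CKAN afterwards so the change takes effect.

```python
import re


def remove_plugin(ini_text: str, plugin: str = "googleanalytics") -> str:
    """Return ini_text with `plugin` removed from the ckan.plugins line."""

    def drop(match):
        remaining = [p for p in match.group(2).split() if p != plugin]
        return match.group(1) + " ".join(remaining)

    # only a single-line ckan.plugins setting is handled
    return re.sub(r"^(ckan\.plugins[ \t]*=[ \t]*)(.*)$", drop, ini_text, flags=re.MULTILINE)


# Usage (path as used earlier on this page):
# path = "/etc/ckan/default/production.ini"
# with open(path) as f:
#     text = f.read()
# with open(path, "w") as f:
#     f.write(remove_plugin(text))
```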

IMPORTANT: Disable review system prior to the migration

As part of the review workflow, the ckanext-issues extension makes newly created datasets private automatically, so they remain unpublished. This is not the expected behaviour for the import scripts; to avoid it, set the variable ckanext.issues.review_system to False before running the scripts. Once all import scripts have run successfully, set it back to True.
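The flag can be changed by hand, or with a small helper like the Python 3 sketch below, which sets an option in the [app:main] section of a CKAN ini file. `set_ini_option` is hypothetical; the idea is to run it once with `false` before the import and once with `true` afterwards, restarting CKAN each time.

```python
import re


def set_ini_option(ini_text: str, key: str, value: str) -> str:
    """Set `key = value` in a CKAN ini file, adding it under [app:main] if absent."""
    option = re.compile(r"^([ \t]*" + re.escape(key) + r"[ \t]*=).*$", re.MULTILINE)
    if option.search(ini_text):
        # replace the existing value, keeping the key as written
        return option.sub(lambda m: m.group(1) + " " + value, ini_text, count=1)
    header = re.search(r"^\[app:main\][ \t]*$", ini_text, re.MULTILINE)
    if header is None:
        raise ValueError("no [app:main] section found")
    # insert the option on the line right after the section header
    return ini_text[: header.end()] + f"\n{key} = {value}" + ini_text[header.end():]


# Example: disable the review system before running the import scripts
# text = set_ini_option(text, "ckanext.issues.review_system", "false")
```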

Script initialization

The scripts folder contains a script that needs to be run first: insert_initial_odm_data.py. It creates a basic set of Users, Organizations and Groups required by the import scripts below. To run it:

cd scripts
python insert_initial_odm_data.py

Make sure to edit the script and set CKAN_ADMIN_API_KEY to your user's API key (noted earlier); otherwise the script will fail.

Migrating GeoServer's spatial data to the Data Hub

A Python script has been developed to extract the list of layers hosted on GeoServer and pull, for each layer, its metadata, GeoJSON representation (if available), OpenLayers link and other visualisation formats (stored in GeoServer), so that they can be stored or updated on CKAN. The pseudocode of the script is:

  1. Parse and iterate through the list of layers, extracting each layer's name, which will be used to name the corresponding dataset on CKAN (http://docs.geoserver.org/stable/en/user/rest/api/layers.html#layers-l-format). URL to call: http://geoserver.opendevelopmentcambodia.net:8181/geoserver/rest/layers/LAYER_NAME.json (e.g. LAYER_NAME = Provinces)
  2. Generate the links to its GeoJSON, PNG, PDF and OpenLayers representations. The first three should be downloaded to a temporary file and then uploaded to CKAN as resources of the dataset; the OpenLayers representation is only linked as a resource within the same dataset.
  3. Before creating a new dataset, the script should check whether it already exists. In that case, metadata and existing resources within the dataset will be replaced by the new ones ensuring the information stays up-to-date.

To avoid issues caused by GeoServer being moved to another location (and thus changing its IP address), it should always be addressed by its current domain name: http://geoserver.opendevelopmentcambodia.net
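Steps 1 to 3 can be sketched in Python 3 as below, addressing GeoServer by its domain name as recommended. The REST call is the one quoted in step 1; the GeoJSON, PNG, PDF and OpenLayers links are an assumption based on GeoServer's standard WFS `GetFeature` and WMS reflector (`/wms/reflect`) endpoints and may differ from what import_from_geoserver.py actually generates. All function names are hypothetical, and no HTTP requests are made.

```python
from urllib.parse import urlencode

GEOSERVER_URL = "http://geoserver.opendevelopmentcambodia.net:8181/geoserver/"


def layer_names(layers_json: dict) -> list:
    """Step 1: extract layer names from the JSON of GET <GEOSERVER_URL>rest/layers.json."""
    return [layer["name"] for layer in layers_json["layers"]["layer"]]


def layer_links(base_url: str, layer: str) -> dict:
    """Step 2: build the links for one layer (endpoints assumed, see above)."""
    base = base_url.rstrip("/")
    reflect = base + "/wms/reflect?"
    return {
        "metadata": f"{base}/rest/layers/{layer}.json",
        "geojson": base + "/wfs?" + urlencode({
            "service": "WFS", "version": "1.0.0", "request": "GetFeature",
            "typeName": layer, "outputFormat": "application/json",
        }),
        "png": reflect + urlencode({"layers": layer, "format": "image/png"}),
        "pdf": reflect + urlencode({"layers": layer, "format": "application/pdf"}),
        "openlayers": reflect + urlencode({"layers": layer, "format": "application/openlayers"}),
    }


def resource_plan(links: dict) -> dict:
    """Step 2: GeoJSON, PNG and PDF are downloaded and uploaded; OpenLayers is only linked."""
    return {
        "upload": [links["geojson"], links["png"], links["pdf"]],
        "link": [links["openlayers"]],
    }


def ckan_action_for(dataset_name: str, existing_names: set) -> str:
    """Step 3: update an existing dataset instead of creating a duplicate."""
    return "package_update" if dataset_name in existing_names else "package_create"
```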

Executing the script

Run import_from_geoserver.py which can be found under odm-scripting/ckan-scripts. This script downloads and initializes the map layers from GeoServer.

python import_from_geoserver.py

Make sure to edit the script and set <CKAN_URL_AND_PORT>, <CKAN_ADMIN_API_KEY> and <GEOSERVER_BASIC_AUTH> to CKAN's URL and port, your CKAN user's API key and GeoServer's Authorization header (Basic Auth), respectively; otherwise the script will fail.
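A Basic Auth header is the word `Basic` followed by the Base64 encoding of `user:password` (RFC 7617). The Python 3 sketch below builds one and attaches it to a request for GeoServer's layer list; whether the script expects the full header value or only the encoded token is not stated here, so check the placeholder in import_from_geoserver.py. The helper names and credentials are hypothetical.

```python
import base64
from urllib.request import Request


def basic_auth_header(user: str, password: str) -> str:
    """Return an HTTP Basic Auth header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def layers_request(geoserver_url: str, auth_header: str) -> Request:
    """Build (but do not send) an authenticated GET for the REST layer list."""
    url = geoserver_url.rstrip("/") + "/rest/layers.json"
    return Request(url, headers={"Authorization": auth_header})


header = basic_auth_header("<GEOSERVER_USER>", "<GEOSERVER_PASSWORD>")
req = layers_request("http://geoserver.opendevelopmentcambodia.net:8181/geoserver/", header)
```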

Migrating Library publications from NGL to the Data Hub

Currently, NewGenLib is used to maintain a collection of library publications on the old ODC website. This system not only lets users browse a book and article catalog but also check the availability of certain publications in ODC's physical library. The existing records, along with their metadata, need to be imported into the new Data Hub module. However, checking the availability of publications in the physical library is not supported by CKAN and would require additional development, so for the moment it will not be supported.

The records stored in NGL should be imported into the Data Hub programmatically. A script has been developed to automate this process, following this workflow:

  1. The current library publication records, stored on ODC's NewGenLib instance (http://library.opendevelopmentcambodia.net/), need to be exported in MARC21 format. This export produces a single binary file; let's call it records.mrc.
  2. The generated file, which contains the actual records, needs to be uploaded to the odm-library repository on GitHub so that it is available to the import script.
  3. The import script can be found in the scripts folder of the odm-internal repository on GitHub. This repository needs to be cloned in order to run the script.
  4. Once the repository is cloned, the script can be found under /ckan_scripts/import_from_ngl.py.
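To check what records.mrc contains before running the import, the binary MARC 21 layout (a 24-byte leader, 12-byte directory entries, then the variable fields) can be read with a few lines of Python 3. The sketch below is a simplified stand-in for a full MARC library such as pymarc, not the parser used by import_from_ngl.py; it assumes UTF-8 encoded records (leader position 09 = `a`), whereas MARC-8 records would need conversion. Function names are hypothetical.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def split_records(data: bytes) -> list:
    """Split the content of a .mrc file into individual records."""
    return [
        chunk.lstrip(b"\r\n") + RECORD_TERMINATOR
        for chunk in data.split(RECORD_TERMINATOR)
        if chunk.strip()
    ]


def parse_record(record: bytes) -> dict:
    """Map each tag to the list of raw field values (terminators stripped)."""
    base_address = int(record[12:17].decode("ascii"))
    directory = record[24 : base_address - 1]  # the directory ends with a field terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i : i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        offset = base_address + start
        value = record[offset : offset + length].rstrip(FIELD_TERMINATOR)
        fields.setdefault(tag, []).append(value)
    return fields


def subfields(data_field: bytes) -> dict:
    """Split a data field (tag 010 and above) into {code: value}, skipping the two indicators.

    Repeated subfield codes keep only their last value in this simplified version.
    """
    parts = data_field[2:].split(SUBFIELD_DELIMITER)
    return {p[:1].decode("ascii"): p[1:].decode("utf-8", "replace") for p in parts if p}


# Usage:
# with open("records.mrc", "rb") as f:
#     for rec in split_records(f.read()):
#         print(subfields(parse_record(rec)["245"][0]).get("a"))  # title proper
```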

Executing the script

Run import_from_ngl.py, which can be found under odm-internal/odm-migration. This script imports the library publication records from NewGenLib into CKAN.

python import_from_ngl.py

Make sure to edit the script and set <CKAN_URL_AND_PORT>, <CKAN_ADMIN_API_KEY> and <NGL_URL> to CKAN's URL and port, your CKAN user's API key and the URL of the NGL instance, respectively; otherwise the script will fail.

Archiving contents from the former ODC Wordpress site

To replicate the effect of the wpckan WordPress plugin for all contents previously created on opendevelopmentcambodia.net, a script has been written that pulls XML files with exports of each relevant category of the WordPress site and archives them into CKAN, assigning the created or modified datasets to specific Organizations and/or Groups (see ODC_MAP under Script configuration above).
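The mapping from an exported WordPress item to a CKAN dataset can be sketched in Python 3 as follows. It assumes the XML files are standard WordPress eXtended RSS (WXR) exports using the `http://wordpress.org/export/1.2/` namespace (older exports use 1.0 or 1.1) and that the file names referenced by `field_prefixes` in ODC_MAP are stored as post meta; both are assumptions, as are the function names. The resulting dictionaries use standard `package_create` fields of CKAN's action API (`name`, `title`, `url`, `owner_org`, `groups`, `resources`).

```python
import xml.etree.ElementTree as ET

WP = {"wp": "http://wordpress.org/export/1.2/"}


def wxr_items(xml_text: str):
    """Yield slug, title, link and post meta for every <item> of a WXR export."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        meta = {
            pm.findtext("wp:meta_key", default="", namespaces=WP):
                pm.findtext("wp:meta_value", default="", namespaces=WP)
            for pm in item.findall("wp:postmeta", WP)
        }
        yield {
            "slug": item.findtext("wp:post_name", default="", namespaces=WP),
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "meta": meta,
        }


def to_dataset(item: dict, mapping: dict) -> dict:
    """Turn one WXR item into a package_create payload using one ODC_MAP entry."""
    resources = [
        {"name": item["meta"][fp["field"]], "url": fp["prefix"] + item["meta"][fp["field"]]}
        for fp in mapping.get("field_prefixes", [])
        if item["meta"].get(fp["field"])  # skip fields missing from this post
    ]
    return {
        "name": item["slug"],
        "title": item["title"],
        "url": item["link"],
        "owner_org": mapping["organization"],
        "groups": mapping["groups"],
        "resources": resources,
    }
```

Each resulting dictionary would then be sent to `package_create`, or to `package_update` when a dataset with that name already exists.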

Executing the script

Run import_odc_contents.py which can be found under odm-internal/odm-migration.

python import_odc_contents.py

Make sure to edit the script and set <CKAN_URL_AND_PORT>, <CKAN_ADMIN_API_KEY> and <ODC_MAP> to CKAN's URL and port, your CKAN user's API key and the ODC_MAP mapping described under Script configuration, respectively.

Loading Taxonomy structure into CKAN

Information across the platform is structured following a taxonomy that helps categorize contents by topic. This structure also has to be maintained in the Data Hub. The taxonomy is available in the odm-localization repository, along with its translations into several languages. To import the taxonomy elements into ODM's CKAN instance, a script has been written that gets the structure of the taxonomy from the odm-localization repository and imports it into CKAN as tag vocabularies.
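In CKAN, a tag vocabulary is created through the `vocabulary_create` action (restricted to sysadmins) with a `name` and a list of `tags`. The Python 3 sketch below flattens a nested taxonomy into that payload; the node layout (`name` plus optional `children`) and the vocabulary name are hypothetical, since the actual format of the taxonomy files in odm-localization is not described here.

```python
def flatten_terms(nodes: list) -> list:
    """Depth-first list of term names from nested {'name', 'children'} nodes."""
    terms = []
    for node in nodes:
        terms.append(node["name"])
        terms.extend(flatten_terms(node.get("children", [])))
    return terms


def vocabulary_payload(vocabulary_name: str, nodes: list) -> dict:
    """Build a vocabulary_create payload, dropping duplicate terms."""
    unique = list(dict.fromkeys(flatten_terms(nodes)))  # keeps first-seen order
    return {"name": vocabulary_name, "tags": [{"name": term} for term in unique]}
```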

Executing the script

Run import_taxonomy_tag_dictionaries.py, which can be found under odm-scripting/ckan-scripts. This script downloads the taxonomy structure and initializes it in CKAN.

python import_taxonomy_tag_dictionaries.py

Make sure to edit the script and set <CKAN_URL_AND_PORT> and <CKAN_ADMIN_API_KEY> to CKAN's URL and port and your user's API key (noted earlier), respectively; otherwise the script will fail.

Importing translation terms for ODM's Taxonomy

The taxonomy is available in the odm-localization repository, along with its translations into several languages. To import the translated taxonomy elements into ODM's CKAN instance, a script has been written that gets the translated taxonomy elements and imports them into CKAN as term translations.
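CKAN stores such translations through its `term_translation_update_many` action, which takes a `data` list of `term` / `term_translation` / `lang_code` entries. Assuming the translated taxonomy has been loaded into a `{lang_code: {term: translation}}` dictionary (a hypothetical intermediate format, not necessarily how the script works), the payload can be built as in this Python 3 sketch:

```python
def term_translation_payload(translations: dict) -> dict:
    """Build a term_translation_update_many payload from {lang: {term: translation}}."""
    data = [
        {"term": term, "term_translation": translated, "lang_code": lang}
        for lang, terms in sorted(translations.items())
        for term, translated in terms.items()
        if translated  # skip terms that have not been translated yet
    ]
    return {"data": data}
```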

Executing the script

Run import_taxonomy_term_translations.py, which can be found under odm-scripting/ckan-scripts. This script downloads the translated taxonomy terms and imports them into CKAN.

python import_taxonomy_term_translations.py

Make sure to edit the script and set <CKAN_URL_AND_PORT> and <CKAN_ADMIN_API_KEY> to CKAN's URL and port and your user's API key (noted earlier), respectively; otherwise the script will fail.

Deleting datasets from a certain group

Sometimes datasets need to be removed from a certain group (e.g. laws or maps). For that, the delete_datasets_in_group.py script can be used. It is configured through the following keys of the DELETE_MAP variable in the config file:

  • organization: The organization owning the dataset. NOTE: This parameter overrides the group parameter.
  • group: The group from which the datasets will be deleted.
  • state: Delete only datasets with a certain state (draft/active).
  • limit: Maximum number of datasets removed each time the script is run.
  • field_filter: Key:value pairs used to select datasets by the contents of their extra fields.

To remove only the datasets created by the import scripts, set field_filter to:

 'field_filter':{'odm_contact':'ODM Importer','odm_contact_email':'info@opendevmekong.net'} 
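The selection logic described by these keys can be sketched in Python 3 as below. It assumes the candidate datasets have already been fetched (for example with `package_search` and `fq=groups:laws-group`) and that filter fields may appear either as top-level keys or in the `extras` list of a CKAN dataset dictionary; each selected name would then be passed to `package_delete` as `{'id': name}`. The helper names are hypothetical and the organization override is not modelled.

```python
def field_value(dataset: dict, key: str):
    """Look a field up at the top level first, then in the dataset's extras."""
    if key in dataset:
        return dataset[key]
    for extra in dataset.get("extras", []):
        if extra.get("key") == key:
            return extra.get("value")
    return None


def select_for_deletion(datasets: list, delete_map: dict) -> list:
    """Return the names of datasets matching DELETE_MAP's state, field_filter and limit."""
    state = delete_map.get("state")
    limit = delete_map.get("limit")
    field_filter = delete_map.get("field_filter", {})
    selected = []
    for dataset in datasets:
        if limit is not None and len(selected) >= limit:
            break
        if state and dataset.get("state") != state:
            continue
        if all(field_value(dataset, k) == v for k, v in field_filter.items()):
            selected.append(dataset["name"])
    return selected
```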

Executing the script

Run delete_datasets_in_group.py, which can be found under odm_internal/odm_migration. This script gathers the list of matching datasets and removes them in bulk.

python delete_datasets_in_group.py

Make sure to edit the script and set <CKAN_URL_AND_PORT> and <CKAN_ADMIN_API_KEY> to CKAN's URL and port and your user's API key (noted earlier), respectively; otherwise the script will fail.

Running this script sets the state of the filtered datasets to 'deleted', but they remain in the database. To delete them permanently, log in with sysadmin credentials and point your browser to http://data.opendevelopmentmekong.net/ckan-admin/trash. Alternatively, the instructions found under http://wiki.opendevelopmentmekong.net/code_snippets#purge_all_datasets_marked_as_deleted_on_ckan can be run to purge the deleted datasets.

migrating_from_previous_system.txt · Last modified: 2020/06/23 15:04 (external edit)