====== Migrating from the previous system ======

===== Who is this guide for =====

  * System administrators deploying ODM's system on a new server
  * Developers working on testing or improvements who want to replicate the production server's state

===== What this guide teaches =====

  * Which contents are maintained on the current OD Mekong site and on the old ODC site
  * How information is categorized according to OD Mekong's taxonomy
  * How to run the Python scripts that migrate and initialize contents

===== Migration planning =====

For the migration planning, please have a look at this document: https://docs.google.com/document/d/1ibDJNgf_HNvSOgAMCxRJcWqpmA1vZ0I716W7Gx2RefE/edit

===== Things to know beforehand =====

==== Inventory ====

This is a list of the items on the server. We can go through it to make sure that all components are accounted for and that we have a migration plan for each of them.

^ Component ^ Location ^ Migration plan? ^ Migrated to ^
| http://www.opendevelopmentcambodia.net/company-profiles/hydropower-sub-stations/ | /home/devnet/public_html/references | | WordPress front-end |
| http://geoserver.opendevelopmentcambodia.net:8181/geoserver/web/ | /home/devnet/public_html/geoserver_data | yes | CKAN |
| http://library.opendevelopmentcambodia.net:8080/newgenlibctxt/ | /usr/NGL3/apache-tomcat-6.0.32/webapps/newgenlibctxt/CatalogueRecords | yes | CKAN |
| Storage for all downloadable files | /home/devnet/public_html/download | | WordPress front-end |
| http://www.opendevelopmentcambodia.net/maps/downloads/ | /home/devnet/public_html/download/maps | | WordPress front-end |
| http://www.opendevelopmentcambodia.net/laws-regulations/ | /home/devnet/public_html/download/law | | WordPress front-end |
| WordPress | /home/devnet/public_html/wp-content | | WordPress front-end |

==== Migration scripts are shipped with the odm-internal repository ====

Scripts for the automated migration of contents are available in the [[https://github.com/OpenDevelopmentMekong/odm-internal|odm-internal]]
repository. After cloning the repository from within the CKAN instance, you will find a series of utility scripts in the scripts folder:

<code bash>
cd odm-internal/odm-migration/CKAN/import-scripts/scripts
</code>

The scripts come with a test suite. To run the tests:

  - Rename the **test.ini.sample** file to **test.ini** and modify it according to the current setup (pay special attention to **solr_url** and **who.config_file**).
  - Do not forget to activate the virtual environment first: ''%%. ~/.virtualenvs/ckan/bin/activate%%''
  - Run ''%%nosetests --ckan --with-pylons=test.ini tests%%'' from the odm-internal/odm-migration/CKAN/import-scripts/ folder.

==== In order to run the scripts, a sysadmin API Key is needed ====

Follow the CKAN documentation for creating a sysadmin user: http://docs.ckan.org/en/latest/maintaining/getting-started.html#create-admin-user

<code bash>
. ~/.virtualenvs/ckan/bin/activate
paster --plugin=ckan sysadmin add USERNAME -c /etc/ckan/default/production.ini
</code>

Replace USERNAME with the name of the new sysadmin user. Once the user is created, head to http://CKAN_URL/user/login and log in with the credentials you have just specified. The profile page of the sysadmin user will then appear; look at the bottom of the page and take note of the API Key, which will be needed in the next step.

{{ ::ckan_api_key.png?direct&600 |}}

==== Script configuration ====

Before running the scripts, the **odm_theme_config.sample.py** file needs to be renamed to **odm_theme_config.py**:

<code bash>
cd config
mv odm_theme_config.sample.py odm_theme_config.py
</code>

In addition, the variables contained in this file need to be initialized:

  * **DEBUG** (BOOL): Show debug errors and messages.
  * **SKIP_N_DATASETS** (NUMBER): Number of input records to skip, in order to resume interrupted imports.
  * **SKIP_EXISTING** (BOOL): If True, existing datasets are skipped; if False, they are modified.
  * **CKAN_URL** (URL): URL of the CKAN instance (include the port if necessary).
  * **CKAN_APIKEY** (KEY): The sysadmin API Key obtained as described above.
  * **GEOSERVER_URL** (URL): URL of the GeoServer instance (most probably http://geoserver.opendevelopmentcambodia.net:8181/geoserver/).
  * **GEOSERVER_AUTH** (KEY): Basic Auth credentials for GeoServer.
  * **GEOSERVER_MAP** (DICT): Python dictionary containing the ontology configuration. Example: ''%%{'ontology':'*','organization':'cambodia-organization','groups':[{'name':'maps-group'},{'name':'cambodia-group'}]}%%''
  * **NGL_URL** (URL): URL of the NewGenLib instance (most probably http://library.opendevelopmentcambodia.net:8080/newgenlibctxt/).
  * **NGL_MAP** (DICT): Python dictionary containing the ontology configuration. Example: ''%%{'ontology':'*','organization':'odm-library','groups':[{'name':'library-group'},{'name':'cambodia-group'}]}%%''
  * **ODC_MAP** (LIST): Python list of dictionaries containing the ontology configuration. Example: ''%%ODC_MAP=[{'ontology':'ODC/laws','organization':'cambodia-organization','groups':[{'name':'laws-group'},{'name':'cambodia-group'}],'field_prefixes':[{'field':'file_name_kh','prefix':'http://cambodia.opendevelopmentmekong.net/wp-content/blogs.dir/2/download/law/'},{'field':'file_name_en','prefix':'http://cambodia.opendevelopmentmekong.net/wp-content/blogs.dir/2/download/law/'}]}]%%''
  * **ODM_ADMINS_PASS**: The password for the default admin user of each of the organizations created by the insert_initial_odm_data.py script (see below).
  * **DELETE_MAP** (DICT): A dictionary specifying the configuration for the delete_datasets_in_group script. Example: ''%%{'group':'laws-group','limit':500,'field_filter':{'odm_contact':'ODM Importer','odm_contact_email':'info@opendevmekong.net'}}%%''
  * **CHANGE_TYPE_MAP** (DICT): A dictionary specifying the configuration for the change_dataset_type_in_group script.
Example **CHANGE_TYPE_MAP** value: ''%%{'type':'library_record','organization':'odm-library','state':'active','limit':500,'field_filter':{'odm_contact':'OD Mekong Importer','odm_contact_email':'info@opendevmekong.net'}}%%''

==== IMPORTANT: Disable the Google Analytics extension prior to the migration ====

There is [[https://github.com/ckan/ckanext-googleanalytics/issues/12|a known issue]] in the ckanext-googleanalytics extension (which we use for tracking usage of the CKAN instance) that causes the scripts to fail after a certain number of requests. Therefore, make sure that the plugin is not included in the **ckan.plugins** parameter of development.ini or production.ini.

==== IMPORTANT: Disable the review system prior to the migration ====

As part of its workflow, the ckanext-issues extension modifies the system so that newly created datasets are automatically made private, which leaves them unpublished. This is not the expected behaviour for the import scripts. To avoid it, set the variable **ckanext.issues.review_system** to //False// before running the scripts. After all the import scripts have run successfully, set it back to //True//.

==== Script initialization ====

In the scripts folder there is a script that needs to be run first. It is called insert_initial_odm_data.py and creates a basic set of Users, Organizations and Groups required by the import scripts below. To run it:

<code bash>
cd scripts
python insert_initial_odm_data.py
</code>

Please be sure to edit the script and change CKAN_ADMIN_API_KEY to your user's API Key (which you noted before); otherwise the script will fail.

===== Migrating GeoServer's spatial data to the Data Hub =====

A Python script has been developed to extract the list of layers hosted on GeoServer and pull their metadata, GeoJSON representation (if available), OpenLayers link and other visualisation formats (stored in GeoServer), so that they can be stored or updated on CKAN.
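Listing the layers, the first thing this script does, boils down to two calls against GeoServer's REST API: ''rest/layers.json'' for the catalogue and ''rest/layers/LAYER_NAME.json'' for each layer. The sketch below shows one way to build those URLs and pull the layer names out of the catalogue response using only Python's standard library. It is not the code of import_from_geoserver.py: the helper names are hypothetical, the Authorization value is assumed to be the Basic Auth string kept in **GEOSERVER_AUTH**, and the handling of an empty or single-entry layer list is a defensive assumption.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def layers_url(geoserver_url):
    """URL of the REST resource listing every layer (hypothetical helper)."""
    return geoserver_url.rstrip("/") + "/rest/layers.json"


def layer_url(geoserver_url, layer_name):
    """URL of the REST resource describing a single layer, e.g. Provinces."""
    # Keep ':' unescaped so workspace-qualified names such as "ws:Provinces" stay readable.
    return geoserver_url.rstrip("/") + "/rest/layers/" + quote(layer_name, safe=":") + ".json"


def layer_names(payload):
    """Extract the layer names from a decoded layers.json payload."""
    layers = payload.get("layers")
    # Assumption: an empty catalogue may not come back as a dict, so treat that as "no layers".
    if not isinstance(layers, dict):
        return []
    entries = layers.get("layer", [])
    # Tolerate a single layer serialized as an object instead of a one-element list.
    if isinstance(entries, dict):
        entries = [entries]
    return [entry["name"] for entry in entries]


def get_json(url, auth_header):
    """GET a GeoServer REST resource with the given Authorization header and decode the JSON body."""
    request = Request(url, headers={"Authorization": auth_header, "Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

Combined, ''layer_names(get_json(layers_url(GEOSERVER_URL), GEOSERVER_AUTH))'' yields the names to iterate over, and ''get_json(layer_url(GEOSERVER_URL, name), GEOSERVER_AUTH)'' fetches the details of each one.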
Here is the pseudocode of the script:

  - Pull the list of available layers from GeoServer through its [[http://docs.geoserver.org/stable/en/user/rest/api/layers.html#layers-format|REST API]]. URL to call: http://geoserver.opendevelopmentcambodia.net:8181/geoserver/rest/layers.json
  - Parse and iterate through the list of layers, extracting each layer's name, which will be used to name the corresponding dataset on CKAN ([[http://docs.geoserver.org/stable/en/user/rest/api/layers.html#layers-l-format|layer format]]). URL to call: http://geoserver.opendevelopmentcambodia.net:8181/geoserver/rest/layers/LAYER_NAME.json (e.g. LAYER_NAME = Provinces)
  - Generate the links to the layer's GeoJSON, PNG, PDF and OpenLayers representations. The first three should be downloaded to a temporary file and then uploaded to CKAN as resources of the dataset; the OpenLayers link is simply added as a resource within the same dataset.
  - Before creating a new dataset, check whether it already exists. If it does, replace its metadata and existing resources with the new ones, ensuring the information stays up to date.

To avoid issues arising from GeoServer being moved to another location (and thus changing its IP), it should always be addressed by its current domain name: http://geoserver.opendevelopmentcambodia.net

==== Executing the script ====

Run [[https://github.com/OpenDevelopmentMekong/odm-internal/tree/master/odm-migration/CKAN/import_scripts/scripts/import_from_geoserver.py|import_from_geoserver.py]], which can be found under odm-internal/odm-migration/CKAN/import_scripts/scripts. This script downloads and initializes the map layers from GeoServer.

<code bash>
python import_from_geoserver.py
</code>

Please be sure to edit the script and set the variables holding your CKAN user's API Key and GeoServer's Authorization header (Basic Auth); otherwise the script will fail.

===== Migrating Library publications from NGL to the Data Hub =====

Currently, NewGenLib is used to maintain a collection of Library Publications on the old ODC website.
This system not only lets users browse a catalogue of books and articles, but also check the availability of certain publications in ODC's physical library. The existing records, along with their metadata, need to be imported into the new Data Hub module. However, checking the availability of publications in the physical library is not supported by CKAN and would have to be developed separately; for the moment, this will not be supported.

The records stored in NGL should be imported into the Data Hub programmatically. For that, a script has been developed to automate the process, following this workflow:

  - The current library publication records, stored on ODC's NewGenLib instance (http://library.opendevelopmentcambodia.net/), need to be exported in MARC21 format. This export produces a single binary file; let's call it **records.mrc**.
  - The generated file, which contains the actual records, needs to be uploaded to the [[https://github.com/OpenDevelopmentMekong/odm-library|odm-library repository on GitHub]] so that it is available to the import script.
  - The import script can be found in the scripts folder of the [[https://github.com/OpenDevelopmentMekong/odm-internal/|odm-internal repository on GitHub]]. This repository needs to be cloned in order to run the script.
  - Once the repository is cloned, the script can be found under odm-migration/CKAN/import_scripts/scripts/import_from_ngl.py.

==== Executing the script ====

Run [[https://github.com/OpenDevelopmentMekong/odm-internal/tree/master/odm-migration/CKAN/import_scripts/scripts/import_from_ngl.py|import_from_ngl.py]], which can be found under odm-internal/odm-migration. This script imports the library publication records from NewGenLib into CKAN.

<code bash>
python import_from_ngl.py
</code>

Please be sure to edit the script and set the variables holding CKAN's URL and port, your CKAN user's API Key and the URL of the NGL instance; otherwise the script will fail.
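Once **records.mrc** has been exported, each MARC21 record has to be turned into a CKAN dataset. The following is a minimal sketch of that mapping, not the logic of import_from_ngl.py. It assumes the third-party pymarc package for reading the binary file, picks a few standard MARC21 fields (245 $a title, 100 $a personal author, 260 $b publisher, 020 $a ISBN) and reuses the organization and groups of the example **NGL_MAP** from the configuration section. The helper names and the choice of fields are hypothetical.

```python
import re

# Example NGL_MAP, copied from the script configuration section of this guide.
NGL_MAP = {"ontology": "*", "organization": "odm-library",
           "groups": [{"name": "library-group"}, {"name": "cambodia-group"}]}

# Hypothetical selection of MARC21 (tag, subfield) pairs to carry over into CKAN.
MARC_FIELDS = {"title": ("245", "a"), "author": ("100", "a"),
               "publisher": ("260", "b"), "isbn": ("020", "a")}


def ckan_name(text, max_len=100):
    """Slugify text into a CKAN dataset name (lowercase a-z, 0-9, '-' and '_')."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", text.lower()).strip("-")
    return slug[:max_len] or "untitled-record"


def read_records(path):
    """Yield a {key: value} dict of the MARC_FIELDS values for each record in a MARC21 file."""
    from pymarc import MARCReader  # third-party; imported lazily so the mapping works without it
    with open(path, "rb") as handle:
        for record in MARCReader(handle):
            if record is None:  # recent pymarc versions yield None for unreadable records
                continue
            values = {}
            for key, (tag, code) in MARC_FIELDS.items():
                found = [v for field in record.get_fields(tag) for v in field.get_subfields(code)]
                if found:
                    values[key] = found[0]
            yield values


def record_to_package(values, ngl_map=NGL_MAP):
    """Build a CKAN package dict from the values extracted from one record."""
    # MARC titles often end with ISBD punctuation such as " /" or " :".
    title = values.get("title", "Untitled record").rstrip(" /:;,.")
    extras = [{"key": key, "value": values[key]}
              for key in ("author", "publisher", "isbn") if key in values]
    return {"name": ckan_name(title), "title": title,
            "owner_org": ngl_map["organization"], "groups": ngl_map["groups"],
            "extras": extras}
```

The resulting dictionary has the shape expected by CKAN's ''package_create'' action. Note that this naive slug drops every non-ASCII character, so titles written entirely in Khmer script would all collapse to ''untitled-record''; a real import would need a unique fallback, for example one derived from the ISBN or the record number.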
===== Archiving contents from the former ODC WordPress site =====

A script has been written to replicate the effect of the [[https://github.com/OpenDevelopmentMekong/wpckan|wpckan WordPress plugin]] for all contents previously created on opendevelopmentcambodia.net. It pulls XML exports of each relevant category of the WordPress site and archives them into CKAN, assigning the created or modified datasets to specific Organizations and/or Groups (see **ODC_MAP** in the script configuration above).

==== Executing the script ====

Run [[https://github.com/OpenDevelopmentMekong/odm-internal/tree/master/odm-migration/CKAN/import_scripts/scripts/import_odc_contents.py|import_odc_contents.py]], which can be found under odm-internal/odm-migration.

<code bash>
python import_odc_contents.py
</code>

Please be sure to edit the script and set the variables holding CKAN's URL and port and your CKAN user's API Key.

===== Loading the taxonomy structure into CKAN =====

Information across the platform is structured following a taxonomy that helps categorize contents by topic. This structure also has to be maintained in the Data Hub. The taxonomy is available in the [[https://github.com/OpenDevelopmentMekong/odm-localization|odm-localization]] repository, along with its translations into several languages.

To import the taxonomy elements into ODM's CKAN instance, a script has been written that gets the taxonomy structure from the odm-localization repository and imports it into CKAN as [[http://docs.ckan.org/en/latest/maintaining/tag-vocabularies.html|Tag Vocabularies]].

==== Executing the script ====

Run [[https://github.com/OpenDevelopmentMekong/odm-internal/blob/master/odm-migration/CKAN/import_scripts/scripts/import_taxonomy_tag_dictionaries.py|import_taxonomy_tag_dictionaries.py]], which can be found under odm-internal/odm-migration/CKAN/import_scripts/scripts. This script downloads and initializes the taxonomy structure into CKAN.
<code bash>
python import_taxonomy_tag_dictionaries.py
</code>

Please be sure to edit the script and set the variable holding your user's API Key (which you noted before); otherwise the script will fail.

===== Importing translation terms for ODM's taxonomy =====

The taxonomy is available in the [[https://github.com/OpenDevelopmentMekong/odm-localization|odm-localization]] repository, along with its translations into several languages. To import the translated taxonomy elements into ODM's CKAN instance, a script has been written that gets the translated taxonomy elements and imports them into CKAN as [[http://docs.ckan.org/en/latest/api/index.html#ckan.logic.action.update.term_translation_update|Term translations]].

==== Executing the script ====

Run [[https://github.com/OpenDevelopmentMekong/odm-internal/blob/master/odm-migration/CKAN/import_scripts/scripts/import_taxonomy_term_translations.py|import_taxonomy_term_translations.py]], which can be found under odm-internal/odm-migration/CKAN/import_scripts/scripts. This script imports the translations of the taxonomy terms into CKAN.

<code bash>
python import_taxonomy_term_translations.py
</code>

Please be sure to edit the script and set the variable holding your user's API Key (which you noted before); otherwise the script will fail.

===== Deleting datasets from a certain group =====

Sometimes datasets need to be removed from a certain group (e.g. Laws or Maps). For that, the delete_datasets_in_group script can be used. The script is configured through the following keys of the **DELETE_MAP** variable in the config file:

  * **organization**: The organization owning the datasets. NOTE: This parameter overrides the group parameter.
  * **group**: The group from which the datasets will be deleted.
  * **state**: Delete only datasets with a certain state (draft/active).
  * **limit**: Maximum number of datasets to remove each time the script is run.
  * **field_filter**: Key:Value pairs used to select datasets according to the contents of their extra fields.

To remove only the datasets created by the import scripts, give the field_filter parameter this value: ''%%'field_filter':{'odm_contact':'ODM Importer','odm_contact_email':'info@opendevmekong.net'}%%''

==== Executing the script ====

Run [[https://github.com/OpenDevelopmentMekong/odm-internal/tree/master/odm-migration/CKAN/import_scripts/scripts/delete_datasets_in_group.py|delete_datasets_in_group.py]], which can be found under odm-internal/odm-migration. This script gathers the list of matching datasets and removes them in bulk.

<code bash>
python delete_datasets_in_group.py
</code>

Please be sure to edit the script and set the variable holding your user's API Key (which you noted before); otherwise the script will fail.

When this script runs, the filtered datasets get the state 'deleted' but remain in the database. To delete them permanently, log in with sysadmin credentials and point your browser to [[http://data.opendevelopmentmekong.net/ckan-admin/trash|http://data.opendevelopmentmekong.net/ckan-admin/trash]]. Alternatively, follow the instructions at http://wiki.opendevelopmentmekong.net/code_snippets#purge_all_datasets_marked_as_deleted_on_ckan to purge the deleted datasets.
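For reference, the selection that **DELETE_MAP** describes can be sketched against CKAN's Action API: ''package_search'' to find candidates, then ''package_delete'', which only marks datasets as 'deleted' as explained above. This is a hypothetical re-implementation, not delete_datasets_in_group.py itself. It assumes the filtered fields appear either as top-level dataset keys or in the ''extras'' key/value list, and it only sees what ''package_search'' returns by default (active, public datasets); private or draft datasets may need additional search parameters depending on the CKAN version.

```python
import json
from urllib.request import Request, urlopen


def field_value(dataset, key):
    """Return a field from the top level of a dataset dict, falling back to its 'extras' list."""
    if key in dataset:
        return dataset[key]
    for extra in dataset.get("extras", []):
        if extra.get("key") == key:
            return extra.get("value")
    return None


def select_for_deletion(datasets, field_filter=None, state=None, limit=500):
    """Return names of datasets matching the optional state and every field_filter pair, up to limit."""
    selected = []
    for dataset in datasets:
        if state and dataset.get("state") != state:
            continue
        if any(field_value(dataset, key) != value for key, value in (field_filter or {}).items()):
            continue
        selected.append(dataset["name"])
        if len(selected) >= limit:
            break
    return selected


def ckan_action(ckan_url, action, api_key, payload):
    """POST a JSON payload to CKAN's Action API and return the 'result' member."""
    request = Request(ckan_url.rstrip("/") + "/api/3/action/" + action,
                      data=json.dumps(payload).encode("utf-8"),
                      headers={"Authorization": api_key, "Content-Type": "application/json"})
    with urlopen(request) as response:  # CKAN errors surface as urllib.error.HTTPError
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]


def delete_in_group(ckan_url, api_key, delete_map):
    """Search by organization (which overrides group), filter, then soft-delete each match."""
    if delete_map.get("organization"):
        fq = "organization:" + delete_map["organization"]
    else:
        fq = "groups:" + delete_map["group"]
    limit = delete_map.get("limit", 500)
    # rows=limit: at most `limit` candidates are inspected per run.
    found = ckan_action(ckan_url, "package_search", api_key, {"fq": fq, "rows": limit})
    names = select_for_deletion(found["results"], delete_map.get("field_filter"),
                                delete_map.get("state"), limit)
    for name in names:
        ckan_action(ckan_url, "package_delete", api_key, {"id": name})
    return names
```

Because the limit is applied to the search before filtering, a run may delete fewer than **limit** datasets; running the script again picks up the remaining ones.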