FIXME This page is outdated. Please help complete the wiki updates.
(remove this paragraph once the page is updated)

Site analytics report

Who this guide is for

  • Monitoring and evaluation officers who produce data for reporting
  • Editors and data officers who assist with reporting
  • Program development officers who use site analytics data for presentation and fundraising purposes

What this guide teaches

  • Introduction to site analytics indicators and workflow
  • How to identify, gather, manipulate and analyze site analytics data for CKAN and WP

Note: The step-by-step guide on how to work with site analytics data has been simplified for those who have basic data skills. If you are an advanced data wrangler or a developer, work your magic!

Things to know beforehand

Questions to consider when analyzing site data:

  • Are you trying to understand CKAN or WP usage?
  • How many users and where do they come from?
  • Do they specifically come to the site or are we showing up as a search result from their research online? What are the terms they are searching for?
  • What do they do when they arrive on the site?
  • How long do they stay?
  • Which areas of the platform are they most interested in?
  • What do they read or download off the site?
  • How do they navigate through the platform and between the countries if at all?
  • How do they use interactive tools (i.e. map explorer and profiles)?

ODI has developed two surveys to collect user feedback and testimonials.

  • Pop-up survey, to be implemented on the site, to gather data for user profiling and user needs assessment.
  • User feedback and testimonial survey, to be sent to a selected number of active users, to gather feedback for improving the site and to collect testimonials.


  1. Identify the purpose of the site analytics report
  2. Wordpress or CKAN? Both Wordpress and CKAN are tracked by Google Analytics, and data can be sourced directly from GA. But CKAN data can also be sourced from CKAN and may be required, depending on the temporal range for which you are reporting.
  3. Determine the audience – Is it a donor report, an internal assessment, a PR product?
  4. Identify temporal range – Weekly, monthly, quarterly, annually, mid-term, specific date (e.g. October 1, 2016 – September 30, 2017)
  5. Identify indicators and data sources – A number of useful indicators are included below in the Glossary of terms; however, depending on the purpose of the analytics report, additional indicators may be added. Google Analytics is an endless source of parameters, comparisons, insights and combinations to discover how pages on OD platform perform. Having a set of specific questions helps identify the most useful and relevant parameters. Some donors may have a number of specific indicators they would like to see (e.g. gender, access from mobile devices, access from a certain location). Make sure to check the reporting requirements with the donors.
  6. Determine if comparison is needed – Do you need to benchmark against previous reporting period or another OD country instance (for internal evaluation)?
  7. Download the data – Never edit the original file. Make a copy of the file and conduct data manipulation and analysis off of the copy. It’s preferable to use Google Sheets for your calculations as it enables collaborative work.
  8. Manipulate and analyze the data – Data manipulation is the process of changing data in an effort to make it easier to read or be more organized. You may need to filter, sort, categorize and/or perform basic calculations.
  9. Visualize the data – Identify the right type of presentation for the data you want to visualize. Provide a short and compelling caption for each graph or data table.
  10. Draft a narrative for the report - This should be informed by the site analytics insights
  11. Share the draft report with colleagues - Invite comments/feedback prior to finalizing.
  12. After taking into account relevant feedback, finalize the report

NEVER EDIT THE ORIGINAL DATA. After you download raw data for analysis, rename the files to an easily accessible name and save them in one folder. Create a new Google Sheet and create a new tab for each dataset. Then, you can import (or copy and paste) the raw data over for analysis. Do not do manual entry!

A folder containing raw datasets, and a centralized Google Sheet containing the data used in the following sections, are stored in this Google Drive folder.

Data sources and their limitations

Google Analytics

We have been using Google Analytics to discern user behaviours and usage trends. Our CKAN data platform is now tracked separately from the Wordpress site, but both use Google Analytics to do this. While useful, it also has some limitations.

Google Analytics data is used to discern usage trends and average user behaviour on the OD platforms. Usage trends are demonstrated by the number of users and sessions, disaggregated by new and returning users. Average user behaviour is discerned by how users access the site, average time spent on the site, how likely they are to exit the site, and the pages that are the most visited during the reporting period. It is also possible to discern what the most downloaded datasets are, as well as the most viewed.

This data is limited in that Google Analytics only offers the information disaggregated by certain parameters. These limitations are described in the Working with Google Analytics generally section.

Data on CKAN usage is tracked by Google Analytics, and can be obtained both via Google Analytics as well as directly through the CKAN interface. However, data extracted directly through the CKAN interface is limited as it only provides information on activities from the date this feature was initiated on the platform in late 2019. In addition, raw data can only be downloaded for a maximum temporal period of one year. For older or longer periods, use Google Analytics directly.

Working with Google Analytics generally

Things to know beforehand

  • Admin of the site as well as those working on donor reporting should already have access permission to Google Analytics. To request access, contact an ODI administrator.
  • For donor reporting, it can be useful to break down the data by quarter. If you want to do this, it is best to download the data for each quarter separately. Every data point in a download is dated with the full temporal range you selected, so if you set the range to October 1, 2016 to September 30, 2017, the export cannot be separated afterwards into weeks, months, or quarters.
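If you need to work out the date ranges to enter for each quarterly download, the following minimal sketch may help. It assumes the reporting period starts on the first day of a month; the function name quarter_ranges is only illustrative.

```python
from datetime import date, timedelta

def quarter_ranges(start: date, quarters: int = 4):
    """Return (first_day, last_day) pairs for consecutive three-month blocks.

    Assumes `start` is the first day of a month.
    """
    ranges = []
    current = start
    for _ in range(quarters):
        # First day of the month that falls three months later.
        month_index = current.month - 1 + 3
        next_start = date(current.year + month_index // 12, month_index % 12 + 1, 1)
        # The quarter ends the day before the next one starts.
        ranges.append((current, next_start - timedelta(days=1)))
        current = next_start
    return ranges

# Reporting period used in this guide: October 1, 2016 to September 30, 2017.
for first, last in quarter_ranges(date(2016, 10, 1)):
    print(first.isoformat(), "to", last.isoformat())
```

Each printed pair is one temporal range to set in Google Analytics before exporting.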

Glossary of terms

  • Users: Counter to expectations, this is not a count of unique users on the platform. Google Analytics calculates the number of users based on a cookie that is set in the user’s browser. That means if a user accesses the website from a different browser or device, they might be counted as multiple users. Google Analytics used to offer “Users” as the unique number of visitors who visit a site. The number used to represent exactly how many individual people were on the site. This is no longer the case. If you're interested, see more here:
  • Sessions: A session is a group of interactions by one user with the site that take place within a given time frame. One unique visitor may initiate multiple sessions in a day. Sessions are typically refreshed after 30 minutes of inactivity. If you're interested, see more here:
  • Unique pageview: The number of visits (sessions) during which a given page was viewed at least once. If a page was viewed multiple times during one visit, it is only counted once.
  • Bounce rate: A bounce is a single-page visit to the site that does not continue to another page. If users enter our site on a certain page (e.g. the landing page), and then leave the site without visiting another page, they have bounced. Google records limited statistics for these users: they are represented as having viewed 1 page and spent 0 minutes on the site. The bounce rate tells us what percentage of our visits end after only a single page. This is different from an exit rate, which is the percentage of visitors who exit the site from a particular page. If you're interested, see more here:
  • Non-bounce sessions: Sessions where users view more than one page in a session.
  • Average session duration: This tells us how long, on average, visitors spend on our site. Note: this metric is likely underestimated since Google counts the visit duration of bounced visitors as 0.
  • Source: The origin of traffic to the site, such as a search engine or a domain.
  • Medium: General category of the source, including organic, referral, email, or none.
  • Referral traffic: This is traffic that arrives on the site through another source, such as a link on another domain. Analytics automatically recognizes where traffic was immediately before arriving on the OD platform site, and displays the domain names of these sites as sources.

Google Analytics for Wordpress statistics

Note: Unfiltered contains spam data. Always choose Master.

User acquisitions - Where do users of the OD platform come from?

User acquisition data shows the source of traffic and the medium through which users came to the OD platform for the reporting period. A source can be either a search engine (e.g. Google) or a domain. Mediums include:

  • Direct (a user typed the OD platform URL into the web browser)
  • Organic search (a user clicked on the OD platform link from a search engine source such as Google)
  • Cost-per-click search (a user clicked on a paid link from a search engine source such as Google)
  • Email (a user clicked on a custom medium link made by OD)
  • Referral (a user clicked a link published on an external website, including social media platforms, or a link contained in a non-OD e-mail newsletter)

Since the OD platform hosts six interlinked websites, we can demonstrate how much traffic one OD site directs to another OD site (e.g. ODL to ODM). This traffic is measured by number of sessions.

Referral data is also useful to understand how much traffic to the site has been directed from a partner organization's site (e.g. Global Forest Watch to ODC).

Accessing user acquisition data

User acquisition data is readily available on Google Analytics. It is accessible through this path: Acquisition > Overview

From this acquisition overview page, you can find macro data on traffic source and number of sessions associated with each source. If this level of data is all you need, download the data by clicking on Export. You may save the file as CSV, Excel, Google Sheet, or PDF.

Assuming you save the file as CSV, Excel, or Google Sheet, you will see:

You can also access broad data on traffic to OD from social media and social networking sites. Follow this path: Acquisition > Social.

User acquisition data disaggregated by source and medium

Disaggregated data helps us to answer the following questions:

  1. What are some of the most popular search engines our users used?
  2. How much traffic to OD platform is directed from other OD sites?
  3. How much traffic to OD platform is directed from data partner websites?
  4. How much traffic to OD platform is directed from government / media / academic websites?

Below is a step-by-step guide on how to download and analyze traffic data to show direct traffic, organic search, and traffic via referrals from the OD platform and social media. Using the method below, you can also analyze how much traffic to OD platform comes from government, media, academic, NGOs, etc.

Step 1: Download raw data

  • Go to Acquisition > All traffic > Source/Medium.

  • Do not click export yet. Scroll all the way down to the data table.

Only the visible data is downloaded. In this case, only 10 rows of data would be downloaded if you clicked Export. Change the number of visible rows to a number that is more than the total number of rows. In this example, choose 50 (there are 41 rows in total).

  • Scroll back up and click Export and save as Excel or Google Sheet.

Step 2: Working with raw data

  • Copy the data to a new tab in Google Sheet, or if you downloaded an .xlsx file you can copy the data to a new Google Sheet. See this sheet on Google Drive.
  • Take note of the totals for data verification later. Note that the total number of sessions from all traffic is 2792.
  • Delete the total and unrelated data. See below image; here it would be the rows and columns that have been highlighted, as well as the metadata that would show up above the header row.

  • Add two new columns to the right of Source / Medium (column A). You should have column B and C blank.
  • Copy Source / Medium column and paste it into column B.
  • Select column B > then go to Data > Split text to columns > custom > enter “/” sign. You should now have the following:

Since this is data from ODM, the medium for “direct” is

Step 3: Determine data you want to identify

We want to be able to identify: 1) traffic from major search engines (e.g. Google, Bing, Yahoo), 2) traffic from other OD platforms (e.g. ODC, ODMm, ODL, ODT, ODV), and 3) traffic from social media (e.g. Facebook, Twitter).

Step 4: Transform / manipulate data

We need to transform the Source data to the above grouping.

  • Add a new column next to the Source column. Call it Source (analyzed) to differentiate.
  • Use the filter function to view data by Medium.
  • Assign appropriate categories (e.g. government, media, NGOs, etc.).
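For those who prefer to script the grouping rather than filter by hand, here is a minimal sketch of the Step 3 grouping. The keyword lists, the "opendevelopment" marker, and the sample source names are assumptions made for illustration; check them against your own export before relying on the result.

```python
# Keyword lists are assumptions; extend them to match the sources in your export.
SEARCH_ENGINES = ("google", "bing", "yahoo")
SOCIAL_MEDIA = ("facebook", "twitter")
OD_MARKER = "opendevelopment"  # assumed substring shared by OD site domains

def group_source(source: str) -> str:
    """Assign one Source value to a single 'Source (analyzed)' category."""
    s = source.strip().lower()
    if OD_MARKER in s:
        return "OD platform"
    if any(name in s for name in SOCIAL_MEDIA):
        return "Social media"
    if any(name in s for name in SEARCH_ENGINES):
        return "Search engine"
    if s == "(direct)":
        return "Direct"
    # Anything unmatched still needs a manual look (e.g. shortened social links).
    return "Other"

for sample in ["google", "m.facebook.com", "opendevelopmentmekong.net", "(direct)", "example.org"]:
    print(sample, "->", group_source(sample))
```

Substring matching is deliberately crude: it will miss shortened links and can mislabel a source whose name merely contains one of the keywords, so the manual double-check recommended in this guide still applies.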

If you want to analyze how much traffic to OD platform comes from government, media, academic, NGOs, etc., you will need to transform data in the Source column by associating .edu with Academia, .gov with Government, .org with NGOs, and so on. For media organizations, you will need to perform a text search to find matches with news website URLs. Because this is a manual data transformation, there is an increased chance of inconsistency. Best practice is to double-check the work and ask a colleague to reproduce the analysis using your method.

The various OD sites use either opendevelopment[country].net or the default URL. Both count as traffic from the OD platform. There may also be traffic directed from the PP site and ODM Wiki to PROD. For reporting purposes, this traffic should not be identified.

  • Now that the categorization is complete, use the Pivot table function to count the number of sessions for each medium. Make sure the grand total is the same as the number provided in the raw data (in this example, 2792).
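The same split-and-pivot check can be sketched in a few lines of Python. The rows below are hypothetical figures chosen only so that they add up to the 2792 sessions noted in Step 2; they are not the actual ODM data.

```python
from collections import defaultdict

# Hypothetical export rows: ("Source / Medium", sessions).
rows = [
    ("google / organic", 1500),
    ("(direct) / (none)", 800),
    ("opendevelopmentmekong.net / referral", 300),
    ("m.facebook.com / referral", 192),
]
EXPECTED_TOTAL = 2792  # total sessions noted from the raw export in Step 2

def sessions_by_medium(rows):
    """Split 'Source / Medium' on the first '/' and sum sessions per medium."""
    totals = defaultdict(int)
    for source_medium, sessions in rows:
        source, medium = (part.strip() for part in source_medium.split("/", 1))
        totals[medium] += sessions
    return dict(totals)

totals = sessions_by_medium(rows)
grand_total = sum(totals.values())

# Verification step: the pivot total must equal the total in the raw data.
if grand_total != EXPECTED_TOTAL:
    raise ValueError(f"Pivot total {grand_total} does not match {EXPECTED_TOTAL}")
print(totals, grand_total)
```

If the check fails, a row was probably dropped or duplicated while cleaning the sheet.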

Step 5: Analyze / visualize data

Now you're ready to analyze and visualize the data.

If your organization reports detailed user acquisition data on a regular basis (monthly or quarterly), you may combine the monthly / quarterly reports rather than downloading and transforming data of a longer temporal range.

Users and sessions - How many users, how long they're staying, and how much of the site they're using

Basic user and session data

Please see the Glossary of terms, above, for basic definitions of these indicators.

Users: This data can be disaggregated by returning and new users to identify how many user cookies have been set over the reporting period.

Sessions: Session data can be broken down by the following:

  • Average session duration shows the average time returning and new users spent on the Platform, calculated as the total duration of all sessions in the specified date range divided by the total number of sessions
  • Page / Session data shows the average number of pages on the Platform viewed per session
  • Bounce rate shows the percentage of sessions in which only one page was viewed, out of the total number of sessions.
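The three session indicators above are simple ratios. The sketch below works through them for a hypothetical list of four sessions, with bounced sessions recorded as 1 page and 0 seconds, as described in the Glossary of terms. The numbers are invented for illustration.

```python
def session_metrics(sessions):
    """sessions: list of (duration_seconds, pages_viewed) tuples, one per session."""
    count = len(sessions)
    total_duration = sum(duration for duration, _ in sessions)
    total_pages = sum(pages for _, pages in sessions)
    bounces = sum(1 for _, pages in sessions if pages == 1)
    return {
        "avg_session_duration": total_duration / count,  # seconds
        "pages_per_session": total_pages / count,
        "bounce_rate": bounces / count * 100,  # percent of sessions
    }

# Two bounced sessions (1 page, 0 seconds) and two non-bounce sessions.
sample = [(0, 1), (0, 1), (300, 4), (420, 6)]
metrics = session_metrics(sample)
print(metrics)

# The same average restricted to non-bounce sessions is noticeably higher.
non_bounce = [s for s in sample if s[1] > 1]
print(session_metrics(non_bounce)["avg_session_duration"])
```

In this sample the average session duration is 180 seconds overall but 360 seconds for non-bounce sessions, which illustrates why the overall figure is likely an underestimate.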

You can read further here.

The basic Google Analytics report disaggregates user data by returning and new user segments (see below). However, user segments need to be specified for Sessions data.

Step 1: Get the data

  • User and session data are accessible via Audience > Overview. Make sure to set the temporal range parameter appropriately. The following example is for user and session data in the fourth quarter of 2016 (October 1 to December 31, 2016).

  • Click the Add segment sign and add New users and Returning users as segments. Uncheck All users.

  • Copy and paste the following data (in the red box) to a new tab on your centralized Google Sheet. Name it appropriately. Note that you may also export the raw data, but it would be more time consuming to organize and calculate them later. Since there are only a few data points and we are not going to perform any data transformation / manipulation, it's not necessary to work with raw data.

  • Repeat this process for other temporal ranges as needed. In this example, extract data for three other quarters. You should have the following:

Step 2: Visualize and present the data

See the following for examples:

Bounce rate: Although the bounce rates seem relatively high for OD platform, it is possible that many return users are targeting specific pages for updates (e.g. daily news updates), spend their time reading these and leave, which constitutes a bounce even though the users found what they were looking for. Therefore, it is possible that the bounce rates listed by Google Analytics above are skewed and not a true representation of return user behavior.

Sessions data disaggregated by bounce and non-bounce behavior

Non-bounce sessions are sessions where users (both returning and new users) view more than one page in a session.

It is important to note that Google reports disaggregated statistics for new and return users as a whole. Users cannot be disaggregated as bounce or non-bounce users. However, data that describes user behavior in a session, such as Average session duration and Page / Session, can be disaggregated by bounce and non-bounce.

The available statistics, disaggregated by bounce and non-bounce behavior, show that in non-bounce sessions users (both returning and new users) spend even more time on the platform: approximately 5-7 minutes on average, as opposed to 1-3 minutes.

These insights can be added to a report if deemed appropriate. To gather the data:

  • Go to Audience > Overview. Make sure to set the temporal range parameter appropriately. The example below is for Q3, 2017.
  • Click the Add segment sign and add Bounce sessions and Non-bounce sessions as segments. Uncheck All users.
  • Copy and paste the following data (in the red box) to a new tab on your centralized Google Sheet. Name it appropriately.

  • Repeat this process for other temporal ranges as needed. In this example, extract data for three other quarters. You should have the following:

Most visited pages

Step 1: Get the data

  • Go to Behavior > Site content > All pages. Make sure to set the temporal range parameter appropriately. The following example uses October 1, 2016 to September 30, 2017.

  • Do not click export yet. Scroll all the way down to the data table. There are thousands of pages with Pageviews data. Choose to view the top 25 most viewed pages.

  • Scroll back up and click Export and save as Excel or Google Sheet. You should have:

Step 2: Working with raw data

  • Add the raw data to a centralized folder

Step 3: Determine data you want to identify

We want to be able to classify each applicable page as one of: Topic page, Maps, Data, Tags, News, or Profiles

Step 4: Transform / manipulate data

  • Create a column before the “Page” column. Mark applicable pages. Homepage and 'about' pages cannot be classified under an OD content type (above) and thus are not relevant for reporting purposes. Filter the column to show only applicable pages.

  • Create two additional columns after the “Page” column. Assign the relevant content type to each page in the “Content type” column and add a note for each page.

Step 5: Visualize / present the data

Below are two examples of how this data can be presented. The graph below counts total Pageviews for each content type. The table provides a list of these most viewed pages, each hyperlinked with the relevant URL.

Note: Although you might not use all the data downloaded, it's better to have more data on hand. You might be able to use it to help you produce the narrative section of the report. For example, it might be interesting for your team or the donor to know how much time on average users spent on one of your most popular pages over the reporting period.

If your organization reports most visited pages, grouped by OD content types, on a regular basis (monthly or quarterly), you may combine the monthly / quarterly report rather than downloading and transforming data of a longer temporal range.

Working with 'Linked To Your Site'

  • LTYS technically reports 'referrals', which is also an indicator reported on Google Analytics. However, Google Analytics reports the amount of traffic (measured by number of sessions) directed to OD Platform from one external site (domain); it doesn't offer information on which OD pages have been hyperlinked on that site.

For example:

By going to Google Analytics > Acquisition > Referrals for the reporting period (October 1, 2016 - September 30, 2017), you can see that:

  • The Land Portal has linked to OD Mekong. For the reporting period, via the links to OD Mekong on the Land Portal website, a number of users have visited OD Mekong, 23 of whom were new users. Combined, they had 95 sessions. On average they viewed 1.8 pages per session and spent 1 minute and 40 seconds on the site.
  • The Mekong Eye has also linked to OD Mekong. For the same reporting period, via the links to OD Mekong on the Mekong Eye website, a number of users have visited OD Mekong, 43 of whom were new users. Combined, they had 92 sessions. On average they engaged more with OD Mekong: they viewed 2.2 pages per session and spent 2 minutes and 17 seconds on the site.

By going to the LTYS report > clicking “More …” under Who links the most > searching for the Land Portal and the Mekong Eye, you would see:

  • The Land Portal has linked to 4 ODM pages and exposed a link to these 4 pages on 57 of its own pages.

  • The Mekong Eye has linked to 2 ODM pages and exposed a link to these 2 pages on 7,322 of its own pages. The ODM homepage has been linked from 7,306 pages on the Mekong Eye.

  • These are some of the pages on the Mekong Eye that contain a link to the ODM homepage.

Note: Depending on what analysis you need, you might need to consolidate data for an external domain from the OD Datahub in order to demonstrate how a data partner is linking to the site. For example, the Land Portal is a data partner and has linked more to the OD Datahub than to the OD Mekong site. It has linked to 69 datasets on OD Datahub and has exposed these links on 169 of its web pages.

Main institutional user groups hyperlinking to OD Platform

To show how one OD platform might benchmark against another, the following demonstration will analyze and compare data from OD Mekong, ODC, ODMm, and OD Datahub as an example. Those working on an OD country instance may download only data for their respective site. If you want to access data from another OD instance, please contact the administrator of that country site.

The raw data will be stored here and the analysis will be conducted on this centralized Google Sheet

Step 1: Download the data

  • Go to Links to Your Site > Who links the most > “More” > Download this table

Links to Your Site data was downloaded on November 15, 2017 for analysis for this guide.

  • Properly name the file and add it to your centralized folder for raw data.
  • Copy and paste the data to a properly named tab on your working Google Sheet. If you are working with data from other OD sites, add a “Linking to:” column and make sure to properly mark the data using the shorthand for each OD platform.

Step 3: Determine institutional user groups and domain extension

The domains data can be classified into:

  • OD domains
  • Non-OD domains (which are true external domains)

For external domains, we want to identify the following institutional user groups:

  • Government
  • Academia
  • Civil society
  • Media
  • Private sector / business

Domain extension can be identified and classified. The following assumptions are made for this analysis:

  • .org, .net, .info, and relevant .de extensions = Civil society organization (Double check the domains if needed. A number of German institutions have linked to OD Platforms)
  • .edu and .ac = academia
  • .gov, .go and relevant extensions = government
  • .com = business
  • News domains are identified by searching for an exact match to the known URLs of media houses OR by filtering the text for the words “news”, “tribune”, “times”, “post”, and actual newsroom URLs (e.g. is an online media publisher). Using this assumption, the number of media websites linking to OD platform might be underreported.

Some CSOs may have a .com domain. Some media organizations / newsrooms may have a .org or .net domain. Academia might have a .net domain. Thus, using this transformation method, the number of CSOs or media websites linking to OD platform might be skewed. Try your best to identify these and document your assumptions. A domain should only be assigned to one user group. Since LTYS data only offers a sample of links, this analysis should be accepted for what it is: insights on which pages have been hyperlinked to by external domains, indicating their popularity and amplifying the reach of the OD platform to users of these domains. The data generally reveals a broad group of users from government, media, and civil society.
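The extension assumptions listed above can be written down as explicit rules, which makes them easier to document and for a colleague to reproduce. This is a minimal sketch, not a complete classifier: the sample domains are hypothetical, “relevant .de” domains are left for manual review, and the keyword match for media will produce some false positives.

```python
MEDIA_KEYWORDS = ("news", "tribune", "times", "post")

# Checked in order, so each domain is assigned to exactly one user group.
EXTENSION_GROUPS = [
    ({"gov", "go"}, "Government"),
    ({"edu", "ac"}, "Academia"),
    ({"org", "net", "info"}, "Civil society"),
    ({"com"}, "Private sector / business"),
]

def classify_domain(domain: str) -> str:
    """Assign a domain to one institutional user group using the stated assumptions."""
    name = domain.strip().lower()
    # Media is tested first because newsrooms may use .org, .net or .com.
    if any(word in name for word in MEDIA_KEYWORDS):
        return "Media"
    # Look at every label after the leftmost one, so "agency.gov.kh" matches "gov".
    labels = set(name.split(".")[1:])
    for extensions, group in EXTENSION_GROUPS:
        if labels & extensions:
            return group
    return "Other"  # includes .de and anything else that needs a manual check

for sample in ["agency.gov.kh", "university.ac.th", "example.org",
               "dailytribune.example.com", "example.de"]:
    print(sample, "->", classify_domain(sample))
```

Review the “Other” rows and a sample of each group by hand afterwards; the rules only apply the assumptions, they do not verify them.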

Step 4: Transform the data

  • Copy the data to a new sheet for analysis. Properly name the new sheet
  • Evaluate whether a domain is internal (OD instances) or external. You can use the Filter function > Filter by condition > Text contains > type in opendev > click Enter

  • In the Internal / External column, mark these rows with “Internal”.

  • Double-check. Filter “Internal / External” column. Select (Blank) and mark the rest with “External”. You should have the following.

  • Filter “Internal / External” column for “External”. To identify the relevant domain extensions noted in Step 3, you may either 1) use the same Filter by condition function and mark the extensions properly in a new column called “Domain extension” OR 2) use the Split text to column function.
  • In another column called “User group”, label the data with the previously defined institutional user grouping in Step 3.

Note: Miscellaneous entities might have a .net extension. They shouldn't be classified as civil society; add them to an “Other” category. .com domains are not very useful for this analysis and will also be classified as “Other”.

  • Using the Pivot table function, summarize the data you need for analysis. Note that the figures in the Pivot table below represent number of external domains in each institutional user group.
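As a cross-check on the pivot table, the internal / external marking and the count of external domains per user group can be sketched as follows. The rows are hypothetical, and the “opendev” substring test mirrors the filter condition used earlier in this step.

```python
from collections import Counter

# Hypothetical (domain, user group) rows after labelling; internal rows have no group.
rows = [
    ("opendevelopmentmekong.net", ""),
    ("agency.gov.kh", "Government"),
    ("university.edu", "Academia"),
    ("ministry.gov.kh", "Government"),
]

def mark_scope(domain: str) -> str:
    """Mark OD instances as Internal, everything else as External."""
    return "Internal" if "opendev" in domain.lower() else "External"

# Count external domains in each institutional user group.
external_by_group = Counter(
    group for domain, group in rows if mark_scope(domain) == "External"
)
print(dict(external_by_group))
```

The counts should match the figures in your pivot table; a mismatch usually means a row was left unmarked.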

Step 5: Visualize and present the data

You may present the data as a data table or in a graphic presentation.

Government websites hyperlinking to OD Platform

We are often asked if government agencies have used data offered on OD Platform. LTYS data analyzed above sheds light on which government institutions have found our content useful enough to link it to their website.

  • Filtering the User group column for “Government”, we have the following:

  • Identify the government institution names and organize the data for presentation if needed.

  • With the domain information provided, you may go to the LTYS website and find out which OD pages each institution has linked to and how the OD pages have been displayed on its website.

For example: The Ministry of Commerce of Cambodia (MoC) has linked to three pages on ODC.

On ODC, MoC has linked to three pages, each displaying all content which has been tagged with “fdi” (Foreign Direct Investment), “construction-industry”, and “rubber-export”. ODC uses these keywords to tag relevant news articles curated on the site. This indicates that staff at the MoC have been using the ODC website to browse news and to conduct research. Clicking on the “fdi” tag, we can see that MoC has been referencing this tag page in several of its reports.

Most hyperlinked pages

The LTYS report also offers data on the most linked pages for each OD Platform. The data is accessible via LTYS report > Your most linked content > “More”

Source domains is an important indicator. It tells you how many websites have hyperlinked to a certain OD page.

Since OD Platforms, each with a different URL, are regarded by Google crawlers as external websites, LTYS data also includes linkages from other OD instances. To truly present linkages from 'external' domains, the data needs to be adjusted. In the following example, ODMm has hyperlinked to the ODM Land page. Thus, the number of source domains linking to the ODM Land page needs to be reduced by 1 and the number of links needs to be reduced by 4.
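The adjustment is a plain subtraction, sketched below. The starting totals (12 source domains, 60 links) are hypothetical; only the ODMm entry of 1 domain and 4 links comes from the example above.

```python
def adjust_for_internal_links(source_domains, links, internal):
    """Remove linkages that come from other OD instances.

    internal: list of (od_instance, link_count) pairs found for the page on LTYS.
    """
    adjusted_domains = source_domains - len(internal)
    adjusted_links = links - sum(count for _, count in internal)
    return adjusted_domains, adjusted_links

# Hypothetical unadjusted totals for a page, with ODMm contributing 4 links.
print(adjust_for_internal_links(12, 60, [("ODMm", 4)]))  # -> (11, 56)
```

If more than one OD instance links to the page, add one pair per instance to the list.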

Step 1: Download the data

  • Go to LTYS report > Your most linked content > “More” > Download this table
  • Add the raw data to a centralized folder

Note that the page URLs already contain information about OD content type. For example, /topic/ = Topic page, /updates/ = Site updates, etc. Editors can easily verify these markers with the custom-post types on WordPress.

Step 2: Transform data

  • We want to use the “Split text to column” function to separate out the page URL. Before doing that, move the “Links” and “Source domains” columns and place them before the “Your pages” column.
  • Copy content from “Your pages” column and paste it in column D.

  • Select column D, go to Data > Split text to columns > Split by custom separator “/”

  • Clean up the data by combining similar markers (for example news and news-source) and separating irrelevant pages such as homepage, partnership page, terms of use, etc., by marking them with “internal” in a new column. Statistics for these pages might be useful to know but we do not need them for donor reporting.
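The split-and-clean-up step can also be sketched in Python. This assumes the content-type marker is the first segment of the URL path, which you should verify against your own URLs; the example.org addresses are placeholders, not OD pages.

```python
from urllib.parse import urlparse

# Similar markers are combined, as in the clean-up bullet above.
MERGE = {"news-source": "news"}

def content_marker(page_url: str) -> str:
    """Return the first path segment of a page URL as its content-type marker."""
    path = urlparse(page_url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        return "homepage"  # no path segment at all
    marker = segments[0]
    return MERGE.get(marker, marker)

for url in ["https://example.org/topic/land/",
            "https://example.org/news-source/some-outlet/",
            "https://example.org/"]:
    print(url, "->", content_marker(url))
```

Markers for pages such as the homepage or terms of use can then be flagged as “internal” in the sheet, as described above.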

Why you shouldn't use unadjusted data: From the data above we can see which content types have been hyperlinked the most by external domains.

We can also see which topic pages are the most linked.

However, the number of source domains hyperlinking to each page may be over-reported, since the figures might contain hyperlinks from other OD instances. This problem is of greater concern to ODC since the site has been operational longer.

Step 3: Adjust the data

Since we need to look up each page one by one on LTYS in order to find out if other OD instances have linked to that specific page, it's best to clearly identify a small set of pages to look up.

Using the same method with ODC data, remove figures for linkages from ODM and ODMm.

Before adjustment:

After adjustment:

Step 4: Visualize and present the data

By filtering the LTYS data further, we found that the most linked content type for ODM was the topic page, with the Land page the most linked topic by external domains. It recorded 11 external domains that hyperlinked to the Land page at least 5 times on average.

For ODC, the most linked content type was profile pages, which continue to be the Economic Land Concession, Mining and Natural Protected Areas datasets, which have been periodically updated throughout the year. This highlights the demand for detailed national-level datasets and the uniqueness of our platform in offering them.

Google Analytics for CKAN statistics

  • Go to: and sign-in
  • Select your country instance under Analytics Account > OD instance > OD instance WordpressGA > OD instance Wordpress views

CKAN for CKAN statistics

Things to know beforehand

  • To access this, administrator-level permissions are required. To request access, contact an ODI administrator.
admin/site_analytics.txt · Last modified: 2020/07/10 08:15 by mchung