FIXME This page is outdated. Please help complete the wiki updates.
(remove this paragraph once the page is updated)

Site analytics report

Who this guide is for

What this guide teaches

Note: The step-by-step guide on how to work with site analytics data has been simplified for those who have basic data skills. If you are an advanced data wrangler or a developer, work your magic!

Things to know beforehand

Questions to consider when analyzing site data:

ODI has developed two surveys to collect user feedback and testimonials.

  • Pop-up survey, to be implemented on the site, to gather data for user profiling and user needs assessment.
  • User feedback and testimonial survey, to be sent to a selected number of active users, to gather feedback for improving the site and to collect testimonials.

Workflow

  1. Identify the purpose of the site analytics report
  2. WordPress or CKAN? Both WordPress and CKAN are tracked by Google Analytics, and data can be sourced directly from GA. CKAN data can also be sourced from CKAN itself, which may be required depending on the temporal range for which you are reporting.
  3. Determine the audience – Is it a donor report, an internal assessment, a PR product?
  4. Identify temporal range – Weekly, monthly, quarterly, annually, mid-term, specific date (e.g. October 1, 2016 – September 30, 2017)
  5. Identify indicators and data sources – A number of useful indicators are included below in the Glossary of terms; however, depending on the purpose of the analytics report, additional indicators may be added. Google Analytics is an endless source of parameters, comparisons, insights and combinations to discover how pages on OD platform perform. Having a set of specific questions helps identify the most useful and relevant parameters. Some donors may have a number of specific indicators they would like to see (e.g. gender, access from mobile devices, access from a certain location). Make sure to check the reporting requirements with the donors.
  6. Determine if comparison is needed – Do you need to benchmark against previous reporting period or another OD country instance (for internal evaluation)?
  7. Download the data – Never edit the original file. Make a copy of the file and conduct data manipulation and analysis off of the copy. It’s preferable to use Google Sheets for your calculations as it enables collaborative work.
  8. Manipulate and analyze the data – Data manipulation is the process of changing data in an effort to make it easier to read or be more organized. You may need to filter, sort, categorize and/or perform basic calculations.
  9. Visualize the data – Identify the right type of presentation for the data you want to visualize. Provide a short and compelling caption for each graph or data table.
  10. Draft a narrative for the report – This should be informed by the site analytics insights.
  11. Share the draft report with colleagues – Invite comments/feedback prior to finalizing.
  12. After taking into account relevant feedback, finalize the report

NEVER EDIT THE ORIGINAL DATA. After you download raw data for analysis, rename the files to an easily accessible name and save them in one folder. Create a new Google Sheet and create a new tab for each dataset. Then, you can import (or copy and paste) the raw data over for analysis. Do not do manual entry!
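The same rule can be followed when the raw files are kept on your own computer. The sketch below is one possible approach and not part of the original workflow: it copies a raw export into a separate working folder and leaves the original untouched. The function name, the folder layout and the `_working` suffix are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def make_working_copy(raw_path, work_dir):
    """Copy a raw export into work_dir and return the path of the copy.

    The raw file is only read, never modified.
    """
    raw_path = Path(raw_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: <original name>_working.<extension>
    copy_path = work_dir / f"{raw_path.stem}_working{raw_path.suffix}"
    shutil.copy2(raw_path, copy_path)
    return copy_path
```

All filtering, sorting and calculations would then be done on the returned copy, or on a Google Sheet that imports it.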

A folder containing the raw datasets and a centralized Google Sheet with the data used in the following sections are stored in this Google Drive folder.

Data sources and their limitations

Google Analytics

We have been using Google Analytics to discern user behaviours and usage trends. Our CKAN data platform is now tracked separately from the WordPress site, but both are tracked with Google Analytics. While useful, Google Analytics also has some limitations.

Google Analytics data is used to discern usage trends and average user behaviour on the OD platforms. Usage trends are demonstrated by the number of users and sessions, disaggregated by new and returning users. Average user behaviour is discerned by how users access the site, average time spent on the site, how likely they are to exit the site, and the pages that are the most visited during the reporting period. It is also possible to discern what the most downloaded datasets are, as well as the most viewed.

This data is limited in that Google Analytics only offers the information disaggregated by certain parameters. These limitations are described in the Working with Google Analytics generally section.

Data on CKAN usage is tracked by Google Analytics, and can be obtained both via Google Analytics as well as directly through the CKAN interface. However, data extracted directly through the CKAN interface is limited as it only provides information on activities from the date this feature was initiated on the platform in late 2019. In addition, raw data can only be downloaded for a maximum temporal period of one year. For older or longer periods, use Google Analytics directly.
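The two limits described above can be expressed as a simple check. This is a minimal sketch, not an existing CKAN or Google Analytics feature: the function name is hypothetical, and because this page only says the feature was initiated "in late 2019", the tracking start date must be supplied by you. The date used in the example is a placeholder.

```python
from datetime import date

MAX_EXPORT_DAYS = 366  # "one year", allowing for a leap year


def ckan_interface_covers(start, end, tracking_start):
    """Return True if the CKAN interface alone can serve the period start..end."""
    if end < start:
        raise ValueError("end date is before start date")
    period_days = (end - start).days + 1  # count both the first and last day
    return start >= tracking_start and period_days <= MAX_EXPORT_DAYS


# Placeholder only: replace with the real date the feature was switched on.
ASSUMED_TRACKING_START = date(2019, 10, 1)

# A period before tracking began has to come from Google Analytics.
needs_ga = not ckan_interface_covers(
    date(2016, 10, 1), date(2017, 9, 30), ASSUMED_TRACKING_START
)
```

When the check returns False, source the data from Google Analytics directly, as described above.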

Working with Google Analytics generally

Things to know beforehand

Glossary of terms

Google Analytics for WordPress statistics

Note: Unfiltered contains spam data. Always choose Master.

User acquisitions - Where do users of the OD platform come from?

User acquisition data tells the source of traffic and the medium through which users came to the OD platform for the reporting period. A source can be either a search engine (Google) or a domain (www.mekongeye.com). Mediums include:

Since the OD platform hosts six interlinked websites, we can demonstrate how much traffic one OD site directs to another OD site (e.g. ODL to ODM). This traffic is measured by number of sessions.

Referral data is also useful to understand how much traffic to the site has been directed from a partner organization's site (e.g. Global Forest Watch to ODC).

Accessing user acquisition data

User acquisition data is readily available on Google Analytics. It is accessible through this path: Acquisition > Overview

From this acquisition overview page, you can find macro data on traffic source and number of sessions associated with each source. If this level of data is all you need, download the data by clicking on Export. You may save the file as CSV, Excel, Google Sheet, or PDF.

Assuming you save the file as CSV, Excel, or Google Sheet, you will see:

You can also access broad data on traffic to OD from social media and social networking sites. Follow this path: Acquisition > Social.

User acquisition data disaggregated by source and medium

Disaggregated data helps us to answer the following questions:

  1. What are some of the most popular search engines our users used?
  2. How much traffic to OD platform is directed from other OD sites?
  3. How much traffic to OD platform is directed from data partner websites?
  4. How much traffic to OD platform is directed from government / media / academic websites?

Below is a step-by-step guide on how to download and analyze traffic data to show direct traffic, organic search, and traffic via referrals from the OD platform and social media. Using the method below, you can also analyze how much traffic to OD platform comes from government, media, academic, NGOs, etc.

Step 1: Download raw data

Only the visible data is downloaded. In this case, only 10 rows of data would be downloaded if you clicked Export. Change the number of visible rows to a number that is larger than the total number of rows. In this example, choose 50 (there are 41 rows in total).

Step 2: Working with raw data

Since this is data from ODM, the medium for “direct” is opendevelopmentmekong.net

Step 3: Determine data you want to identify

We want to be able to identify: 1) traffic from major search engines (e.g. Google, Bing, Yahoo), 2) traffic from other OD platforms (e.g. ODC, ODMm, ODL, ODT, ODV), and 3) traffic from social media (e.g. Facebook, Twitter).

Step 4: Transform / manipulate data

We need to transform the Source data to the above grouping.

If you want to analyze how much traffic to the OD platform comes from government, media, academic, NGOs, etc., you will need to transform the data in the Source column by associating .edu with Academia, .gov with Government, .org with NGOs and so on. For media organizations, you will need to perform a text search to find matches with news website URLs. Because this is a manual data transformation, there is an increased chance of inconsistency. Best practice is to double-check the work and ask a colleague to reproduce the analysis using your method.

The various OD sites use either opendevelopment[country].net or the default URL country.opendevelopmentmekong.net. Both count as traffic from the OD platform. There may also be traffic directed from PP site and ODM Wiki to PROD. For reporting purposes, this traffic should not be identified.
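For those comfortable with a little scripting, the grouping in Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the method used to produce the original reports: the function names and group labels are hypothetical, the search engine and social media lists only contain the examples named in Step 3, and `opendevelopmentcambodia.net` in the usage note is a made-up value that follows the opendevelopment[country].net pattern.

```python
from collections import Counter

# Only the examples named in Step 3; extend these sets as needed.
SEARCH_ENGINES = {"google", "bing", "yahoo"}
SOCIAL_MEDIA = {"facebook", "twitter"}


def group_source(source):
    """Map one value from the Source column to a reporting group."""
    labels = source.strip().lower().split(".")
    # Covers opendevelopment[country].net and country.opendevelopmentmekong.net
    if any(label.startswith("opendevelopment") for label in labels):
        return "OD platform"
    if SEARCH_ENGINES.intersection(labels):
        return "Search engine"
    if SOCIAL_MEDIA.intersection(labels):
        return "Social media"
    return "Other"


def sessions_by_group(rows):
    """Sum sessions per group for rows of (source, sessions)."""
    totals = Counter()
    for source, sessions in rows:
        totals[group_source(source)] += sessions
    return dict(totals)
```

For example, `sessions_by_group([("google", 100), ("opendevelopmentcambodia.net", 5), ("m.facebook.com", 7)])` groups the three rows under "Search engine", "OD platform" and "Social media". Anything left in "Other" still needs the manual review described above.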

Step 5: Analyze / visualize data

Now you're ready to analyze and visualize the data.

If your organization reports detailed user acquisition data on a regular basis (monthly or quarterly), you may combine the monthly / quarterly reports rather than downloading and transforming data for a longer temporal range.
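Combining periodic reports amounts to adding up the session counts for each group. The sketch below assumes each report has already been reduced to a mapping of group name to sessions; the function name and all figures in the usage note are invented for illustration.

```python
from collections import Counter


def combine_reports(reports):
    """Add up per-group session counts from several periodic reports."""
    total = Counter()
    for report in reports:
        total.update(report)  # Counter.update adds counts, it does not overwrite
    return dict(total)
```

For example, `combine_reports([{"Search engine": 120, "OD platform": 30}, {"Search engine": 80, "Social media": 15}])` gives 200 sessions for "Search engine". Note that this only works for additive figures such as sessions. Users should not be summed across periods, because a person who visits in two months would be counted twice.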

Users and sessions - How many users, how long they're staying, and how much of the site they're using

Basic user and session data

Please see the Glossary of terms, above, for basic definitions of these indicators.

Users: This data can be disaggregated by returning and new users to identify how many user cookies have been set over the reporting period.

Sessions: Session data can be broken down by the following:

You can read further here.

The basic Google Analytics report disaggregates user data by returning and new user segments (see below). However, user segments need to be specified for Sessions data.

Step 1: Get the data

Step 2: Visualize and present the data

See the following for examples:

Bounce rate: Although the bounce rates seem relatively high for the OD platform, it is possible that many return users target specific pages for updates (e.g. daily news updates), spend their time reading these and then leave. This constitutes a bounce even though the users found what they were looking for. Therefore, the bounce rates listed by Google Analytics above may be skewed and not a true representation of return user behavior.

Sessions data disaggregated by bounce and non-bounce behavior

Non-bounce sessions are sessions where users (both returning and new users) view more than one page in a session.
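As a worked illustration of how the two kinds of session relate, the sketch below assumes the usual Google Analytics definition of a bounce as a session in which only one page is viewed. The function names and the figures in the usage note are hypothetical.

```python
def bounce_rate(sessions, bounces):
    """Bounce rate as a percentage of all sessions."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return 100.0 * bounces / sessions


def non_bounce_sessions(sessions, bounces):
    """Sessions in which more than one page was viewed."""
    return sessions - bounces
```

With 200 sessions of which 150 are bounces, `bounce_rate(200, 150)` is 75.0 and `non_bounce_sessions(200, 150)` is 50.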

It is important to note that Google reports disaggregated statistics for new and return users as a whole. Users cannot be disaggregated into bounce or non-bounce users. However, data that describes user behavior in a session, such as Average session duration and Pages / Session, can be disaggregated by bounce and non-bounce.

Available statistics disaggregated by bounce and non-bounce behavior show that in non-bounce sessions users (both returning and new) spend even more time on the platform, approximately 5-7 minutes on average as opposed to 1-3 minutes.

These insights can be added to a report if deemed appropriate. To gather the data:

Most visited pages

Step 1: Get the data

Step 2: Working with raw data

Step 3: Determine data you want to identify

We want to be able to classify each applicable page as one of: Topic page, Maps, Data, Tags, News, or Profiles.

Step 4: Transform / manipulate data

Step 5: Visualize / present the data

Below are two examples of how this data can be presented. The graph below counts total Pageviews for each content type. The table provides a list of these most viewed pages, each hyperlinked with the relevant URL.
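The two presentations described above can be derived from the transformed data with a short script. This is a sketch only: it assumes each row already carries a content type assigned in Step 4, and the key names `page`, `content_type` and `pageviews`, as well as the function names, are hypothetical.

```python
from collections import defaultdict


def to_int(value):
    """Convert an exported figure such as "1,200" to an integer."""
    return int(str(value).replace(",", ""))


def pageviews_by_content_type(rows):
    """Total Pageviews per content type (data for the graph)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["content_type"]] += to_int(row["pageviews"])
    return dict(totals)


def top_pages(rows, n=10):
    """The n most viewed pages, highest first (data for the table)."""
    return sorted(rows, key=lambda row: to_int(row["pageviews"]), reverse=True)[:n]
```

The rows can come from `csv.DictReader` on the working copy of the export, after the column headers have been renamed to the keys above.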

Note: Although you might not use all the data downloaded, it's better to have more data on hand. You might be able to use it to help you produce the narrative section of the report. For example, it might be interesting for your team or the donor to know how much time on average users spent on one of your most popular pages over the reporting period.

If your organization reports most visited pages, grouped by OD content types, on a regular basis (monthly or quarterly), you may combine the monthly / quarterly reports rather than downloading and transforming data for a longer temporal range.

Working with 'Linked To Your Site'

For example:

By going to Google Analytics > Acquisition > Referrals, you can see that for the reporting period (October 1, 2016 - September 30, 2017):

By going to the LTYS report for opendevelopmentmekong.net > clicking “More …” under Who links the most > searching for the Land Portal and the Mekong Eye, you would see:

  • The Mekong Eye has linked to 2 ODM pages and has exposed links to these 2 pages on 7,322 of its own pages. The ODM homepage is linked from 7,306 pages on the Mekong Eye.

Note: Depending on what analysis you need, you might need to consolidate data for an external domain from the OD Datahub in order to demonstrate how a data partner is linking to the site. For example, the Land Portal is a data partner and has linked more to the OD Datahub than to the OD Mekong site. It has linked to 69 datasets on OD Datahub and has exposed these links on 169 of its web pages.

Main institutional user groups hyperlinking to OD Platform

To show how one OD platform might benchmark against another, the following demonstration will analyze and compare data from OD Mekong, ODC, ODMm, and OD Datahub as an example. Those working on an OD country instance may download only data for their respective site. If you want to access data from another OD instance, please contact the administrator of that country site.

The raw data will be stored here and the analysis will be conducted on this centralized Google Sheet

Step 1: Download the data

Links to Your Site data was downloaded on November 15, 2017 for analysis for this guide.

Step 3: Determine institutional user groups and domain extension

The domains data can be classified into:

For external domains, we want to identify the following institutional user groups:

Domain extension can be identified and classified. The following assumptions are made for this analysis:

Some CSOs may have a .com domain (e.g. sahrika.com). Some media organizations / newsrooms may have a .org or .net domain. Academia might have a .net domain (e.g. researchgate.net). Thus, using this transformation method, the number of CSOs or media websites linking to the OD platform might be skewed. Try your best to identify these and document your assumptions. A domain should only be assigned to one user group. Since LTYS data only offers a sample of links, this analysis should be taken for what it is: insights on which pages have been hyperlinked to by external domains, indicating their popularity and amplifying the reach of the OD platform to users of these domains. The data generally reveals a broad group of users from government, media, and civil society.

Step 4: Transform the data

Note: Unrelated entities might have a .net extension. They shouldn't be classified as civil society. Add them to the “Other” category. .com domains are not very useful for this analysis and will also be classified as “Other”.
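The transformation can be sketched in code as a lookup of known exceptions followed by a domain extension rule. This is an assumption-laden illustration rather than the exact method used for this guide: the group labels, the function name and the mapping of .org to civil society are assumptions, and the two override entries simply reuse the examples mentioned in Step 3. Checking the overrides first keeps each domain in exactly one group.

```python
# Documented exceptions, checked first (examples taken from Step 3).
MANUAL_OVERRIDES = {
    "sahrika.com": "Civil society",
    "researchgate.net": "Academia",
}

# Assumed extension rules; .com and .net deliberately fall through to "Other".
EXTENSION_GROUPS = {
    "gov": "Government",
    "edu": "Academia",
    "org": "Civil society",
}


def classify_domain(domain):
    """Assign a linking domain to exactly one institutional user group."""
    name = domain.strip().lower()
    if name.startswith("www."):
        name = name[4:]
    if name in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[name]
    labels = name.split(".")
    # Matching any label also covers country variants such as .gov.kh
    for marker, group in EXTENSION_GROUPS.items():
        if marker in labels:
            return group
    return "Other"
```

Media organizations cannot be recognized by extension, so they would need to be added to `MANUAL_OVERRIDES` one by one, with each addition documented.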

Step 5: Visualize and present the data

You may present the data as a data table or in a graphic presentation.

Government websites hyperlinking to OD Platform

We are often asked if government agencies have used data offered on the OD Platform. The LTYS data analyzed above sheds light on which government institutions have found our content useful enough to link to it from their websites.

For example: The Ministry of Commerce of Cambodia (MoC) has linked to three pages on ODC.

On ODC, the MoC has linked to three pages, each displaying all content tagged with one of “fdi” (Foreign Direct Investment), “construction-industry”, or “rubber-export”. ODC uses these keywords to tag relevant news articles curated on the site. This indicates that staff at the MoC have been using the ODC website to browse news and to conduct research. Clicking on the “fdi” tag, we can see that the MoC has been referencing this tag page in several of its reports.

Most hyperlinked pages

The LTYS report also offers data on the most linked pages for each OD Platform. The data is accessible via LTYS report > Your most linked content > “More”.

Source domains is an important indicator. It tells you how many websites have hyperlinked to a certain OD page.

Since the OD Platforms, each with a different URL, are regarded by Google crawlers as external websites, LTYS data also includes linkages from other OD instances. To truly present linkages from 'external' domains, the data needs to be adjusted. In the following example, ODMm has hyperlinked to the ODM Land page. Thus, the number of source domains linking to the ODM Land page needs to be reduced by 1 and the number of links needs to be reduced by 4.
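The adjustment is a simple subtraction, sketched below. The function name is hypothetical, and the totals of 20 source domains and 100 links in the usage note are invented for illustration. Only the reduction by 1 domain and 4 links comes from the ODMm example above.

```python
def adjust_for_od_links(source_domains, links, od_links):
    """Remove linkages that come from other OD instances.

    od_links maps each linking OD instance to its number of links.
    """
    adjusted_domains = source_domains - len(od_links)
    adjusted_links = links - sum(od_links.values())
    if adjusted_domains < 0 or adjusted_links < 0:
        raise ValueError("internal figures exceed the reported totals")
    return adjusted_domains, adjusted_links
```

With the invented totals, `adjust_for_od_links(20, 100, {"ODMm": 4})` returns `(19, 96)`: one source domain fewer and four links fewer.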

Step 1: Download the data

Note that the page URLs already contain information about the OD content type. For example, /topic/ = Topic page, /updates/ = Site updates, etc. Editors can easily verify these markers against the custom post types on WordPress.
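Reading the content type off the URL can be sketched as below. Only the two markers named in the note above are included; any further markers would have to be verified against the custom post types before being added. The function name and the "Other" fallback label are assumptions.

```python
from urllib.parse import urlparse

# Only the markers named in this guide; extend after verifying on WordPress.
URL_MARKERS = {
    "topic": "Topic page",
    "updates": "Site updates",
}


def content_type_from_url(url):
    """Return the OD content type implied by a page URL, or "Other"."""
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    for segment in segments:
        if segment in URL_MARKERS:
            return URL_MARKERS[segment]
    return "Other"
```

For example, a URL whose path is `/topic/land/` would be classified as "Topic page", while the homepage would fall into "Other".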

Step 2: Transform data

Why you shouldn't use unadjusted data: From the data above we can see which content types have been hyperlinked the most by external domains.

We can also see which topic pages are the most linked.

However, the number of source domains hyperlinking to each page may be over-reported, since the figures might contain hyperlinks from other OD instances. This problem is of greater concern to ODC since the site has been operational longer.

Step 3: Adjust the data

Since we need to look up each page one by one on LTYS in order to find out if other OD instances have linked to that specific page, it's best to clearly identify a small set of pages to look up.

Using the same method with ODC data, remove figures for linkages from ODM and ODMm.

Before adjustment:

After adjustment:

Step 4: Visualize and present the data

By filtering the LTYS data further, we found that the most linked content type for ODM was the topic page, with the Land page being the topic most linked by external domains. It recorded 11 external domains that hyperlinked to the Land page at least 5 times on average.

For ODC, the most linked content type was the profile page, with the Economic Land Concession, Mining and Natural Protected Areas datasets, which have been periodically updated throughout the year, continuing to be the most linked. This highlights the demand for detailed national-level datasets and the unique ability of our platform to offer these.

Google Analytics for CKAN statistics

CKAN for CKAN statistics

Things to know beforehand