Note: The step-by-step guide on how to work with site analytics data has been simplified for those who have basic data skills. If you are an advanced data wrangler or a developer, use your magic.
Questions to consider when analyzing site data:
ODI has developed two surveys to collect user feedback and testimonials.
If you download raw data for analysis, make sure to NEVER edit the original data. Properly name the files and save them in one folder. Create a centralized Google Sheet and for each dataset create a tab where you can import (or copy and paste) the raw data over for analysis.
A folder containing raw datasets and a centralized Google Sheet containing data which will be used in the following section are stored on this Google Drive folder
We have been using Google Analytics and the ‘Links to Your Site’ report, also a Google tool, to discern user behaviours and usage trends. Each offers specific insights and has its own set of limitations:
Google Analytics data is used to discern usage trends and average user behaviour on OD platform. Usage trends are demonstrated by number of users and sessions, disaggregated by new and returning users. Average user behaviour is discerned by how users access the site, average time spent on the site, how likely they are to exit the site, and most visited pages during the reporting period.
There are limitations associated with this data as Google Analytics only offers the information disaggregated by certain parameters. These limitations are described in the Working with Google Analytics section.
‘Links to Your Site’ (LTYS) reports websites that make linkages to the OD platforms. Also a Google tool, it deploys crawlers to discover the hyperlinks directed to OD pages and data from external domains. With data manipulation, you can discern the most hyperlinked topics, pages, and content types, and identify who hyperlinks to the OD platform from external domains.
The LTYS data, however, also has its limitations. For instance, the data is collected at a single point in time, so it is not dated and cannot be filtered for the reporting period. Nevertheless, the data offers richer information than Google Analytics on which pages have been hyperlinked to by external domains, indicating their popularity and amplifying the reach of the OD platforms to users of these domains. The data can be manipulated (categorized and grouped) and analyzed to reveal a broad group of users from government, media and civil society.
Note: Unfiltered data contains spam data. Always choose Master.
ODM data from October 1, 2016 to September 30, 2017 will be used for the following step-by-step demonstration.
User acquisition data tells the source of traffic and the medium through which users came to the OD platform for the reporting period. There are three types of source:
Since the OD platform hosts six websites and each links to the others, we want to demonstrate how much traffic to one of the OD sites is directed from other OD sites. For example, how much traffic to ODM is directed from OD country instances. This traffic is measured by number of sessions.
Referral data is also useful if we want to measure how much traffic to the site has been directed from a partner organization's site. For example, how many sessions on ODC have been directed from Global Forest Watch or from Cambodian government websites.
Basic referral data is readily available on Google Analytics. It is accessible through this path: Acquisition > Overview (you should see this page: https://analytics.google.com/analytics/web/#report/trafficsources-overview/)
From this acquisition overview page, you can find macro data on traffic source and number of sessions associated with each source. If this level of data is all you need, download the data by clicking on Export. You may save the file as CSV, Excel, Google Sheet, or PDF.
Assuming you save the file as CSV, Excel, or Google Sheet, you will see:
The same process can be done to access broad data on traffic to OD platform from social media and social network sites.
Disaggregated data helps us answer the following questions:
Below is a step-by-step guide on how to download and analyze traffic data to show direct traffic, organic search, and traffic via referrals from OD platform and social media. Using the method below, you may also analyze how much traffic to OD platform comes from government, media, academic, NGOs, etc.
Step 1: Download raw data
Only the visible data is downloaded. In this case, only 10 rows of data would be downloaded if you clicked Export. Change the number of visible rows to a number that is greater than the total number of rows. In this example, choose 250 (there are 246 rows in total).
Step 2: Working with raw data
Select column B > go to Data > Split text to columns > Split by “/” sign. You should have the following:
This is data from the ODM site, so the medium for “direct” is opendevelopmentmekong.net.
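For those who prefer a script over the spreadsheet menu, the split in Step 2 can be sketched in Python. This is a minimal illustration, not part of the original workflow: the function name and the sample values are assumptions, and a real export may contain different strings.

```python
def split_source_medium(value):
    """Split a 'source / medium' string into (source, medium)."""
    # partition() splits on the first "/" only, so a medium that itself
    # contains a slash is kept intact.
    source, _, medium = value.partition("/")
    return source.strip(), medium.strip()


# Illustrative values only; a real export will differ.
sample_rows = ["google / organic", "(direct) / (none)", "facebook.com / referral"]
for row in sample_rows:
    print(split_source_medium(row))
```

Stripping whitespace on both parts matters because the export pads the separator with spaces, which would otherwise break later text matching.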
Step 3: Determine data you want to identify
We want to be able to identify: 1) traffic from major search engines (e.g. Google, Bing, Yahoo), 2) traffic from other OD platforms (e.g. ODC, ODMm, ODL, ODT, ODV), and 3) traffic from social media (e.g. Facebook, Twitter).
Step 4: Transform / manipulate data
We need to transform the Source data to the above grouping.
If you want to analyze how much traffic to OD platform comes from government, media, academic, NGOs, etc., you will need to transform data in the Source column by associating .edu with Academia, .gov with Government, .org with NGOs, and so on. For media organizations, you will need to perform a text search to find matches with news website URLs. This manual data transformation might produce some inconsistencies. Make sure you double-check the work and get a colleague to help reproduce the data using your method.
ODC and ODMm use opendev[country].net as well as the default URL country.opendevelopmentmekong.net. Make sure to count traffic from both as traffic directed from the OD platform. There is also traffic directed from the PP site and the ODM Wiki to PROD. For reporting purposes, this traffic doesn't need to be identified.
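The grouping described in Steps 3 and 4 could be sketched as follows. This is a hedged example, not the method used to produce the report: the keyword sets, the group labels, and the function names are assumptions, and the OD domain list contains only the two domains named in this guide, so it would need extending for other instances.

```python
from collections import Counter

# Assumed lists for illustration; extend them to match your own data.
SEARCH_ENGINES = {"google", "bing", "yahoo"}
SOCIAL_MEDIA = {"facebook", "twitter"}
OD_DOMAINS = ("opendevelopmentmekong.net", "opendevelopmentcambodia.net")


def classify_source(source):
    """Assign a Source value to exactly one traffic group."""
    s = source.strip().lower()
    if s == "(direct)":
        return "Direct"
    # Check OD domains first, matching the domain itself and its subdomains.
    if any(s == d or s.endswith("." + d) for d in OD_DOMAINS):
        return "OD platform"
    labels = set(s.split("."))
    if labels & SEARCH_ENGINES:
        return "Search engine"
    if labels & SOCIAL_MEDIA:
        return "Social media"
    return "Other referral"


def sessions_by_group(rows):
    """Sum sessions per group from (source, sessions) pairs."""
    totals = Counter()
    for source, sessions in rows:
        totals[classify_source(source)] += sessions
    return dict(totals)
```

Matching on subdomains means that both cambodia.opendevelopmentmekong.net and opendevelopmentcambodia.net are counted as OD platform traffic, as the note above requires.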
Step 5: Analyze / visualize data
Below is an example of how this data can be presented.
If your organization reports detailed user acquisition data on a regular basis (monthly or quarterly), you may combine the monthly / quarterly reports rather than downloading and transforming data of a longer temporal range.
Temporal range: For donor reporting, it's useful to break down the data by quarter. In the example below, we will gather data from October 1, 2016 to September 30, 2017, which covers four quarters:
Each data point downloaded for a specific temporal range is dated with that same range. If you set the temporal range to October 1, 2016 to September 30, 2017, the data cannot later be separated into weeks, months, or quarters. It's good practice to define the specific temporal range segments first and download the data for each segment.
Users: Gather data to show how many users, disaggregated by returning and new users, have visited the OD platform over the reporting period.
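To avoid mistakes when setting the date range for each download, the quarterly segments can be computed rather than typed by hand. The sketch below is an illustration only: the function name is an assumption, and it assumes the reporting period starts on the first day of a month.

```python
from datetime import date, timedelta


def quarter_ranges(start, quarters):
    """Return (first_day, last_day) pairs for consecutive three-month segments.

    Assumes `start` is the first day of a month.
    """
    ranges = []
    year, month = start.year, start.month
    for _ in range(quarters):
        first = date(year, month, 1)
        # Advance three months, rolling over into the next year if needed.
        month += 3
        if month > 12:
            month -= 12
            year += 1
        # The segment ends the day before the next segment begins.
        last = date(year, month, 1) - timedelta(days=1)
        ranges.append((first, last))
    return ranges


for first, last in quarter_ranges(date(2016, 10, 1), 4):
    print(first.isoformat(), "to", last.isoformat())
```

For the reporting period used in this guide, the last segment ends on 2017-09-30, which matches the end of the period.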
Sessions: A session is a group of interactions by one user with the site that take place within a given time frame. One unique visitor may initiate multiple sessions in a day. Sessions are typically refreshed after 30 minutes of inactivity.
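To make the 30-minute rule concrete, here is a simplified sketch that counts sessions for one user from a list of interaction timestamps. It is an assumption-laden illustration, not how Google Analytics is implemented: it only models the inactivity timeout and ignores any other rules the tool applies.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(hit_times):
    """Count sessions for one user, starting a new one after a long gap."""
    times = sorted(hit_times)
    if not times:
        return 0
    sessions = 1
    for previous, current in zip(times, times[1:]):
        # A gap longer than the timeout closes the session; the next
        # interaction opens a new one.
        if current - previous > SESSION_TIMEOUT:
            sessions += 1
    return sessions


# Hypothetical interactions by one visitor on a single day.
hits = [
    datetime(2017, 1, 5, 9, 0),
    datetime(2017, 1, 5, 9, 10),
    datetime(2017, 1, 5, 9, 45),
    datetime(2017, 1, 5, 14, 0),
]
print(count_sessions(hits))
```

In this example one unique visitor produces three sessions, which is why session counts are usually higher than user counts.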
The basic Google Analytics report disaggregates user data by returning and new user segments (see below). However, user segments need to be specified for Sessions data.
Step 1: Get the data
Step 2: Visualize and present the data
See the following for examples:
Bounce rate: Although the bounce rates seem relatively high for the OD platform, it is possible that many return users target specific pages for updates (e.g. daily news updates), spend their time reading these, and leave. This constitutes a bounce even though the users found what they were looking for. It is therefore possible that the bounce rates listed by Google Analytics above are skewed and not a true representation of return user behavior.
Non-bounce sessions are sessions where users (both returning and new users) view more than one page in a session.
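The definition above can be expressed as a small calculation. This is a minimal sketch with made-up numbers: the function name and the sample list are assumptions, and Google Analytics computes the figure for you in its reports.

```python
def bounce_rate(pages_per_session):
    """Share of sessions in which only one page was viewed."""
    if not pages_per_session:
        return 0.0
    bounces = sum(1 for pages in pages_per_session if pages == 1)
    return bounces / len(pages_per_session)


# Hypothetical page counts for four sessions: two bounces, two non-bounces.
print(bounce_rate([1, 1, 3, 5]))
```

A returning user who reads a single news page and leaves is counted in the numerator, which is why a high rate does not necessarily mean users failed to find what they needed.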
It is important to note that Google reports disaggregated statistics for new and return users as a whole. Users cannot be disaggregated as bounce or non-bounce users. However, data that describes user behavior in a session, such as Average session duration and Pages / Session, can be disaggregated by bounce and non-bounce.
The available statistics, disaggregated by bounce and non-bounce behavior, show that in non-bounce sessions users (both returning and new) spend even more time on the platform: approximately 5-7 minutes on average, as opposed to 1-3 minutes.
These insights can be added to a report if deemed appropriate. To gather the data:
Step 1: Get the data
Step 2: Working with raw data
Step 3: Determine data you want to identify
We want to be able to classify each applicable page as one of: Topic page, Maps, Data, Tags, News, or Profiles.
Step 4: Transform / manipulate data
Step 5: Visualize / present the data
Below are two examples of how this data can be presented. The graph below counts total Pageviews for each content type. The table provides a list of these most viewed pages, each hyperlinked with the relevant URL.
Note: Although you might not use all the data downloaded, it's better to have more data on hand. You might be able to use it to help you produce the narrative section of the report. For example, it might be interesting for your team or the donor to know how much time on average users spent on one of your most popular pages over the reporting period.
If your organization reports most visited pages, grouped by OD content types, on a regular basis (monthly or quarterly), you may combine the monthly / quarterly reports rather than downloading and transforming data of a longer temporal range.
What are 'external' domains? OD Platform hosts 6 websites, each of which might have more than one URL (e.g. both cambodia.opendevelopmentmekong.net and opendevelopmentcambodia.net take users to the ODC platform). The 6 front-ends, which Google crawlers technically regard as separate domains, are interconnected and link to one another (referrals). To demonstrate how others (non-OD platforms) have linked to content on an OD instance, we must take out links from within the OD family.
You should see:
For example:
By going to Google Analytics > Acquisition > Referrals, you can see that for the reporting period (October 1, 2016 - September 30, 2017):
By going to the LTYS report for opendevelopmentmekong.net > clicking “More …” under Who links the most > searching for the Land Portal and the Mekong Eye, you would see:
* The Mekong Eye has linked to 2 ODM pages and exposes a link to these 2 pages on 7,322 of its own pages. The ODM homepage is linked from 7,306 pages on the Mekong Eye.
Note: Depending on what analysis you need, you might need to consolidate data for an external domain from the OD Datahub in order to demonstrate how a data partner is linking to the site. For example, the Land Portal is a data partner and has linked more to the OD Datahub than to the OD Mekong site. It has linked to 69 datasets on OD Datahub and has exposed these links on 169 of its web pages.
To show how one OD platform might benchmark against another, the following demonstration will analyze and compare data from OD Mekong, ODC, ODMm, and OD Datahub as an example. Those working on an OD country instance may download only data for their respective site. If you want to access data from another OD instance, please contact the administrator of that country site.
The raw data will be stored here and the analysis will be conducted on this centralized Google Sheet
Step 1: Download the data
Links to Your Site data was downloaded on November 15, 2017 for analysis for this guide.
Step 3: Determine institutional user groups and domain extension
The domains data can be classified into:
For external domains, we want to identify the following institutional user groups:
Domain extension can be identified and classified. The following assumptions are made for this analysis:
Some CSOs may have a .com domain (e.g. sahrika.com). Some media organizations / newsrooms may have a .org or .net domain. Academia might have a .net domain (e.g. researchgate.net). Thus, using this transformation method, the number of CSOs or media websites linking to OD platform might be skewed. Try your best to identify these and document your assumptions. A domain should only be assigned to one user group. Since LTYS data only offers a sample of links, this analysis should be accepted for what it is: insights on which pages have been hyperlinked to by external domains, indicating their popularity and amplifying the reach of the OD platform to users of these domains. The data generally reveals a broad group of users from government, media, and civil society.
Step 4: Transform the data
Note: Miscellaneous entities might have a .net extension. They shouldn't be classified as civil society. Add them to the “Other” category. .com domains are not very useful for this analysis and will also be classified as “Other”.
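The extension-based classification can be sketched as below. It is only an illustration of the assumptions stated in this guide (.gov for Government, .edu for Academia, .org for civil society, everything else as Other): the function name, the group labels, and the example domains are hypothetical, and media organizations still need to be identified by hand.

```python
# Extension labels and the user group each one is assumed to indicate.
EXTENSION_GROUPS = {"gov": "Government", "edu": "Academia", "org": "Civil society"}

# Manually reviewed exceptions, documented so a colleague can reproduce them.
MANUAL_OVERRIDES = {"researchgate.net": "Academia"}


def user_group(domain):
    """Assign a linking domain to exactly one institutional user group."""
    d = domain.strip().lower()
    if d in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[d]
    # Look at the last two labels after the site name, so that country
    # extensions such as ".gov.kh" are still recognised as government.
    for label in d.split(".")[1:][-2:]:
        if label in EXTENSION_GROUPS:
            return EXTENSION_GROUPS[label]
    return "Other"


for name in ["example.gov.kh", "example.org", "example.com", "researchgate.net"]:
    print(name, "->", user_group(name))
```

Checking the override table first keeps each domain in a single group and leaves a written record of every manual decision.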
Step 5: Visualize and present the data
You may present the data as a data table or in a graphic presentation.
We are often asked if government agencies have used data offered on OD Platform. The LTYS data analyzed above sheds light on which government institutions have found our content useful enough to link to it from their websites.
For example: The Ministry of Commerce (MoC) of Cambodia has linked to three pages on ODC.
On ODC, MoC has linked to three pages, each displaying all content tagged with one of “fdi” (Foreign Direct Investment), “construction-industry”, or “rubber-export”. ODC uses these keywords to tag relevant news articles curated on the site. This indicates that staff at MoC have been using the ODC website to browse news and conduct research. Clicking on the “fdi” tag, we can see that MoC has been referencing this tag page in several of its reports.
The LTYS report also offers data on the most linked pages for each OD Platform. The data is accessible via LTYS report > Your most linked content > “More”
The number of source domains is an important indicator. It tells you how many websites have hyperlinked to a certain OD page.
Since OD Platforms, each with a different URL, are regarded by Google crawlers as external websites, LTYS data also includes linkages from other OD instances. To truly present linkages from 'external' domains, the data needs to be adjusted. In the following example, ODMm has hyperlinked to the ODM Land page. Thus, the number of source domains linking to the ODM Land page needs to be reduced by 1 and the number of links needs to be reduced by 4.
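The adjustment can be sketched as a small calculation. The figures and domain names in the sample are invented for illustration, apart from the 4 links from ODMm mentioned above; the OD family list holds only the two domains named in this guide and would need extending for other instances.

```python
# OD family domains named in this guide; subdomains are matched as well.
OD_FAMILY = ("opendevelopmentmekong.net", "opendevelopmentcambodia.net")


def is_od_domain(domain):
    """True if the domain belongs to the OD family."""
    d = domain.strip().lower()
    return any(d == od or d.endswith("." + od) for od in OD_FAMILY)


def external_totals(links_by_domain):
    """Return (source domains, links) after removing OD family domains."""
    external = {d: n for d, n in links_by_domain.items() if not is_od_domain(d)}
    return len(external), sum(external.values())


# Hypothetical LTYS figures for one page: linking domain -> number of links.
land_page = {
    "myanmar.opendevelopmentmekong.net": 4,  # ODMm, part of the OD family
    "example.org": 6,
    "example.gov.kh": 2,
}
print("before:", (len(land_page), sum(land_page.values())))
print("after: ", external_totals(land_page))
```

With this sample, the page goes from 3 source domains and 12 links to 2 source domains and 8 links, a reduction of 1 domain and 4 links.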
Step 1: Download the data
Note that the page URLs already contain information about the OD content type. For example, /topic/ = Topic page, /updates/ = Site updates, etc. Editors can easily verify these markers against the custom post types on WordPress.
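Because the content type is encoded in the URL path, the classification can be automated. The sketch below uses only the two markers named above; the function name, the "Unclassified" label, and the example URL are assumptions, and further markers should be added only after verifying them against the site's custom post types.

```python
from urllib.parse import urlparse

# Path markers named in this guide; verify any additions on WordPress first.
CONTENT_MARKERS = {
    "topic": "Topic page",
    "updates": "Site updates",
}


def content_type(url):
    """Return the OD content type indicated by the URL path, if any."""
    path = urlparse(url).path
    for segment in path.split("/"):
        if segment in CONTENT_MARKERS:
            return CONTENT_MARKERS[segment]
    return "Unclassified"


# Illustrative URL, not taken from a real report.
print(content_type("https://opendevelopmentmekong.net/topic/land/"))
```

Pages that come back as "Unclassified" are worth reviewing by hand before they are grouped or dropped.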
Step 2: Transform data
Why you shouldn't use unadjusted data:
From the data above we can see which content types have been hyperlinked the most by external domains.
We can also see which topic pages are the most linked.
However, the number of source domains hyperlinking to each page may be over-reported, since the figures might contain hyperlinks from other OD instances. This problem is of greater concern to ODC since the site has been operational longer.
Step 2: Adjust the data
Since we need to look up each page one by one on LTYS to find out whether other OD instances have linked to that specific page, it's best to clearly identify a small set of pages to look up.
Using the same method with ODC data, remove figures for linkages from ODM and ODMm.
Before adjustment:
After adjustment:
Step 2: Visualize and present the data
By filtering the LTYS data further, we found that the most linked content type for ODM was the topic page, with the Land page the topic most linked by external domains. It recorded 11 external domains, each of which hyperlinked to the Land page at least 5 times on average.
For ODC, the most linked content type was the profile page. The most linked profiles continue to be the Economic Land Concession, Mining, and Natural Protected Areas datasets, which have been periodically updated throughout the year. This highlights the demand for detailed national-level datasets and the uniqueness of our platform in offering them.
In addition to Google Analytics and Links to Your Site, relevant figures from CKAN should also be included. CKAN offers statistics on Total number of datasets and Top rated datasets, which can be pulled directly from the CKAN Stats page without additional coding.
In addition to these two indicators, figures on Most viewed datasets and Most downloaded datasets, disaggregated by Topics and OD Country should also be included.
As part of the already completed milestone 2.3.0 improvements to the layout of the dataset detail page, we have implemented a mechanism which tracks the following Events on Google Analytics: