Geospatial analysis usually entails comparing different data sets; more often than not, the focus of such an analysis lies in exploring the spatial correlation between ‘activities’. For example, assume that data about the distribution of a certain industrial activity is available to a researcher who wants to determine the optimal location for a supplier-type business venture. Demand-side data (buyers) alone is not sufficient for the analysis and can lead to faulty estimates. Rather, the researcher needs to add the competitors’ locations (supply-side) to the analysis, in order to avoid placing the venture in over-saturated locations that may hinder growth and eventually lead to failure. Below, I present a technique for comparing data distributions that can be used to overcome such issues and support a much more robust analysis of the data.
The implementation of the method uses QGIS, with the heatmap and raster calculator tools, and assumes that both supply-side and demand-side data are available. Of course, the input rasters do not have to be heatmaps, and the technique can be used to compare distributions between all sorts of raster files.
First, you will need to create two heatmaps, one for demand and one for supply, using the ‘Heatmap’ plugin. Follow the instructions here, until you get the two raster (.tif) files, containing kernel density estimates in the grid cell data. If you have a particular wealth of data available to you, you may want to set weights on the data input of the ‘Heatmap’ plugin, for example by setting the industry ‘production’ field as the weight used in the creation of the grid cell data. At this stage, remember to set a common cell size for both heatmaps, so that the two grids can later be compared cell by cell.
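To make the role of the weights and of the common cell size more concrete, here is a minimal sketch of a weighted kernel density grid in plain Python. It is not the plugin's code: the function name, its parameters and the choice of a quartic kernel are assumptions made for illustration only.

```python
import math


def kernel_density_grid(points, x_min, y_min, cell_size, n_cols, n_rows, radius):
    """Return a grid (list of rows) of weighted quartic-kernel density values.

    points: iterable of (x, y, weight) tuples, e.g. weight = 'production'.
    Row 0 is the row that starts at y_min; a real raster may be stored
    top-down instead. All names here are hypothetical.
    """
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for col in range(n_cols):
            # Coordinates of the centre of the current cell.
            cx = x_min + (col + 0.5) * cell_size
            cy = y_min + (row + 0.5) * cell_size
            total = 0.0
            for px, py, weight in points:
                d = math.hypot(px - cx, py - cy)
                if d < radius:
                    # Quartic kernel: 1 at the point, falling to 0 at the radius.
                    total += weight * (1.0 - (d / radius) ** 2) ** 2
            grid[row][col] = total
    return grid
```

Calling the function twice with the same x_min, y_min, cell_size, n_cols and n_rows (once for the demand points, once for the supply points) yields two grids whose cells line up one to one, which is what the later subtraction relies on.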
Then, we will need to normalize the grid cell data contained in the heatmaps. In order to do this, we first need to determine the current maximum and minimum values of each heatmap that is going to be used. This is fairly simple: double-click on the heatmap layer name in the layer panel (usually on the left of the screen), or right-click on the name and select ‘Properties’.
Click on the ‘metadata’ option on the left of the open window, and QGIS shows the maximum and minimum values in the lower part of the window. Note the two numbers down as max and min.
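If the grid cell values are available outside QGIS, the same two numbers can be found with a few lines of Python. This is only a sketch: the function name and the optional nodata argument are assumptions, not part of QGIS.

```python
def raster_min_max(grid, nodata=None):
    """Return (minimum, maximum) of a grid given as a list of rows.

    Cells equal to `nodata` (a hypothetical placeholder value) are skipped.
    """
    values = [v for row in grid for v in row if v != nodata]
    return min(values), max(values)
```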
Next, you will use the raster calculator tool for a series of calculations. Open it from the raster menu, as shown in the image below.
Insert the normalization formula (shown below) in the calculation box, and set the name and directory of the output. For the example, we set 100 as the new maximum and 0 as the new minimum (the old maximum was 592.19445800781 and the old minimum was 0).
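As a minimal sketch of what the normalization does to each grid cell, assuming the standard min-max rescaling (the function and parameter names are hypothetical), the calculation can be written in Python as:

```python
def rescale(grid, old_min, old_max, new_min=0.0, new_max=100.0):
    """Linearly map values from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old_min and old_max must differ")
    return [
        [(v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min
         for v in row]
        for row in grid
    ]
```

With the numbers from the example, a cell holding the old maximum of 592.19445800781 becomes 100, a cell holding 0 stays 0, and every other cell keeps its relative position in between.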
Repeat the normalization process for the second heatmap, using its own old maximum and minimum but the same new maximum and minimum (100 and 0) as for the first one. As you can see in the metadata, the values contained in each new raster now range from a minimum of 0 to a maximum of 100.
Then, open the raster calculator again, and subtract the supply heatmap from the demand heatmap, as shown below.
The resulting raster theoretically contains values ranging from -maximum to +maximum (in this case -100 to +100). The extremes occur at points where one raster is at its maximum and the other is at its minimum, and vice versa.
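The cell-by-cell subtraction can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes both grids have already been normalized to the same range and share the same dimensions.

```python
def subtract_grids(demand, supply):
    """Return demand - supply, computed cell by cell."""
    if len(demand) != len(supply) or any(
        len(d_row) != len(s_row) for d_row, s_row in zip(demand, supply)
    ):
        raise ValueError("grids must have the same number of rows and columns")
    return [
        [d - s for d, s in zip(d_row, s_row)]
        for d_row, s_row in zip(demand, supply)
    ]
```

For two grids normalized to 0-100, a cell with demand 100 and supply 0 gives +100, a cell with demand 0 and supply 100 gives -100, and equal values give 0.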
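You can now easily distinguish the locations where the two distributions converge (values close to 0) and where they diverge (values closer to the extreme ends). Since supply was subtracted from demand, positive values mark locations where relative demand exceeds relative supply, and negative values mark the opposite.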
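One way to turn this reading into explicit classes, for instance before styling, is to label each cell of the difference raster. The sketch below is an illustration only: the threshold of 25 and the label texts are arbitrary assumptions and should be adapted to the data.

```python
def classify_difference(diff, threshold=25.0):
    """Label each cell of a demand-minus-supply grid.

    `threshold` is a hypothetical cut-off on the -100..+100 scale.
    """
    labels = []
    for row in diff:
        labels.append([
            "demand exceeds supply" if v > threshold
            else "supply exceeds demand" if v < -threshold
            else "balanced"
            for v in row
        ])
    return labels
```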
In our example, after styling the heatmap, we can create a highly intuitive map that provides a graphical comparison between the two distributions. Of course, you can use the resulting raster for further calculations; however, note that the normalization of the data makes it suitable only for relative rather than absolute statistical analysis. If you wish to do the latter, use the two heatmaps that were created first, before normalization, since the normalized grid cell values differ from the originals in range and scale.