Chapter 11 Point Pattern Analysis
11.1 Centrography
A very basic form of point pattern analysis involves summary statistics such as the mean center, standard distance and standard deviational ellipse.
These point pattern analysis techniques were popular before computers were ubiquitous since hand calculations are not too involved, but these summary statistics are too concise and hide far more valuable information about the observed pattern. More powerful analysis methods can be used to explore point patterns. These methods can be classified into two groups: density based approach and distance based approach.
11.2 Density based analysis
Density based techniques characterize the pattern in terms of its distribution visavis the study area–a firstorder property of the pattern.
A first order property of a pattern concerns itself with the variation of the observations’ density across a study area. For example, the distribution of oaks will vary across a landscape based on underlying soil characteristics (resulting in areas having dense clusters of oaks and other areas not).
In these lecture notes, we’ll make a distinction between the intensity of a spatial process and the observed density of a pattern under study. A point pattern can be thought of as a “realization” of an underlying process whose intensity \(\lambda\) is estimated from the observed point pattern’s density (which is sometimes denoted as \(\widehat{\lambda}\) where the caret \(\verb!^!\) is referring to the fact that the observed density is an estimate of the underlying process’ intensity). Density measurements can be broken down into two categories: global and local.
11.2.1 Global density
A basic measure of a pattern’s density \(\widehat{\lambda}\) is its overall, or global, density. This is simply the ratio of observed number of points, \(n\), to the study region’s surface area, \(a\), or:
\(\begin{equation} \widehat{\lambda} = \frac{n}{a} \label{eq:globaldensity} \end{equation}\)
11.2.2 Local density
A point pattern’s density can be measured at different locations within the study area. Such an approach helps us assess if the density–and, by extension, the underlying process’ local (modeled) intensity \(\widehat{\lambda}_i\)–is constant across the study area. This can be an important property of the data since it may need to be mitigated for when using the distance based analysis tools covered later in this chapter. Several techniques for measuring local density are available, here we will focus on two such methods: quadrat density and kernel density.
11.2.2.1 Quadrat density
This technique requires that the study area be divided into subregions (aka quadrats). Then, the point density is computed for each quadrat by dividing the number of points in each quadrat by the quadrat’s area. Quadrats can take on many different shapes such as hexagons and triangles, here we use square shaped quadrats to demonstrate the procedure.
The choice of quadrat numbers and quadrat shape can influence the measure of local density and must be chosen with care. If very small quadrat sizes are used you risk having many quadrats with no points which may prove uninformative. If very large quadrat sizes are used, you risk missing subtle changes in spatial density distributions such as the eastwest gradient in density values in the above example.
Quadrat regions do not have to take on a uniform pattern across the study area, they can also be defined based on a covariate. For example, if it’s believed that the underlying point pattern process is driven by elevation, quadrats can be defined by subregions such as different ranges of elevation values (labeled 1 through 4 on the righthand plot in the following example). This can result in quadrats having nonuniform shape and area.
Converting a continuous field into discretized areas is sometimes referred to as tessellation. The end product is a tessellated surface.
If the local intensity changes across the tessellated covariate, then there is evidence of a dependence between the process that generated the point pattern and the covariate. In our example, subregions 1 through 4 have surface areas of 17.08, 50.45, 26.76, 5.71 map units respectively. To compute these regions’ point densities, we simply divide the number of points by the respective area values.
We can plot the relationship between point density and elevation regions to help assess any dependence between the variables.
Though there is a steep increase in density at the highest elevation range, this increase is not monotonic across all ranges of increasing elevation suggesting that density may not be explained by elevation alone.
It’s important to note that how one chooses to tessellate a surface can have an influence on the resulting density distribution. For example, dividing the elevation into equal area subregions produces the following density values.
While the high density in the western part of the study area remains, the density values to the east are no longer consistent across the other three regions.
The quadrat analysis approach has its advantages in that it is easy to compute and interpret however, it does suffer from the modifiable areal unit problem (MAUP) as highlighted in the last two examples. Another density based approach that will be explored next (and that is less susceptible to the MAUP) is the kernel density.
11.2.2.2 Kernel density
The kernel density approach is an extension of the quadrat method: Like the quadrat density, the kernel approach computes a localized density for subsets of the study area, but unlike its quadrat density counterpart, the subregions overlap one another providing a moving subregion window. This moving window is defined by a kernel. The kernel density approach generates a grid of density values whose cell size is smaller than that of the kernel window. Each cell is assigned the density value computed for the kernel window centered on that cell.
A kernel not only defines the shape and size of the window, but it can also weight the points following a well defined kernel function. The simplest function is a basic kernel where each point in the kernel window is assigned equal weight.
Some of the most popular kernel functions assign weights to points that are inversely proportional to their distances to the kernel window center. A few such kernel functions follow a gaussian or quartic like distribution function. These functions tend to produce a smoother density map.
11.2.2.3 Kernel Density Adjusted for Covariate
In the previous section, we learned that we could use a covariate, like elevation, to define the subregions (quadrats) within which densities were computed.
Here, instead of dividing the study region into discrete subregions (as was done with quadrat analysis), we create an intensity function that is dependent on the underlying covariate. This function, which we’ll denote as \(\rho\), can be estimated in one of three different ways– by ratio, reweight and transform methods. We will not delve into the differences between these methods, but note that there is more than one way to estimate \(\rho\) in the presence of a covariate.
In the following example, the elevation raster is used as the covariate in the \(\rho\) function using the ratio method. The rightmost plot maps the modeled intensity as a function of elevation.
We can compare the modeled intensity function to the kernel density function of the observed point pattern via a scatter plot.
A red onetoone diagonal line is added to the plot. While an increase in predicted intensity is accompanied with increasing observed intensity, the relationship is not linear. This can be explained by the small area covered by these high elevation locations which result in fewer observation opportunities and thus higher uncertainty for that corner of the study extent. This uncertainty is very apparent in the \(\rho\) vs. elevation plot where the 95% confidence interval envelope widens at higher elevation values (indicating the greater uncertainty in our estimated \(\rho\) value at those higher elevation values).
11.2.3 Modeling intensity as a function of a covariate
So far, we have learned techniques that describe the distribution of points across a region of interest. But it is often more interesting to model the relationship between the distribution of points and some underlying covariate by defining that relationship mathematically. This can be done by exploring the changes in point density as a function of a covariate, however, unlike techniques explored thus far, this approach makes use of a statistical model. One such model is a Poisson point process model which can take on the form of:
\[ \begin{equation} \lambda(i) = e^{\alpha + \beta Z(i)} \label{eq:densitycovariate} \end{equation} \]
where \(\lambda(i)\) is the modeled intensity at location \(i\), \(e^{\alpha}\) (the exponent of \(\alpha\)) is the base intensity when the covariate is zero and \(e^{\beta}\) is the multiplier by which the intensity increases (or decreases) for each 1 unit increase in the covariate \(Z(i)\). This is a form of the logistic regression model–popular in the field of statistics. This equation implies that the relationship between the process that lead to the observed point pattern is a loglinear function of the underlying covariate (i.e. one where the process’ intensity is exponentially increasing or decreasing as a function of the covariate). Note that taking the log of both sides of the equation yields the more familiar linear regression model where \(\alpha + \beta Z(i)\) is the linear predictor.
Note: The lefthand side of a logistic regression model is often presented as the probability, \(P\), of occurrence and is related to \(\lambda\) as \(\lambda=P/(1P)\) which is the ratio of probability of occurrence. Solving for \(P\) gives us \(P = \lambda/(1 + \lambda)\) which yields the following equation: \[ P(i) = \frac{e^{\alpha + \beta Z(i)}}{1 + e^{\alpha + \beta Z(i)}} \]
Let’s work with the point distribution of Starbucks cafes in the state of Massachusetts. The point pattern clearly exhibits a nonrandom distribution. It might be helpful to compare this distribution to some underlying covariate such as the population density distribution.
We can fit a poisson point process model to these data where the modeled intensity takes on the form:
\[ \begin{equation} Starbucks\ density(i) = e^{\alpha + \beta\ population(i)} \label{eq:walmartmodel} \end{equation} \]
The parameters \(\alpha\) and \(\beta\) are estimated from a method called maximum likelihood. Its implementation is not covered here but is widely covered in many statistics text books. The index \((i)\) serves as a reminder that the point density and the population distribution both can vary as a function of location \(i\).
The estimated value for \(\alpha\) in our example is 18.966. This is interpreted as stating that given a population density of zero, the base intensity of the point process is e^{18.966} or 5.79657e09 cafes per square meter (the units are derived from the point’s reference system)–a number close to zero (as one would expect). The estimated value for \(\beta\) is 0.00017. This is interpreted as stating that for every unit increase in the population density derived from the raster, the intensity of the point process increases by e^{0.00017} or 1.00017.
If we are to plot the relationship between density and population, we get:
11.3 Distance based analysis
An alternative to the density based methods explored thus far are the distance based methods for pattern analysis whereby the interest lies in how the points are distributed relative to one another (a secondorder property of the point pattern) as opposed to how the points are distributed relative to the study extent
A second order property of a pattern concerns itself with the observations’ influence on one another. For example, the distribution of oaks will be influenced by the location of parent trees–where parent oaks are present we would expect dense clusters of oaks to emerge.
Three distance based approaches are covered next: The average nearest neighbor (ANN), the K and L functions, and the pair correlation function.
11.3.1 Average Nearest Neighbor
An average nearest neighbor (ANN) analysis measures the average distance from each point in the study area to its nearest point. In the following example, the average nearest neighbor for all points is 1.52 units.
An extension of this idea is to plot the ANN values for different order neighbors, that is for the first closest point, then the second closest point, and so forth.
The shape of the ANN curve as a function of neighbor order can provide insight into the spatial arrangement of points relative to one another. In the following example, three different point patterns of 20 points are presented.
Each point pattern offers different ANN vs. neighbor order plots.
The bottom line (black dotted line) indicates that the cluster (left plot) is tight and that the distances between a point and all other points is very short. This is in stark contrast with the top line (red dotted line) which indicates that the distances between points is much greater. Note that the way we describe these patterns is heavily influenced by the size and shape of the study region. If the region was defined as the smallest rectangle encompassing the cluster of points, the cluster of points would no longer look clustered.
An important assumption that underlies our interpretation of the ANN results is that of stationarity of the underlying point process (i.e. that there is no overall drift or trend in the process’ intensity). If the point is not stationary, then it will be difficult to assess if the results from the ANN analysis are due to interactions between the points or due to changes in some underlying factor that changes as a function of location.
11.3.2 K and L functions
11.3.2.1 K function
The average nearest neighbor (ANN) statistic is one of many distance based point pattern analysis statistics. Another statistic is the Kfunction which summarizes the distance between points for all distances. The calculation of K is fairly simple: it consists of dividing the mean of the sum of the number of points at different distance lags for each point by the area event density. For example, for point \(S1\) we draw circles, each of varying radius \(d\), centered on that point. We then count the number of points (events) inside each circle. We repeat this for point \(S2\) and all other points \(Si\). Next, we compute the average number of points in each circle then divide that number by the overall point density \(\hat{\lambda}\) (i.e. total number of events per study area).

We can then plot K and compare that plot to a plot we would expect to get if an IRP/CSR process was at play (K_{expected}).
\(K\) values greater than \(K_{expected}\) indicate clustering of points at a given distance band; K values less than \(K_{expected}\) indicate dispersion of points at a given distance band. In our example, the stores appear to be more clustered than expected at distances greater than 12 km.
Note that like the ANN analysis, the \(K\)function assumes stationarity in the underlying point process (i.e. that there is no overall drift or trend in the process’ intensity).
11.3.2.2 L function
One problem with the \(K\) function is that the shape of the function tends to curve upward making it difficult to see small differences between \(K\) and \(K_{expected}\). A workaround is to transform the values in such a way that the expected values, \(K_{expected}\), lie horizontal. The transformation is calculated as follows:
\[ \begin{equation} L=\sqrt{\dfrac{K(d)}{\pi}}d \label{eq:Lfunction} \end{equation} \]
The \(\hat{K}\) computed earlier is transformed to the following plot (note how the \(K_{expected}\) red line is now perfectly horizontal):
This graph makes it easier to compare \(K\) with \(K_{expected}\) at lower distance values. Values greater than \(0\) indicate clustering, while values less than \(0\) indicate dispersion. It appears that Walmart locations are more dispersed than expected under CSR/IRP up to a distance of 12 km but more clustered at distances greater than 12 km.
11.3.3 The Pair Correlation Function \(g\)
A shortcoming of the \(K\) function (and by extension the \(L\) function) is its cumulative nature which makes it difficult to know at exactly which distances a point pattern may stray from \(K_{expected}\) since all points up to distance \(r\) can contribute to \(K(r)\). The pair correlation function, \(g\), is a modified version of the \(K\) function where instead of summing all points within a distance \(r\), points falling within a narrow distance band are summed instead.
The plot of the \(g\) function follows.
If \(g(r)\) = 1, then the interpoint distances (at and around distance \(r\)) are consistent with CSR. If \(g(r)\) > 1, then the points are more clustered than expected under CSR. If \(g(r)\) < 1, then the points are more dispersed than expected under CSR. Note that \(g\) can never be less than 0.
Like its \(K\) and ANN counterparts, the \(g\)function assumes stationarity in the underlying point process (i.e. that there is no overall drift or trend in the process’ intensity).
11.4 First and second order effects
The concept of 1^{st} order effects and 2^{nd} order effects is an important one. It underlies the basic principles of spatial analysis.
Density based measurements such as kernel density estimations look at the 1^{st} order property of the underlying process. Distance based measurements such as ANN and Kfunctions focus on the 2^{nd} order property of the underlying process.
It’s important to note that it is seldom feasible to separate out the two effects when analyzing point patterns, thus the importance of relying on a priori knowledge of the phenomena being investigated before drawing any conclusions from the analyses results.