Chapter 11 Point Pattern Analysis
11.1 Centrography
A very basic form of point pattern analysis involves summary statistics such as the mean center, standard distance and standard deviational ellipse.
These point pattern analysis techniques were popular before computers became ubiquitous since the hand calculations involved are not too demanding, but these summary statistics are too concise and hide far more valuable information about the observed pattern. More powerful analysis methods can be used to explore point patterns. These methods can be classified into two groups: density-based approaches and distance-based approaches.
11.2 Density based analysis
A first-order property of a pattern concerns itself with the variation of the observations’ density across a study area. For example, the distribution of oaks will vary across a landscape based on underlying soil characteristics (resulting in some areas having dense clusters of oaks and others having none).
Density-based techniques characterize the pattern in terms of its distribution vis-à-vis the study area–a first-order property of the pattern. Here, we make a distinction between the intensity of a spatial process and the observed density of a pattern under study (which is often used to estimate the process’ intensity). A point pattern can be thought of as a “realization” of an underlying process whose intensity \(\lambda\) is estimated from the observed point pattern’s density (sometimes denoted as \(\widehat{\lambda}\), where the caret \(\verb!^!\) indicates that the observed density is an estimate of the underlying process’ intensity).
11.2.1 Global density
A basic measure of a pattern’s density \(\widehat{\lambda}\) is its overall, or global, density. This is simply the ratio of observed number of points, \(n\), to the study region’s surface area, \(a\), or:
\[ \begin{equation} \widehat{\lambda} = \frac{n}{a} \label{eq:globaldensity} \end{equation} \]
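As a quick numeric illustration of the formula above, global density is simply a count divided by an area. The points and the 10 × 10 study window below are hypothetical values, not data from this chapter.

```python
# Global density: the ratio of the observed number of points, n,
# to the study region's surface area, a. The points and the
# 10 x 10 study window are hypothetical values for illustration.

def global_density(points, area):
    """Estimate lambda-hat = n / a for a point pattern."""
    return len(points) / area

points = [(2.3, 4.1), (5.0, 5.5), (7.8, 1.2), (3.3, 8.8), (6.1, 6.4)]
area = 10 * 10  # a 10 x 10 square study region

lam_hat = global_density(points, area)
print(lam_hat)  # 5 points / 100 area units = 0.05
```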
11.2.2 Local density
A point pattern’s density can be measured at different locations within the study area. Such an approach helps us assess if the density–and, by extension, the underlying process’ local intensity \(\widehat{\lambda}_i\)–is constant across the study area. This can be an important property of the data since it may need to be mitigated for when using distance-based analysis tools. Several techniques for measuring local density are available; here we will focus on two such methods: quadrat density and kernel density.
11.2.2.1 Quadrat density
This technique requires that the study area be divided into subregions (aka quadrats). The point density is then computed for each quadrat by dividing the number of points in each quadrat by the quadrat’s area. Quadrats can take on many different shapes, such as hexagons and triangles; here we use square-shaped quadrats to demonstrate the procedure.
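The procedure can be sketched in a few lines of code. This is a minimal illustration assuming a rectangular study region with its origin at (0, 0) and square quadrats; the sample points are hypothetical.

```python
from collections import Counter

def quadrat_densities(points, xmax, ymax, nx, ny):
    """Divide the [0, xmax] x [0, ymax] study region into nx * ny
    square quadrats and return the point density of each non-empty
    quadrat (point count divided by quadrat area)."""
    qw, qh = xmax / nx, ymax / ny
    cell_area = qw * qh
    counts = Counter()
    for x, y in points:
        col = min(int(x // qw), nx - 1)  # clamp points on the far edge
        row = min(int(y // qh), ny - 1)
        counts[(row, col)] += 1
    return {cell: n / cell_area for cell, n in counts.items()}

# Hypothetical points in a 10 x 10 region split into a 2 x 2 grid:
print(quadrat_densities([(1, 1), (1.5, 1.5), (9, 9)], 10, 10, 2, 2))
```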
The choice of quadrat number and quadrat shape can influence the measure of local density and must be made with care. If very small quadrat sizes are used, you risk having many quadrats with no points, which may prove uninformative. If very large quadrat sizes are used, you risk missing subtle changes in spatial density distributions, such as the east-west gradient in density values in the above example.
Quadrat regions do not have to take on a uniform pattern across the study area; they can also be defined based on a covariate. For example, if it’s believed that the underlying point pattern process is driven by elevation, quadrats can be defined by subregions such as different ranges of elevation values (labeled 1 through 4 on the right-hand plot in the following example). This can result in quadrats having non-uniform shape and area.
Converting a continuous field into discretized areas is sometimes referred to as tessellation. The end product is a tessellated surface.
If the local intensity changes across all regions of the tessellated covariate as a function of the covariate, then there is evidence of a dependence between the process that generated the point pattern and the covariate. In our example, subregions 1 through 4 have surface areas of 17.08, 50.45, 26.76, 5.71 respectively. To compute these regions’ point densities, we simply divide the number of points by the area.
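The per-subregion calculation is just a count divided by an area. The subregion areas below are the ones quoted above; the point counts are hypothetical placeholders since they are not listed here.

```python
# Point density per covariate-defined subregion: count / area.
# The areas come from the text above; the point counts are
# hypothetical placeholders for illustration.

areas  = {1: 17.08, 2: 50.45, 3: 26.76, 4: 5.71}
counts = {1: 20, 2: 60, 3: 30, 4: 25}  # hypothetical counts

densities = {region: counts[region] / areas[region] for region in areas}
for region, dens in sorted(densities.items()):
    print(region, round(dens, 3))
```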
We can plot the relationship between point density and elevation regions to help assess any dependence between the variables.
Though there is a steep increase in density at the highest elevation range, this increase is not monotonic across all ranges of increasing elevation suggesting that density may not be explained by elevation alone.
Note that how one chooses to tessellate a surface can have an influence on the resulting density values. For example, dividing the elevation into equal area subregions produces different density values.
11.2.2.2 Kernel density
The kernel density approach is an extension of the quadrat method: like the quadrat density, the kernel approach computes a localized density for subsets of the study area, but unlike its quadrat density counterpart, the subregions overlap one another, providing a moving subregion window. This moving window is defined by a kernel. The kernel density approach generates a grid of density values whose cell size is smaller than that of the kernel window. Each cell is assigned the density value computed for the kernel window centered on that cell. Software implementations typically generate a raster output to store the density values.
A kernel not only defines the shape and size of the window, but it can also weight the points following a well-defined kernel function. The simplest of functions is a basic kernel where each point in the kernel window is assigned equal weight.
Some of the most popular kernel functions assign weights to points that are inversely proportional to their distances from the kernel window center. A few such kernel functions follow a Gaussian or quartic distribution function. These functions tend to produce a smoother density map.
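As a sketch of the weighted approach, the function below evaluates a kernel density estimate at one location using the quartic (biweight) kernel: points beyond the bandwidth get zero weight, and nearer points are weighted more heavily. The example points and bandwidth are hypothetical.

```python
import math

def quartic_kernel_density(x0, y0, points, h):
    """Kernel density estimate at (x0, y0) using a 2-D quartic
    (biweight) kernel of bandwidth h. Points farther than h from
    (x0, y0) contribute nothing; nearer points are weighted more
    heavily, producing a smoother surface than quadrat counts."""
    total = 0.0
    for x, y in points:
        u2 = ((x - x0) ** 2 + (y - y0) ** 2) / (h * h)
        if u2 < 1:
            total += (3 / math.pi) * (1 - u2) ** 2  # quartic weight
    return total / (h * h)

# Density at the origin from three hypothetical points, bandwidth 2;
# the point at (5, 5) lies outside the kernel and contributes nothing:
print(quartic_kernel_density(0, 0, [(0.5, 0.5), (1, 0), (5, 5)], 2))
```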
11.2.2.3 Kernel Density Adjusted for Covariate
In the previous section, we learned that we could use a covariate, like elevation, to define the subregions (quadrats) within which densities were computed. In essence, a form of normalization was applied to the density calculations whereby each subregion was assumed to represent a unique underlying process (if this were the case, then the density values would have been the same across each subregion). The idea of normalizing the data to some underlying covariate can be extended to kernel density analysis. Here, instead of dividing the study region into discrete subregions (as was done with quadrat analysis), we normalize the computed density (from our observed point pattern) to a measure of density expected from the underlying covariate. This normalized density, which we’ll denote as \(\rho\), can be estimated in one of three different ways–by ratio, reweight, and transform methods. We will not delve into the differences between these methods, but note that there is more than one way to estimate \(\rho\).
If the density, \(\rho\), can be explained by the elevation, we would expect a constant \(\rho\) value across all elevation values (figure on the left) and across the spatial extent (figure on the right). This appears to be the case with our working example except near the upper right-hand corner of the spatial extent where the elevation values are the highest. This can be explained by the small area covered by these high elevation locations, which results in fewer observed points and thus higher uncertainty for that corner of the study extent. This is very apparent in the left figure where the 95% confidence interval envelope widens at higher elevation values (indicating the greater uncertainty in our estimated \(\rho\) value at the higher elevation).
11.2.3 Modeling intensity as a function of a covariate
So far, we have learned techniques that describe the distribution of points across a region of interest. But it is often more interesting to model the relationship between the distribution of points and some underlying covariate by defining that relationship mathematically. This can be done by exploring the changes in point density as a function of a covariate, however, unlike techniques explored thus far, this approach makes use of a statistical model. One such model is a Poisson point process model which can take on the form of:
\[ \begin{equation} \lambda(i) = e^{\alpha + \beta Z(i)} \label{eq:densitycovariate} \end{equation} \]
The left-hand side of a logistic regression model is often presented as the probability, \(P\), of occurrence, which is related to \(\lambda\) through the odds: \(\lambda = P/(1-P)\). Solving for \(P\) gives us \(P = \lambda/(1 + \lambda)\), which yields the following equation:
\[
P(i) = \frac{e^{\alpha + \beta Z(i)}}{1 + e^{\alpha + \beta Z(i)}}
\]
where \(\lambda(i)\) is the modeled intensity at location \(i\), \(e^{\alpha}\) (\(e\) raised to the power \(\alpha\)) is the base intensity when the covariate is zero, and \(e^{\beta}\) is the multiplier by which the intensity increases (or decreases) for each one-unit increase in the covariate. This is a form of the logistic regression model–popular in the field of statistics. This equation implies that the relationship between the process that led to the observed point pattern and the underlying covariate is log-linear (i.e. one where the process’ intensity increases or decreases exponentially as a function of the covariate). Note that taking the log of both sides of the equation yields the more familiar linear regression model, where \(\alpha + \beta Z(i)\) is the linear predictor.
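The two forms of the model can be expressed in a couple of lines. This is a generic sketch of the equations above, not fitted to any data; the coefficient values passed in are up to the caller.

```python
import math

def intensity(z, alpha, beta):
    """Modeled intensity: lambda(i) = exp(alpha + beta * Z(i))."""
    return math.exp(alpha + beta * z)

def probability(z, alpha, beta):
    """Logistic form: P(i) = lambda / (1 + lambda)."""
    lam = intensity(z, alpha, beta)
    return lam / (1 + lam)

# With alpha = 0 and beta = 1, a covariate value of 0 gives
# lambda = 1 and hence P = 0.5:
print(probability(0, alpha=0, beta=1))  # 0.5
```

Note that `math.log(intensity(z, alpha, beta))` recovers the linear predictor \(\alpha + \beta Z(i)\), which is the log-linear relationship described above.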
Let’s work with the point distribution of Starbucks cafes in the state of Massachusetts. The point pattern clearly exhibits a nonrandom distribution. It might be helpful to compare this distribution to some underlying covariate such as the population density distribution.
We can fit a Poisson point process model to these data where the modeled intensity takes on the form:
\[ \begin{equation} Starbucks\ density(i) = e^{\alpha + \beta\ population(i)} \label{eq:walmartmodel} \end{equation} \]
The parameters \(\alpha\) and \(\beta\) are estimated using a method called maximum likelihood. Its implementation is not covered here but is widely covered in many statistics textbooks. The term \((i)\) serves as a reminder that the point density and the population distribution both can vary as a function of location \(i\).
The estimated value for \(\alpha\) is -18.966. This is interpreted as stating that, given a population density of zero, the base intensity of the point process is \(e^{-18.966}\) or 5.79657e-09–a number close to zero (as one would expect). The estimated value for \(\beta\) is 0.00017. This is interpreted as stating that for every unit increase in population density (one person per square mile), the intensity of the point process increases by a factor of \(e^{0.00017}\) or 1.00017.
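Plugging the fitted coefficients into the model confirms these interpretations:

```python
import math

# The coefficient estimates quoted above (alpha = -18.966,
# beta = 0.00017) plugged into lambda = exp(alpha + beta * pop).

alpha = -18.966
beta = 0.00017

base_intensity = math.exp(alpha)        # intensity when population = 0
per_unit_multiplier = math.exp(beta)    # multiplier per added person/mi^2

print(base_intensity)       # ~5.8e-09, essentially zero
print(per_unit_multiplier)  # ~1.00017
```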
If one were to plot the relationship between density and population, we would get:
11.3 Distance based analysis
A second-order property of a pattern concerns itself with the observations’ influence on one another. For example, the distribution of oaks will be influenced by the location of parent trees: where parent oaks are present we would expect dense clusters of oaks to emerge.
An alternative to density-based methods is the family of distance-based methods for pattern analysis, where the interest lies in how the points are distributed relative to one another–a second-order property of the pattern.
11.3.1 Average Nearest Neighbor
An average nearest neighbor (ANN) analysis measures the average distance from each point in the study area to its nearest neighboring point. In the following example, the average nearest neighbor distance for all points is 1.52 units.
An extension of this idea is to plot the ANN values for different order neighbors, that is for the first closest point, then the second closest point, and so forth.
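A brute-force version of the k-th order ANN statistic (fine for small patterns) might look like this; the three collinear test points are hypothetical.

```python
import math

def ann(points, k=1):
    """Average distance from each point to its k-th nearest
    neighbor. k = 1 gives the classic ANN statistic; larger k
    gives the higher-order values used in an ANN-vs-order plot."""
    kth_dists = []
    for i, (xi, yi) in enumerate(points):
        d = sorted(math.hypot(xi - xj, yi - yj)
                   for j, (xj, yj) in enumerate(points) if j != i)
        kth_dists.append(d[k - 1])
    return sum(kth_dists) / len(kth_dists)

pts = [(0, 0), (1, 0), (3, 0)]  # hypothetical points on a line
print(ann(pts))        # first-order ANN
print(ann(pts, k=2))   # second-order ANN
```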
The shape of the ANN curve as a function of neighbor order can provide insight into the spatial arrangement of points relative to one another. In the following example, three different point patterns of 20 points are presented.
Each point pattern offers different ANN vs. neighbor order plots.
The bottom line (black line) indicates that the cluster (left plot) is tight and that the distances between a point and all other points are very short. This is in stark contrast with the top line (red line), which indicates that the distances between points are much greater. Note that the way we describe these patterns is heavily influenced by the size and shape of the study region. If the region were defined as the smallest rectangle encompassing the cluster of points, the cluster would no longer look clustered.
11.3.2 K and L functions
11.3.2.1 K function
The average nearest neighbor (ANN) statistic is one of many point pattern analysis statistics. Another statistic is the K-function, which summarizes the distances between points for all distance lags. The calculation of K is fairly simple: for each point, count the number of other points (events) falling within a set of distance lags, average these counts across all points, then divide by the overall event density. For example, for point \(S1\) we draw circles, each of varying radius \(d\), centered on that point. We then count the number of points (events) inside each circle. We repeat this for point \(S2\) and all other points \(Si\). Next, we compute the average number of points in each circle, then divide that number by the overall point density \(\hat{\lambda}\) (i.e. total number of events per study area).
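The computation can be sketched as follows. For clarity this version omits the edge corrections that real implementations apply, and the example points are hypothetical.

```python
import math

def k_function(points, d, area):
    """Ripley's K at distance d, without edge correction: the mean
    number of other events within distance d of each event, divided
    by the overall density lambda-hat = n / area."""
    n = len(points)
    lam_hat = n / area
    total = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                total += 1
    mean_count = total / n
    return mean_count / lam_hat

# Under IRP/CSR, K(d) is expected to be close to pi * d^2.
print(k_function([(0, 0), (1, 0)], 2, 100))  # hypothetical pattern
```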

We can then plot K and compare that plot to the plot we would expect to get if an IRP/CSR process were at play (\(K_{expected}\)).
\(K\)-function calculated from the Walmart stores point distribution in MA (shown in black) compared to \(K_{expected}\) under the IRP/CSR assumption (shown in red).
K values greater than \(K_{expected}\) indicate a greater number of points at that distance (more clustered); K values less than \(K_{expected}\) indicate fewer points at that distance (more dispersed).
Note that like the ANN analysis, the \(K\)-function assumes stationarity in the underlying point process (i.e. that there is no overall drift or trend in the process’ intensity).
11.3.2.2 L function
One problem with the \(K\) function is that its shape tends to curve upward, making it difficult to see small differences between \(K\) and \(K_{expected}\). A workaround is to transform the values in such a way that the expected values, \(K_{expected}\), lie horizontal. The transformation is calculated as follows:
\[ \begin{equation} L(d)=\sqrt{\dfrac{K(d)}{\pi}} - d \label{eq:Lfunction} \end{equation} \]
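The transformation itself is a one-liner. Given a K value at distance d:

```python
import math

def l_function(k, d):
    """L(d) = sqrt(K(d) / pi) - d. Under IRP/CSR, where
    K_expected(d) = pi * d^2, L(d) is 0 at every distance, so the
    expected curve plots as a horizontal line."""
    return math.sqrt(k / math.pi) - d

# Feeding in the CSR expectation K = pi * d^2 returns 0:
print(l_function(math.pi * 2 ** 2, 2))  # 0.0
```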
The above \(\hat{K}\) is transformed to the following plot (note how the \(K_{expected}\) red line is now perfectly horizontal).
11.3.3 The Pair Correlation Function \(g\)
A shortcoming of the \(K\) function (and by extension the \(L\) function) is its cumulative nature which makes it difficult to know at exactly which distances a point pattern may stray from \(K_{expected}\) since all points up to distance \(r\) can contribute to \(K(r)\). The pair correlation function, \(g\), is a modified version of the \(K\) function where instead of summing all points within a distance \(r\), points falling within a narrow distance band are summed instead.
The plot and interpretation of the \(g\) function differs from that of the \(K\) and \(L\) functions.
If \(g(r)\) = 1, then the interpoint distances (at and around distance \(r\)) are consistent with CSR. If \(g(r)\) > 1, then the points are more clustered than expected under CSR. If \(g(r)\) < 1, then the points are more dispersed than expected under CSR. Note that \(g\) can never be less than 0.
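A rough, uncorrected estimator shows the idea: count the other points in a narrow annulus [r, r + dr) around each point and divide by the count expected under CSR. No edge correction is applied, and the example pattern is hypothetical.

```python
import math

def pair_correlation(points, r, dr, area):
    """Rough estimate of g(r): the mean number of other events in
    the annulus [r, r + dr) around each event, divided by the count
    expected under CSR (lambda-hat times the annulus area). Values
    near 1 suggest CSR; > 1 clustering; < 1 dispersion."""
    n = len(points)
    lam_hat = n / area
    annulus_area = math.pi * ((r + dr) ** 2 - r ** 2)
    total = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and r <= math.hypot(xi - xj, yi - yj) < r + dr:
                total += 1
    return (total / n) / (lam_hat * annulus_area)

# Two hypothetical points one unit apart in a 10 x 10 region:
print(pair_correlation([(0, 0), (1, 0)], 0.9, 0.2, 100))
```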
Like its \(K\) counterpart, the \(g\)-function assumes stationarity in the underlying point process (i.e. that there is no overall drift or trend in the process’ intensity).
11.4 First and second order effects
The concept of first-order effects and second-order effects is an important one. It underlies the basic principles of spatial analysis.
Density-based measurements such as kernel density estimation explore the first-order property of the underlying process. Distance-based measurements such as ANN and \(K\)-functions focus on the second-order property of the underlying process.
It’s important to note that it is seldom feasible to separate out the two effects when analyzing the data, hence the importance of relying on a priori knowledge of the phenomenon being investigated before drawing any conclusions from the analysis results.