Chapter 11 Point Pattern Analysis

11.1 Centrography

A very basic form of point pattern analysis involves summary statistics such as the mean center, standard distance and standard deviational ellipse.

These point pattern analysis techniques were popular before computers were ubiquitous since hand calculations are not too involved, but these summary statistics are too concise and hide far more valuable information about the observed pattern. More powerful analysis methods can be used to explore point patterns. These methods can be classified into two groups: density based approach and distance based approach.

11.2 Density based analysis

A first order property of a pattern concerns itself with the variation of the observations’ density across a study area. For example, the distribution of oaks will vary across a landscape based on underlying soil characteristics (resulting in areas having dense clusters of oaks and other areas not).

Density based techniques characterize the pattern in terms of its distribution vis-a-vis the study area–a first-order property of the pattern. Here, we make a distinction between the intensity of a spatial process and the observed density of a pattern under study (which is often used to estimate the process’ intensity). A point pattern can be thought of as a “realization” of an underlying process whose intensity \(\lambda\) is estimated from the observed point pattern’s density (which is sometimes denoted as \(\widehat{\lambda}\) where the caret \(\verb!^!\) is referring to the fact that the observed density is an estimate of the underlying process’ intensity) .

11.2.1 Global density

A basic measure of a pattern’s density \(\widehat{\lambda}\) is its overall, or global, density. This is simply the ratio of observed number of points, \(n\), to the study region’s surface area, \(a\), or:

\[ \begin{equation} \widehat{\lambda} = \frac{n}{a} \label{eq:global-density} \end{equation} \]

An example of a point pattern where n = 20 and the study area (defined by a square boundary) is 10 units squared. The point density is thus 20/100 = 0.2 points per unit area.

Figure 11.1: An example of a point pattern where n = 20 and the study area (defined by a square boundary) is 10 units squared. The point density is thus 20/100 = 0.2 points per unit area.

11.2.2 Local density

A point pattern’s density can be measured at different locations within the study area. Such an approach helps us assess if the density–and, by extension, the underlying process’ local intensity \(\widehat{\lambda}_i\)–is constant across the study area. This can be an important property of the data since it may need to be mitigated for when using distance based analysis tools. Several techniques to measure local density are available, here we will focus on two such methods: quadrat density and kernel density.

11.2.2.1 Quadrat density

This technique requires that the study area be divided into sub-regions (aka quadrats). Then, the point density is computed for each quadrat by dividing the number of points in each quadrat by the quadrat’s area. Quadrats can take on many different shapes such as hexagons and triangles, here we use square shaped quadrats to demonstrate the procedure.

An example of a quadrat count where the study area is divided into four equally sized quadrats whose area is 25 square units each. The density in each quadrat can be computed by dividing the number of points in each quadrat by that quadrat's area.

Figure 11.2: An example of a quadrat count where the study area is divided into four equally sized quadrats whose area is 25 square units each. The density in each quadrat can be computed by dividing the number of points in each quadrat by that quadrat’s area.

The choice of quadrat numbers and quadrat shape can influence the measure of local density and must be chosen with care. If very small quadrat sizes are used you risk having many quadrats with no points which may prove uninformative. If very large quadrat sizes are used, you risk missing subtle changes in spatial density distributions such as the east-west gradient in density values in the above example.

Quadrat regions do not have to take on a uniform pattern across the study area, they can also be defined based on a covariate. For example, if it’s believed that the underlying point pattern process is driven by elevation, quadrats can be defined by sub-regions such as different ranges of elevation values (labeled 1 through 4 on the right-hand plot in the following example). This can result in quadrats having non-uniform shape and area.

Converting a continuous field into discretized areas is sometimes referred to as tesselation. The end product is a tesselated surface.

Example of a covariate. Figure on the left shows the elevation map. Figure on the right shows elevation broken down into four sub-regions for which a local density will be computed.

Figure 11.3: Example of a covariate. Figure on the left shows the elevation map. Figure on the right shows elevation broken down into four sub-regions for which a local density will be computed.

If the local intensity changes across all regions of the tessellated covariate as a function of the covariate, then there is evidence of a dependence between the process that generated the point pattern and the covariate. In our example, sub-regions 1 through 4 have surface areas of 17.08, 50.45, 26.76, 5.71 respectively. To compute these regions’ point densities, we simply divide the number of points by the area.

Figure on the left displays the number of points in each elevation sub-region (sub-regions are coded as values ranging from 1 to 4). Figure on the right shows the density of points (number of points divided by area of sub-region).

Figure 11.4: Figure on the left displays the number of points in each elevation sub-region (sub-regions are coded as values ranging from 1 to 4). Figure on the right shows the density of points (number of points divided by area of sub-region).

We can plot the relationship between point density and elevation regions to help assess any dependence between the variables.

Plot of point density vs elevation regions.

Figure 11.5: Plot of point density vs elevation regions.

Though there is a steep increase in density at the highest elevation range, this increase is not monotonic across all ranges of increasing elevation suggesting that density may not be explained by elevation alone.

Note that how one chooses to tessellate a surface can have an influence on the resulting density values. For example, dividing the elevation into equal area sub-regions produces different density values.

Same analysis as last figure using different sub-regions. Note the difference in density values.

Figure 11.6: Same analysis as last figure using different sub-regions. Note the difference in density values.

11.2.2.2 Kernel density

The kernel density approach is an extension of the quadrat method: Like the quadrat density, the kernel approach computes a localized density for subsets of the study area, but unlike its quadrat density counterpart, the sub-regions overlap one another providing a moving sub-region window. This moving window is defined by a kernel. Unlike its quadrat density counterpart, the kernel density approach generates a grid of density values whose cell size is smaller than that of the kernel window. Each cell is assigned the density value computed for the kernel window centered on that cell. Many software applications will usually generate a raster output to store the density values.

A kernel not only defines the shape and size of the window, but it can also weight the points following a well defined kernel function. The simplest of functions is a basic kernel where each point in the kernel window is assigned equal weight.

An example of a basic 3x3 kernel density map (ArcGIS calls this a point density map) where each point is assigned an equal weight. For example, the second cell from the top and left (i.e. centered at location x=1.5 and y =8.5) has one point within a 3x3 unit (pixel) region and thus has a local density of 1/9 = 0.11.

Figure 11.7: An example of a basic 3x3 kernel density map (ArcGIS calls this a point density map) where each point is assigned an equal weight. For example, the second cell from the top and left (i.e. centered at location x=1.5 and y =8.5) has one point within a 3x3 unit (pixel) region and thus has a local density of 1/9 = 0.11.

Some of the most popular kernel functions assign weights to points that are inversely proportional to their distances to the kernel window center. A few such kernel functions follow a gaussian or quartic like distribution function. These functions tend to produce a smoother density map.

An example of a kernel function is the 3x3 quartic kernel function where each point in the kernel window is weighted based on its proximity to the kernel's center cell (typically, closer points are weighted more heavily). Kernel functions, like the quartic, tend to generate smoother surfaces.

Figure 11.8: An example of a kernel function is the 3x3 quartic kernel function where each point in the kernel window is weighted based on its proximity to the kernel’s center cell (typically, closer points are weighted more heavily). Kernel functions, like the quartic, tend to generate smoother surfaces.

11.2.2.3 Kernel Density Adjusted for Covariate

In the previous section, we learned that we could use a covariate, like elevation, to define the sub-regions (quadrats) within which densities were computed. In essence a form of normalization was applied to the density calculations whereby each sub-region was assumed to represent a unique underlying process (if this were the case, then the density values would have been the same across each sub-region). The idea of normalizing the data to some underlying covariate can be extended to kernel density analysis. Here, instead of dividing the study region into discrete sub-regions (as was done with quadrat analysis), we normalize the computed density (from our observed point pattern) to a measure of density expected from the underlying covariate. This normalized density, which we’ll denote as \(\rho\), can be estimated in one of three different ways– by ratio, re-weight and transform methods. We will not delve into the differences between these methods, but note that there is more than one way to estimate \(\rho\).

An estimate of $\rho$ using the ratio method. The figure on the left shows the estimated $\rho$ as a function of elevation. The envelope shows the 95% confidence interval. The figure on the right shows the spatial distribution of $\rho$.

Figure 11.9: An estimate of \(\rho\) using the ratio method. The figure on the left shows the estimated \(\rho\) as a function of elevation. The envelope shows the 95% confidence interval. The figure on the right shows the spatial distribution of \(\rho\).

If the density, \(\rho\), can be explained by the elevation, we would expect a constant \(\rho\) value across all elevation values (figure on the left) and across the spatial extent (figure on the right). This appears to be the case with our working example except near the upper right-hand corner of the spatial extent where the elevation values are the highest. This can be explained by the small area covered by these high elevation locations which result in fewer observed points and thus higher uncertainty for that corner of the study extent. This is very apparent in the left figure where the 95% confidence interval envelope widens at higher elevation values (indicating the greater uncertainty in our estimated \(\rho\) value at the higher elevation).

11.2.3 Modeling intensity as a function of a covariate

So far, we have learned techniques that describe the distribution of points across a region of interest. But it is often more interesting to model the relationship between the distribution of points and some underlying covariate by defining that relationship mathematically. This can be done by exploring the changes in point density as a function of a covariate, however, unlike techniques explored thus far, this approach makes use of a statistical model. One such model is a Poisson point process model which can take on the form of:

\[ \begin{equation} \lambda(i) = e^{\alpha + \beta Z(i)} \label{eq:density-covariate} \end{equation} \]

The left-hand side of a logisitic regression model is often presented as the probability, P, of occurance and is related to λ as \(\lambda=P/(1-P)\) which is the ratio of probability of occurence. Solving for P gives us \(P = \lambda/(1 + \lambda)\) which yields the following equation:
\[ P(i) = \frac{e^{\alpha + \beta Z(i)}}{1 + e^{\alpha + \beta Z(i)}} \]

where \(\lambda(i)\) is the modeled intensity at location \(i\), \(e^{\alpha}\) (the exponent of \(\alpha\)) is the base intensity when the covariate is zero and \(e^{\beta}\) is the multiplier by which the intensity increases (or decreases) for each 1 unit increase in the covariate. This is a form of the logistic regression model–popular in the field of statistics. This equation implies that the relationship between the process that lead to the observed point pattern is a loglinear function of the underlying covariate (i.e. one where the process’ intensity is exponentially increasing or decreasing as a function of the covariate). Note that taking the log of both sides of the equation yields the more familiar linear regression model where \(\alpha + \beta Z(i)\) is the linear predictor.

Let’s work with the point distribution of Starbucks cafes in the state of Massachusetts. The point pattern clearly exhibits a non-random distribution. It might be helpful to compare this distribution to some underlying covariate such as the population density distribution.

Location of Starbucks relative to population density. Note that the classification scheme follows a log scale to more easily differentiate population density values.

Figure 11.10: Location of Starbucks relative to population density. Note that the classification scheme follows a log scale to more easily differentiate population density values.

We can fit a poisson point process model to these data where the modeled intensity takes on the form:

\[ \begin{equation} Starbucks\ density(i) = e^{\alpha + \beta\ population(i)} \label{eq:walmart-model} \end{equation} \]

The parameters \(\alpha\) and \(\beta\) are estimated from a method called maximum likelihood. Its implementation is not covered here but is widely covered in many statistics text books. And the term \((i)\) serves as a reminder that the point density and the population distribution both can vary as a function of location \(i\).

The estimated value for \(\alpha\) is -18.966. This is interpreted as stating that given a population density of zero, the base intensity of the point process is e-18.966 or 5.79657e-09–a number close to zero (as one would expect). The estimated value for \(\beta\) is 0.00017. This is interpretated as stating that for every unit increase in population density (one person per square mile), the intensity of the point process increases by e0.00017 or 1.00017.

If one were to plot the relationship between density and population, we would get:

Poisson point process model fitted to the relationship between Starbucks store locations and population density. The model assumes a loglinear relationship. Note that the density is reported in number of stores per map unit area (the map units are in meters).

Figure 11.11: Poisson point process model fitted to the relationship between Starbucks store locations and population density. The model assumes a loglinear relationship. Note that the density is reported in number of stores per map unit area (the map units are in meters).

11.3 Distance based analysis

A second order property of a pattern concerns itself with the observations’ influence on one another. For example, the distribution of oaks will be influenced by the location of parent trees: where parent oaks are present we would expect dense clusters of oaks to emerge.

An alternative to density based methods is distance based methods for pattern analysis where the interest lies in how the points are distributed relative to one another–a second-order property of the pattern.

11.3.1 Average Nearest Neighbor

An average nearest neighbor (ANN) analysis measures the average distance from each point in the study area to its nearest point. In the following example, the average nearest neighbor for all points is 1.52 units.

Distance between each point and its closest point. For example, the point closest to point 1 is point 9 which is 2.32 units away.

Figure 11.12: Distance between each point and its closest point. For example, the point closest to point 1 is point 9 which is 2.32 units away.

An extension of this idea is to plot the ANN values for different order neighbors, that is for the first closest point, then the second closest point, and so forth.

ANN values for different neighbor order numbers. For example, the ANN for the first closest neighbor is 1.52 units; the ANN for the 2nd closest neighbor is 2.14  units; and so forth.

Figure 11.13: ANN values for different neighbor order numbers. For example, the ANN for the first closest neighbor is 1.52 units; the ANN for the 2nd closest neighbor is 2.14 units; and so forth.

The shape of the ANN curve as a function of neighbor order can provide insight into the spatial arrangement of points relative to one another. In the following example, three different point patterns of 20 points are presented.

Three different point patterns: a single cluster, a dual cluster and a randomly scattered pattern.

Figure 11.14: Three different point patterns: a single cluster, a dual cluster and a randomly scattered pattern.

Each point pattern offers different ANN vs. neighbor order plots.

Three different ANN vs. neighbor order plots. The black ANN line is for the first point pattern (single cluster); the blue line is for the second point pattern (double cluster) and the red line is for the third point pattern.

Figure 11.15: Three different ANN vs. neighbor order plots. The black ANN line is for the first point pattern (single cluster); the blue line is for the second point pattern (double cluster) and the red line is for the third point pattern.

The bottom line (black line) indicates that the cluster (left plot) is tight and that the distances between a point and all other points is very short. This is in stark contrast with the top line (red line) which indicates that the distances between points is much greater. Note that the way we describe these patterns is heavily influenced by the size and shape of the study region. If the region was defined as the smallest rectangle encompassing the cluster of points, the cluster of points would no longer look clustered.

The same point pattern presented with two different study areas. How differently would you describe the point pattern in both cases?

Figure 11.16: The same point pattern presented with two different study areas. How differently would you describe the point pattern in both cases?

11.3.2 K and L functions

11.3.2.1 K function

The average nearest neighbor (ANN) statistic is one of many point pattern analysis statistics. Another statistic is the K-function which summarizes the distance between points for all distances. The calculation of K is fairly simple: it consists of dividing the mean sum of the number of points at different distance lags for each point by the area event density. For example, for point \(S1\) we draw circles, each of varying radius \(d\), centered on that point. We then count the number of points (events) inside each circle. We repeat this for point \(S2\) and all other points \(Si\). Next, we compute the average number of points in each circle then divide that number by the overall point density \(\hat{\lambda}\) (i.e. total number of events per study area).

Distance
band
(km)
# events
from S1
# events
from S2
# events
from Si
K
10 0 1 0.012
20 3 5 0.067
30 9 14 0.153
40 17 17 0.269
50 25 23 0.419


We can then plot K and compare that plot to a plot we would expect to get if an IRP/CSR process was at play (Kexpected).

K-function calculated from the Walmart stores point distribution in MA (shown in black) compared toKexpected under the IRP/CSR assumption (shown in red).

K values greater than \(K_{expected}\) indicates a greater number of points at that distance (more clustered); K values less than \(K_{expected}\) indicates fewer number of points at that distance (more dispersed).

Note that like the ANN analysis, the \(K\)-function assumes stationarity in the underlying point process (i.e. that there is no overall drift or trend in the process’ intensity).

11.3.2.2 L function

One problem with the \(K_{expected}\) is that the shape of the function tends to curve upward making it difficult to see small differences between K and \(K_{expected}\). A work-around is to transform the values in such a way that the expected values, \(K_{expected}\), lie horizontal. The transformation is calculated as follows:

\[ \begin{equation} L=\sqrt{\dfrac{K(d)}{\pi}}-d \label{eq:L-function} \end{equation} \]

The above \(\hat{K}\) is transformed to the following plot (note how the \(K_{expecetd}\) red line is now perfectly horizontal)

L-function (a simple transformation of the K-function). This graph makes it easier to compare K with K~expected~ at lower distance values. It appears that Walmart locations are more clustered than expected under CSR/IRP up to a distance of 12 km but more dispersed at distances greater than 12 km.

Figure 11.17: L-function (a simple transformation of the K-function). This graph makes it easier to compare K with Kexpected at lower distance values. It appears that Walmart locations are more clustered than expected under CSR/IRP up to a distance of 12 km but more dispersed at distances greater than 12 km.

11.3.3 The Pair Correlation Function \(g\)

A shortcoming of the \(K\) function (and by extension the \(L\) function) is its cumulative nature which makes it difficult to know at exactly which distances a point pattern may stray from \(K_{expected}\) since all points up to distance \(r\) can contribute to \(K(r)\). The pair correlation function, \(g\), is a modified version of the \(K\) function where instead of summing all points within a distance \(r\), points falling within a narrow distance band are summed instead.

Difference in how the $K$ and $g$ functions aggregate points at distance $r$ ($r$ = 30 km in this example). *All* points *up to* $r$ contribute to $K$ whereas just the points in the *annulus band* at $r$ contribute to $g$.

Figure 11.18: Difference in how the \(K\) and \(g\) functions aggregate points at distance \(r\) (\(r\) = 30 km in this example). All points up to \(r\) contribute to \(K\) whereas just the points in the annulus band at \(r\) contribute to \(g\).

The plot and interpretation of the \(g\) function differs from that of the \(K\) and \(L\) functions.

$g$-function of the Massachusets Walmart point data. Its interpretation is  similar to that of the $K$ and $L$ functions. Here, we observe distances between stores greater than expected under CSR up to about 5 km. Note that this cutoff is less than the 12 km cutoff observed with the $K$ function; this can be explained by the point pattern at the shorter $r$ distances contributing to the $K$ values at higher $r$ values--a problem avoided using the $g$-function.

Figure 11.19: \(g\)-function of the Massachusets Walmart point data. Its interpretation is similar to that of the \(K\) and \(L\) functions. Here, we observe distances between stores greater than expected under CSR up to about 5 km. Note that this cutoff is less than the 12 km cutoff observed with the \(K\) function; this can be explained by the point pattern at the shorter \(r\) distances contributing to the \(K\) values at higher \(r\) values–a problem avoided using the \(g\)-function.

If \(g(r)\) = 1, then the inter-point distances (at and around distance \(r\)) are consistant with CSR. If \(g(r)\) > 1, then the points are more clustered than expected under CSR. If \(g(r)\) < 1, then the points are more dispersed than expected under CSR. Note that \(g\) can never be less than 0.

Like its \(K\) counterpart, the \(g\)-function assumes stationarity in the underlying point process (i.e. that there is no overall drift or trend in the process’ intensity).

11.4 First and second order effects

The concept of 1st order effects and 2nd order effects is an important one. It underlies the basic principles of spatial analysis.

Tree distribution can be influenced by 1^st^ order effects such as elevation gradient of spatial distribution of soil characteristics; this, in turn, changes the  tree density distribution across the study area. Tree distribution can also be influenced by 2^nd^ order effects such as seed dispersal processes where the process is independent of location and, instead, dependent on the presence of other trees.

Figure 11.20: Tree distribution can be influenced by 1st order effects such as elevation gradient of spatial distribution of soil characteristics; this, in turn, changes the tree density distribution across the study area. Tree distribution can also be influenced by 2nd order effects such as seed dispersal processes where the process is independent of location and, instead, dependent on the presence of other trees.

Density based measurements such as kernel density estimations look at the 1st order property of the underlying process. Distance based measurements such as ANN and K-functions focus on the 2nd order property of the underlying process.

It’s important to note that it is seldom feasible to separate out the two effects when analyzing the data, thus the importance on relying on a priori knowledge of the phenomena being investigated before drawing any conclusions from the analyses results.