ANOVA: beyond the p-value

For many students, ANOVA (Analysis of Variance) is a black box. You put data in, turn the crank, and get a p-value out. But what is ANOVA actually doing? How does it see the world? This article will deconstruct ANOVA, not with complex theory, but by building a simple experiment from the ground up. We will discover where the “effects” come from, demystify what an “interaction” really is, and learn how looking at residuals can tell us if our model is telling the whole story.

PART 1: How ANOVA “Sees” Effects

Building Our Example: From Baseline to Model

We’ll start with a simple example where the response variable is tomato Yield in pounds. Let’s assume that an experiment is conducted whereby Nitrogen (N) and Phosphorous (P)–two important plant nutrients–are used to fertilize plants at different levels of concentration.

No fertilizer effect

Let’s start off with a scenario where we have six tomato plants that are not given fertilizer.

In such a scenario the tomato plants produce an average of around 10 lbs of tomatoes. We’ll assume for now that there is no random variability in plant yield (i.e. each produces exactly 10 pounds of tomatoes).

base <-rep(10,6)
df   <- data.frame(base, 
                   Plevel = rep(c("P(0.5)","P(1.5)","P(2.0)"),2), 
                   P      = rep(c(0.5,1.5,2),2),
                   Nlevel =  rep(c("N(1)","N(2)"),3), 
                   N      = rep(c(1,2),3))
df$yield <- df$base

The variable yield is the response variable. At this point, it’s soley a function of the base yield value.

Mathematically, we can express this as:

\[ Yield = \mu = 10 \] Where $\mu$ is the mean yield value that we will refer to as the global mean (aka the grand mean or global value).

This will be the baseline model. At this point, there are no N or P effects given that we did not incorporate them in the potted plants.

Add a Phosphorous effect

Next, we’ll include our first effect: We’ll add different Phosphorous concentrations (P) across rows so that each row of plants that are laid out in the matrix is assigned the same concentration. We’ll add three different concentrations that we’ll name P0.5, P1.5 and P2.0. Each concentration increases tomato yield by 0.5, 1.5 and 2 pounds respectively.

df$yieldP <- df$base + df$P

We will refer to the simple formula we use to create our data the generative model.

This gives us the following model:

\[ Yield_{(P,\ N)} = 10 + 0.5(P0.5) + 1.5(P1.5) + 2(P2.0) \] Where the P0.5, P1.5 and P2.0 are binary values that take on a value of 0 or 1 depending on which Phosphorous treatment was applied to the potted plants.

This augments our tomato yield as follows:

What an ANOVA gives us

An ANOVA analysis does not only perform a hypothesis test (outcome that results in a “p-value”). An ANOVA also provides us with the contribution each factor (aka category) makes to the response variable (tomato yield in this example).

Using the base R environment, one may extract each factor’s effect using the command:

aov(yieldP ~ Plevel, df) |> model.tables()

## Tables of effects
## 
##  Plevel 
## Plevel
##  P(0.5)  P(1.5)  P(2.0) 
## -0.8333  0.1667  0.6667

This above approach makes use of least square method–a popular implementation of an ANOVA analysis. Note that aov() requires that the factors be passes as characters (or R factors) and not as the numeric N and P values otherwise, R would have treated P and N as a single continuous variable and performed a linear regression

The aov() function is the standard, but for the rest of this article, we’ll use a function from the tukeyedar package called eda_mean_sweep(). It calculates the exact same effects but does so adopting a sweeping routine.

library(tukeyedar)
MP <- eda_mean_sweep(df, yieldP, P) 
MP$effects

## $P
##        0.5        1.5          2 
## -0.8333333  0.1666667  0.6666667

You’ll notice that the effect values for Plevel do not reflect the values used in our initial model. In other words, instead of seeing the modeled values of [0.5, 1.5, 2], we have effect values of [-0.833, 0.167, 0.667].

You will also note that the global mean value is not 10 but instead is,

MP$global

## [1] 11.33333

This results in a model of the form:

\[ Yield = 11.334 -0.833(P0.5) + 0.167(P1.5) + 0.667(P2.0) \] This might look like a completely different model from the one used to construct the dataset, but it’s just a different way of telling the same story. Let’s calculate the yield for a plant given the P1.5 treatment:

\[ Yield_{(P1.5)} = 11.334 -0.833(0) +0.167(1) + 0.667(0) = 11.5 \] It’s the exact same yield we created with our simple model $10 + 1.5 = 11.5$. The models are different, but the result is identical.

So why the different numbers? Because an analysis of variance doesn’t try to find the original generative model that generated the data. Instead, it describes the dataset in terms of deviations from an average, telling us how much better or worse each factor (or treatment in our example) is compared to the overall mean of the experiment.

In our working example, the model states that the Phosphorous treatment of 1.5 units (P1.5) adds a yield of 0.167 above an overall yield of 11.334. Likewise, it also states that a Phosphorous treatment of 0.5 units (P0.5) generates a yield that is 0.833 less than the overall mean in this experiment.

A two-way analysis

A purely additive effect

So what happens when we add Nitrogen as a treatment to our tomato plants? One possible outcome is that the effect is solely additive meaning that the benefits afforded by each nutrient are added to one another. The previous model can be augmented as,

df$yieldPN <- df$base + df$N + df$P

This yields the following mathematical mode:

\[ Yield_{(P,\ N)} = 10 + 0.5(P0.5) + 1.5(P1.5) + 2(P2.0) + 1(N1) + 2(N2) \] As with the Phosphorous factors, the Nitrogen factors take on a value of 0 or 1 depending on which column of tomatoes is assigned the treatment.

Now that we’ve created a dataset with two independent, additive effects, let’s see what the ANOVA “report” looks like. We’ll run a two-way analysis of variance and look at the effects it extracts.

MPN <- eda_mean_sweep(df, yieldPN, P, N)
MPN$global

## [1] 12.83333

MPN$effects

## $P
##        0.5        1.5          2 
## -0.8333333  0.1666667  0.6666667 
## 
## $N
##    1    2 
## -0.5  0.5

The output gives us a gloabl mean and the effects for both Plevel and Nlevel.

Global Mean: 12.833
Plevel effects: -0.833, 0.167, 0.667
Nlevel effects: -0.5, 0.5

This is the classic signature of an additive model. The term “additive” means the effect of one nutrient doesn’t change based on the level of the other. The bonus from using Nitrogen is the same regardless of the Phosphorous treatment, and vice-versa. Because we created the data this way, the ANOVA model correctly shows two separate, independent sets of effects.

The ANOVA model can thus be expressed as:

\[ Yield_{(P,\ N)} = 12.83 - 0.833(P0.5) + 0.167(P1.5) + 0.667(P2.0) - 0.5(N1) + 0.5(N2) \]

Here too, this is nothing more than a re-expression of the generative model used to create the data. To confirm this, we can compute the estimated tomato yield for a plant assigned a Phosphorous treatment of P1.5 and a Nitrogen treatment of N2.0:

\[ Yield_{(P1.5,\ N2)} = 12.83 - 0.833(0) + 0.167(1) + 0.667(0) - 0.5(0) + 0.5(1) = 13.5 \] This is the same yield generated from the base model.

The ANOVA output correctly suggests that the P treatment effect is greater than the N treatment effect meaning that the Phosphorous factor, as a whole, creates more variation in the yield than the Nitrogen factor does. This doesn’t mean that every level of P is better than N, but that the choice of Phosphorous level has a bigger impact on the final outcome.

Here’s how we can tell from the effect values: * For Nitrogen, the effects are -0.5 and +0.5. The “swing” from the worst level (N1) to the best level (N2) is (+0.5) - (-0.5) = 1 pound of tomatoes. * For Phosphorous, the effects range from -0.833 to +0.667. The “swing” from the worst level (P0.5) to the best level (P2.0) is (+0.667) - (-0.833) = 1.5 pounds.

Because the total range of influence is bigger for Phosphorous (a 1.5 lb swing) than for Nitrogen (a 1.0 lb swing), the ANOVA correctly identifies that P is the more impactful factor in our experiment for explaining the variation in tomato yield.

Calculating the range of effects is one way to see which factor is more impactful. An even more direct way is to visualize the effects on a common scale using a variability decomposition plot.

plot(MPN, show.resp = TRUE, label = TRUE)

This plot perfectly summarizes our analysis. Let’s break it down:

On the far left, the Response boxplot shows the total spread of our final tomato yields (yieldPN), centered around the global mean. This represents the total variation we are trying to explain.
The next two sets of points show the effects for Plevel and Nlevel. Each dot represents the “nudge” (positive or negative) that a specific treatment level gives to the yield.

Most importantly, you can now see which factor is more powerful. The spread of the dots for Plevel is visibly wider than the spread for Nlevel. This is the graphical proof of what we calculated manually: the 1.5 lb swing from Phosphorous has a larger impact than the 1.0 lb swing from Nitrogen.

You will also notice a component on the plot labeled Residuals which is empty (all values are zero). This is because the Residuals represent the “leftover” variation that our model cannot explain. Since we created a perfect, “textbook” dataset with no random variability, our additive model of P and N effects accounts for 100% of the variation in the yield. This leaves nothing behind. We can confirm this by extracting the residuals from the MPN model.

MPN$residuals

## [1] -5.551115e-16 -5.551115e-16 -5.551115e-16 -5.551115e-16 -5.551115e-16
## [6] -5.551115e-16

Note that whenever you see a number ending in e-15 or e-16, you can confidently treat it as the computer’s way of saying “zero.”

Later, when we introduce natural variability, this residuals component will be crucial for understanding our model’s fit.

Adding an interactive effect

An interaction means the effect of one factor depends on the level of another. Let’s create a classical ANOVA interaction. We’ll model a synergistic effect where the combination of the highest levels of Nitrogen (N2) and Phosphorous (P3) gives a special boost to the yield that is more than the sum of its parts.

We’ll create an explicit interaction term. It will be 0 for all combinations except for N=2 and P=2 where we’ll add a boost of 2.5.

df$interaction <- ifelse(df$N == 2 & df$P == 2, 2.5, 0)
df$yieldPN2 <- df$base + df$N + df$P + + df$interaction

Now, if we were to fit the additive model used earlier, we would not be able to fully capture the interactive effect.

MPN2 <- eda_mean_sweep(df, yieldPN2, P, N)
MPN2$global

## [1] 13.25

MPN2$effects

## $P
##   0.5   1.5     2 
## -1.25 -0.25  1.50 
## 
## $N
##          1          2 
## -0.9166667  0.9166667

We have a model of the form,

\[ Yield_{(P,\ N)} = 13.25 - 1.25(P0.5) -0.25(P1.5) + 1.5(P2.0) - 0.92(N1) + 0.92(N2) \] If we use this additive model to reconstruct the tomato yield for the P2.0:N2 treatment combination, we get the following yield

\[ Yield_{(P2.0,\ N2)} = 13.25 - 1.25(0) -0.25(0) + 1.5(1) - 0.92(0) + 0.92(1) = 15.67 \] This computed yield is nearly one pound less than the expected yield of 16.5.

If an interaction between effects is suspected, then the interactive component needs to be added to the model. With the eda_mean_sweep function, we add the interactive component via the max_order component. We set this value to 2 indicating that we are seeking all two-way interactions.

MPN3 <- eda_mean_sweep(df, yieldPN2, P, N, max_order = 2)
MPN3$effects

## $P
##   0.5   1.5     2 
## -1.25 -0.25  1.50 
## 
## $N
##          1          2 
## -0.9166667  0.9166667 
## 
## $`P:N`
##      0.5.1      1.5.1        2.1      0.5.2      1.5.2        2.2 
##  0.4166667  0.4166667 -0.8333333 -0.4166667 -0.4166667  0.8333333

MPN3$global

## [1] 13.25

Adding the interactive component (max_order = 2) creates a more complex model. It still has the main effects for P and N, but it adds a specific adjustment value for each unique P:N combination. These adjustments are the interaction effects, and they account for the synergy that the simple additive model missed. We now have a (correct) model which includes the global mean, the two main effects, and all six of the individual interaction adjustments:

\[ \begin{align} {Yield_{(P,\ N)}} &= 13.25 - 1.25(P0.5) -0.25(P1.5) + 1.5(P2.0) - 0.92(N1) + 0.92(N2) \\ &0.417(P0.5:N1) + 0.417(P1.5:N1) - 0.833(P2.0:N1) \\ &- 0.417(P0.5:N2) - 0.417(P1.5:N2) + 0.833(P2.0:N2) \end{align} \] Where the interactive factor levels (e.g. (P0.5:N1)) are binary values that take on a 1 if the response value is associated with that combination of effects.

Recall from our earlier discussion that the ANOVA does not seek to recreated the base model used to generate the data. As such, it’s no surprise that non-zero values are associated with the interactive effects that were explicitly set to 0 in the base model.

Revisiting our example of the tomato yield for a treatment of P2.0:N2, we can now use the full model, which includes the main effects plus the specific interaction effect for that cell

\[ \begin{align} {Yield_{(P,\ N)}} &= 13.25 - 1.25(0) -0.25(0) + 1.5(1) - 0.92(0) + 0.92(1) \\ &0.417(0) + 0.417(0) - 0.833(0) \\ &- 0.417(0) - 0.417(0) + 0.833(1) = 16.5 \end{align} \] This matches the 16.5 lbs yield we originally generated. By including the interaction term, the model can now perfectly explain the data.

Residuals

So, what does this failure of the additive model (MPN2) look like visually? Let’s look at the decomposition plot for the additive-only model that we fitted to our interactive data.

plot(MPN2, show.resp=TRUE, label = TRUE)

This plot tells a clear story. Previously, when we analyzed a truly additive dataset, the Residuals component was empty. But look at it now! It has a very wide spread, with values ranging from about -0.833 to +0.833.

This is the visual proof that our model is wrong. Those large residuals are the “synergy” from the interaction that the simple additive model could not explain. In statistics, we give this leftover, unexplained variation a formal name: the Residual, often symbolized by the Greek letter $\varepsilon$.

As such, when fitting a model to data other than one created synthetically (as is the case here), you want to include a Residuals component, $\varepsilon$ to your model. Given that some of the yield data in df$yieldPN2 was not captured by the N and P components, our earlier equation was not complete. Adding $\varepsilon$ completes the model.

\[ Yield_{(P,\ N)} = 13.25 - 1.25(P0.5) -0.25(P1.5) + 1.5(P2.0) - 0.92(N1) + 0.92(N2) +\varepsilon \] When we tried reconstructing the yield (df$yieldPN2) for the P2.0:N2 combination, we were short 0.833 lbs. That value was, in fact, the residual $\varepsilon$ for that specific observation.

By explicitly adding the interaction effect to our model, we give that leftover variation a proper home. The variation that was previously unexplained and forced into the $\varepsilon$ term is now correctly captured by the P:N interaction term.

plot(MPN3, show.resp=TRUE, label = TRUE)

The P:N interaction term now appears in the plot, and it has captured all the variation that was previously “leaking” into the residuals. The Residuals component is now empty again (because we did not include unexplained variability in the base model), telling us that our full model perfectly explains the data we created.

Part 2: Finding the Signal in the Noise

So far, we have explored a “perfect” world. By carefully building a model that matched our data’s structure (including the interaction), we were able to explain 100% of the variation in our tomato yields, leaving us with zero residuals.

But real-world data is never this clean. Natural variation is everywhere: one tomato plant might be a bit stronger, the soil in one pot might hold slightly more water, or one corner of the greenhouse might get a little more sun. This “natural variability” introduces random noise that can obscure the true effects of our experiment.

An ANOVA does not just describe pristine datasets, it can also distinguish the signal of a true effect from the background noise of random chance. In this next section, we will leave our perfect world behind. We’ll introduce this natural variability into our experiment and learn how ANOVA helps us separate what is important from what is merely random.

Step 1: Understanding the Noise

Let’s return to our baseline scenario: six tomato plants with no fertilizer. This time, we’ll introduce the natural variability we discussed by giving each plant a slightly different base yield.

If we layout the plants in a grid, we have the following yield distributions:

Our current baseline model is thus,

\[ Yield = \mu + \varepsilon \] where $\mu$ is the global mean of 10 and $\varepsilon$ is the residual term (the +1 and -1 variability we added to some of the yield values). Let’s see what happens when we run an ANOVA on just this noise.

MR <- eda_mean_sweep(df2, yield, P, N)
MR$global

## [1] 10

MR$effects

## $P
## 0.5 1.5   2 
##   0   0   0 
## 
## $N
## 1 2 
## 0 0

MR$residuals

## [1] -1  0  1  1  0 -1

The output shows exactly what we designed it to: the main effects for Plevel and Nlevel are zero. All the variability we added has been isolated perfectly into the Residuals. This is because we intentionally created a “noise” vector that was perfectly balanced, or orthogonal, to our experimental design. This isn’t just luck; it’s a crucial design principle that ensures our noise doesn’t masquerade as a fake effect, a problem known as confounding

The variability decomposition plot shows this visually: there is variation in the Response and the Residuals, but no effects for P or N.

plot(MR, show.resp = TRUE, label = TRUE)

Step 2: Finding the Signal within the Noise

Now for the key step. Let’s create our “realistic” additive dataset by adding the same P and N effects from Part 1 on top of our noisy baseline.

df2$yieldPN <- df2$base + df2$N + df2$P

We now run our ANOVA on this complete, noisy dataset:

MRPN <- eda_mean_sweep(df2, yieldPN, P, N)
MRPN$global

## [1] 12.83333

MRPN$effects

## $P
##        0.5        1.5          2 
## -0.8333333  0.1666667  0.6666667 
## 
## $N
##    1    2 
## -0.5  0.5

MRPN$residuals

## [1] -1.000000e+00 -5.551115e-16  1.000000e+00  1.000000e+00 -5.551115e-16
## [6] -1.000000e+00

Let’s look at the decomposition plot:

plot(MRPN, show.resp = TRUE, label = TRUE)

This plot reveals the true power of ANOVA. Look closely:

The Plevel and Nlevel effects are identical to the ones we saw in the “perfect world” of Part 1. The noise did not distort the signal at all.
The Residuals component is a perfect match for the noise we analyzed in Step 1. The residuals from our addtive model (MRPN) should numerically identical to the residuals from our noise-only model (MR). Let’s check:

all.equal(MR$residuals, MRPN$residuals)

## [1] TRUE

The TRUE output confirms it. The signal and the noise have been perfectly separated.

ANOVA has perfectly decomposed our messy, “realistic”, data back into its original parts: Signal (P and N effects) and Noise (Residuals).

We can also compare the magnitude of the signal to the noise using the “swing” concept. The swing of the Phosphorous effect is 1.5 lbs and Nitrogen is 1.0 lbs, while the swing of the Residuals is 1 - (-1) = 2.0 lbs. In this particular experiment, the random noise we added is actually the single largest source of variation in our data!

We’ve now seen that even with random noise, a well-designed experiment allows ANOVA to cleanly recover the true, underlying effects. The key was that the variation was separated into either a main effect or the residuals. But what happens if the model is wrong? What if an interaction effect is also hiding in the data, mixed in with the noise? This is where the visual analysis of residuals—the field of diagnostic plotting—becomes our most important tool.

Step 3: Finding a Hidden Interaction in the Noise

We’ve achieved our goal: separating the clean signals of our main effects from the random background noise. But we were successful because our model (P + N) was a perfect description of the real effects in the data.

What happens in a more realistic and challenging scenario where an important effect is hiding in our data that we don’t know about? Let’s repeat the synergistic effect from Part 1, but this time, we’ll add it to our noisy dataset. The interaction effect is now a “hidden signal” mixed in with the random noise. Can ANOVA still find it?

Creating the interactive noisy data

Let’s create our new dataset, yieldPNI, by adding the same 2.5 lb boost to the P2.0:N2 combination on top of our noisy additive data (yieldPN).

df2$interaction <- ifelse(df2$N == 2 & df2$P == 2, 2.5, 0)
df2$yieldPNI <- df2$yieldPN + df2$interaction

The Diagnostic Clue: What the Wrong Model Tells Us

Imagine we are analyzing this data for the first time and we don’t know about the interaction. Our first logical step is to fit the simple additive model just as we did in Step 2.

MRPN2 <- eda_mean_sweep(df2, yieldPNI, P, N)
plot(MRPN2, show.resp = TRUE, label = TRUE)

The variability decomposition plot is great for showing us how much variation each component explains, but it is not an effective tool for diagnosing a hidden interaction.

To find the real clue, we need a more specialized tool: the diagnostic plot.

plot(MRPN2, plot = "diagnostic")

##                        int Comparison Value for P:N^1 
##              -4.532467e-17              -3.263930e+00

This plot is specifically designed to hunt for hidden multiplicative interactions. It graphs the Residuals from our additive model against a set of Comparison Value calculated from the main effects. If the points were to form a random, unstructured cloud around the horizontal line, it would suggest our additive model is sufficient.

But here, we see a clear (downward) sloping trend. This non-random pattern is evidence that our simple additive model is may not be appropriate and that a hidden interaction effect is at play.

The Final Decomposition: Fitting the Complete Model

The sloping pattern in the diagnostic plot told us that a simple additive model was not enough and that a hidden interaction was likely at play. Now, we can act on that clue and fit the correct, complete model, including the two-way interaction term.

MRPNI <- eda_mean_sweep(df2, yieldPNI, P, N, max_order = 2)
plot(MRPNI, show.resp = TRUE, label = TRUE)

All four of our original components have been successfully identified and separated: The P effect, The N effect, The P:N interaction effect, and the Residuals (our original random noise).

But Wait… Where Did the Residuals Go?

You might notice something surprising and confusing in the plot above. While the P:N interaction has appeared, the Residuals (our original random noise) are all zero!

This is not a mistake. This gets to the heart of what ANOVA models do. Think of it this way:

Our experiment has 6 unique cells (P0.5/N1, P0.5/N2, etc.), and we have exactly one data point (tomato plant) in each slot. A full interactive model is incredibly flexible; it creates a parameter not just for each row (P) and each column (N), but also a unique adjustment parameter for every single cell (the P:N interaction).

We have 6 data points and the model has created just enough parameters to perfectly describe those 6 unique slots. It can now assign a value to each slot to perfectly match the data, leaving nothing left over. The model has become so flexible that it has “absorbed” all the remaining variation into the most complex term it has: the interaction.

This reveals a critical lesson. In a saturated model with only one observation per cell, while the final predicted yield for each cell is perfectly accurate, the individual effect values can be misleading. Here’s what’s really happening:

As we saw when we fit the wrong additive model, the presence of the interaction slightly distorted the main effects for P and N.
In this final (complete) model, the P:N interaction term has absorbed all the leftover variation: the “true” synergy effect, the original random noise, and the small adjustments needed to account for the distorted main effects.

Therefore, you cannot distinguish the true interaction from the random noise. The calculated P:N effect is a combination of all three components. To be able to truly separate these, we would need more data—specifically, multiple plants (replicates) for each treatment combination.

The Final Piece: Using Replicates to Separate Interaction from Noise

We’ve seen that with only one observation per treatment combination, a full interactive model becomes “saturated,” and it can no longer distinguish the true interaction effect from random noise.

So how do we solve this? The answer lies in good experimental design: replication.

In a real experiment, we would never have just one plant for each treatment. We would have multiple plants, or replicates, for every combination of P and N.

Let’s simulate this by creating a new dataset where we have two plants for each of our 6 treatment cells, for a total of 12 plants.

We will give each of these 12 plants its own unique “noise” value (ecnompassed in the base value).

base  <- c(9.5, 9.5, 11.5, 10.5, 10.5, 9.5, 8.5, 10.5, 10.5, 11.5, 9.5, 8.5)
df3   <- data.frame(base, 
                    Plevel = rep(c("P(0.5)","P(1.5)","P(2.0)"),4), 
                    P      = rep(c(0.5,1.5,2),2), 
                    Nlevel = rep(c("N(1)","N(2)"),6), 
                    N      = rep(c(1,2),3))
df3$interaction <- ifelse(df3$N == 2 & df3$P == 2, 2.5, 0)
df3$yield <- df3$base + df3$N + df3$P + df3$interaction

Now, let’s fit the full interactive model (max_order = 2) to this new, replicated dataset and look at the final decomposition plot.

MRPNR <- eda_mean_sweep(df3, yield, P, N, max_order = 2)
plot(MRPNR, show.resp=TRUE, label = TRUE)

Because we added replicates, our model is no longer saturated. We gave it more than enough information to do its job properly. Notice that we now have both a P:N interaction component and a non-zero Residuals component.

Beyond Our Example: A Note on Data and Design

Throughout this article, we have used a single, clean “experimental” dataset that we built ourselves. However, ANOVA is a flexible tool used in many different situations. It’s helpful to understand a few key terms that describe the structure of data, as they determine what you can learn from your model.

Observational vs. Experimental Data

The data we created mimics experimental data, where we, the researchers, control the factors and assign treatments to subjects (our plants).

Much of the data in the world is observational. For example, if you analyze house prices using the factors “Neighborhood” and “Number of Bedrooms,” you are not assigning houses to neighborhoods; you are simply observing the data that already exists. This data is often messy and doesn’t fit the clean structures we discuss next.

Balanced vs. Unbalanced Designs

Our tomato experiment was a balanced design. This means we had an equal number of observations (plants) for every combination of our factor levels. We had exactly one plant, and later two plants, for every P:N cell. This is the ideal scenario.

An unbalanced design occurs when the number of observations is unequal across the cells. Imagine if some of our tomato plants had died. We might have two observations for some cells, but only one or zero for others. While ANOVA can handle this, the interpretation of the effects becomes much more complex because the influences of the factors are no longer cleanly separated.

Replicated vs. Unreplicated Designs

An unreplicated design has exactly one observation for each combination of factor levels. Our models in Part 1 were unreplicated. As we saw, this makes it impossible to separate the highest-order interaction from random error in a noisy dataset.

A replicated design has more than one observation for each cell. Adding replicates was what finally allowed our model to estimate the P:N interaction and the random error (Residuals) as two separate components.

Two-Way vs. N-Way ANOVA

Our example used two factors (P and N), making it a two-way ANOVA. This principle can be easily extended. An analysis with three factors is a three-way ANOVA, and so on. We often refer to any analysis with two or more factors as an n-way ANOVA or factorial ANOVA.

As you add more factors, the complexity of the potential interactions grows rapidly. A three-way ANOVA can have three main effects (A, B, C), three two-way interactions (A:B, A:C, B:C), and even a three-way interaction (A:B:C). The principles of decomposition, additivity, and diagnostics that we explored here become even more critical as the models grow in complexity.

Additional Resources

If you’d like to continue exploring the concepts covered in this article, here are a few recommended resources:

For a robust alternative of a traditional ANOVA, see the median polish article and its accompanying function eda_pol(). For an n-way version of the median polish, see the function eda_npol().
For a presentation of a traditional ANOVA and its hypothesis testing side of the statistic, see this article.

Manuel Gimond

2026-06-05