Linear mixed-effects models

R
Stats
Mixed effects models
Published

March 3, 2026

Welcome to this introduction to Linear Mixed-effects Models (LMM)!! In this tutorial, we will use R to run some simple LMMs, and we will try to understand together how to leverage these models for our analysis. LMMs are amazing tools that have saved our asses countless times during our PhDs and Postdocs. They'll probably continue to be our trusty companions forever.

Note

This tutorial offers a gentle introduction to running linear mixed-effects models without diving deep into the mathematical and statistical foundations. If you’re interested in exploring those aspects further, plenty of online resources are available.

For the best experience, you should have already completed the previous tutorial on linear models. That foundation will make this tutorial much easier to follow and understand!

This tutorial introduces the statistical concept of Hierarchical Modeling, often called Mixed Effects Modeling. This approach shines when dealing with nested data—situations where observations are grouped in meaningful ways, like students within schools or measurements across individuals.

Sounds like a mouthful, right? Don’t worry! Let’s kick things off with something a little more fun: Simpson’s Paradox.

Simpson’s Paradox is a statistical head-scratcher. It’s when a trend you see in your data suddenly flips—or even vanishes—when you split the data into groups. Ready to see it in action? Let’s dive in!

Imagine we’re looking at how years of experience impact salary at a university. Here’s some simulated data to make it fun.

Code
library(easystats)
library(tidyverse)
library(patchwork)
set.seed(1234)
data <- simulate_simpson(n = 10, groups = 5, r = 0.5, difference = 1.5) %>% 
  mutate(V2 = (V2 + abs(min(V2))) * 10000) %>% 
  rename(Department = Group)

# Lookup vector: map old values to new ones
lookup <- c(G_1 = "Informatics", G_2 = "English", 
            G_3 = "Sociology", G_4 = "Biology", G_5 = "Statistics")

# Replace values using the lookup vector
data$Department <- lookup[as.character(data$Department)]


one = ggplot(data, aes(x = V1, y = V2)) +
  geom_point() +
  geom_smooth(method = 'lm') +
  labs(y = 'Salary', x = 'Years of experience', title = "A. Linear model") +
  theme_bw(base_size = 20)

two = ggplot(data, aes(x = V1, y = V2)) +
  geom_point(aes(color = Department)) +
  geom_smooth(aes(color = Department), method = "lm", alpha = 0.3) +
  geom_smooth(method = "lm", alpha = 0.3) +
  labs(y = 'Salary', x = 'Years of experience', title = "B. Linear model accounting for grouping structure") +
  theme_bw(base_size = 20) +
  theme(legend.position = 'bottom')

(one / two)

Take a look at the first plot. Whoa, wait a minute—this says the more years of experience you have, the less you get paid! What kind of backwards world is this? Before you march into HR demanding answers, let’s look a little closer.

Now, check out the second plot. Once we consider the departments—Informatics, English, Sociology, Biology, and Statistics—a different story emerges. Each department shows a positive trend between experience and salary. In other words, more experience does mean higher pay, as it should!

So what happened? The first plot ignored the hierarchical structure of the data. By lumping all departments together, it completely missed the real relationship hiding in plain sight. This is why Hierarchical Modeling is so powerful—it helps us avoid embarrassing statistical blunders like this one. It allows us to correctly analyze data with nested structures and uncover the real patterns.

Now, this example is a bit of an extreme case. In real life, you’re less likely to find such wildly opposite effects. However, the influence of grouping structures on your analysis is very real—and often subtle. Ignoring them can lead to misleading conclusions.

Ready to explore how Mixed Effects Modeling helps us account for these nested structures? Let’s dive in and get hands-on!

Mixed effects models

So, why are they called mixed effects models? It’s because these models combine two types of effects: fixed effects and random effects… And now you’re wondering what those are? Don’t worry—I’ve got you! 😅

Remember how departments (Informatics, English, Sociology, etc.) completely changed the story about experience and salary? That’s exactly where fixed and random effects come in.

  • Fixed effects capture the consistent, predictable trends in your data—like the relationship between experience and salary across all departments. These are the big-picture patterns you’re curious about and want to analyze.

  • Random effects, on the other hand, account for the variability within groups—like how salaries differ between departments. You’re not deeply analyzing these differences, but you know they’re there, and ignoring them could mess up your results.

Without accounting for random effects, it’s like assuming every department is exactly the same—and we’ve already seen how misleading that can be!

The model will estimate both the fixed and random effects, but don’t worry—the output won’t get super complex. The estimates and p-values will primarily focus on the fixed effects, while the random effects are accounted for in the background to make sure the results are accurate. In other words, random effects are variables that you know contribute to the variance in your model and that you want to account for, but you’re not directly interested in obtaining a result about each one.
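Just to give you a preview of what this looks like in R (using the simulated salary data from above; preview_mod is only an illustrative name, and we’ll unpack lmer() properly in the next sections):

library(lmerTest)  # loads lme4 and adds p-values

# V2 = salary, V1 = years of experience, Department = the grouping variable
preview_mod <- lmer(V2 ~ V1 + (1 | Department), data = data)
# V1 is the fixed effect: the overall experience-to-salary trend we care about
# (1 | Department) is the random effect: a separate baseline for each department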

Got the gist? Great! Now enough with words…let’s dive into a real example and see these mixed effects in action!

Settings and data

In this section, we’ll set up our working environment by loading the necessary libraries and importing the dataset. You’ll likely already have this dataset available if you completed the previous linear models tutorial. If not, don’t worry—you can easily download it using the link below:

The lme4 package is the go-to library for running Linear Mixed Models (LMM) in R. To make your life easier, there’s also the lmerTest package, which enhances lme4 by allowing you to extract p-values and providing extra functions to better understand your models. In my opinion, you should always use lmerTest alongside lme4—it just makes everything smoother!

These are the key packages we’ll use to run our linear mixed-effects models. On top of that, the tidyverse suite will help with data wrangling and visualization, while easystats will let us easily extract and summarize the important details from our models. Let’s get started!
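Here’s a minimal setup chunk (assuming all of these packages are already installed):

library(lme4)       # core mixed-model functions (lmer)
library(lmerTest)   # adds p-values and extra utilities on top of lme4
library(tidyverse)  # data wrangling and visualization
library(easystats)  # standardize(), estimate_expectation(), compare_performance(), ...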

Read the data

df = read.csv("../../resources/Stats/Dataset.csv")
df$Id = factor(df$Id) # make sure the subject Id is a factor
df$StandardTrialN = standardize(df$TrialN) # create a standardized trial column

After importing the data, we’ve ensured that Id is treated as a factor rather than a numerical column and that we have a standardized column of TrialN.

Linear Model

While we have already seen how to run a linear model, we will rerun it here as a comparison to the next steps. In case something is not clear about this lm(), please go back to the previous tutorial on linear models.

mod_lm = lm(LookingTime ~ StandardTrialN*Event, data = df)
summary(mod_lm)

Call:
lm(formula = LookingTime ~ StandardTrialN * Event, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-772.0 -159.7   -6.1  190.5  607.1 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                1310.3674    16.9387  77.359  < 2e-16 ***
StandardTrialN              -43.8501    17.2505  -2.542   0.0113 *  
EventReward                 101.2269    21.1961   4.776 2.21e-06 ***
StandardTrialN:EventReward    0.7688    21.4872   0.036   0.9715    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 253.2 on 656 degrees of freedom
  (140 observations deleted due to missingness)
Multiple R-squared:  0.05147,   Adjusted R-squared:  0.04713 
F-statistic: 11.87 on 3 and 656 DF,  p-value: 1.417e-07

We won’t delve into the details of the model results in this tutorial, as we have already covered them in the previous one. However, we want to point out one thing about the data we ran it on!!

str(df)
'data.frame':   800 obs. of  7 variables:
 $ Id            : Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Event         : chr  "NoReward" "NoReward" "NoReward" "NoReward" ...
 $ TrialN        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ReactionTime  : num  344 362 406 395 382 ...
 $ LookingTime   : num  1140 1156 1074 1107 991 ...
 $ SES           : chr  "low" "low" "low" "low" ...
 $ StandardTrialN: 'dw_transformer' num  -1.646 -1.473 -1.3 -1.127 -0.953 ...
  ..- attr(*, "center")= num 10.5
  ..- attr(*, "scale")= num 5.77
  ..- attr(*, "robust")= logi FALSE

Wait a minute! Look at our data - we have an Id column! 👀 This column tells us which participant each trial belongs to. As each subject experienced all trial conditions, we have multiple data points per person. This is similar to the departments in the previous example… it’s a grouping variable!

Wait… but then we should have taken it into consideration!!!

Instead, there was nothing about Id in our lm()… there is nothing in the formula about Id…

Yes, we did not account for this grouping structure… let’s fix that!! But how do we do so? Well, at this point it’s obvious… with mixed effects models!! Let’s dive in…

Mixed Effects

Random Intercept

Alright, let’s start with Random Intercepts! What are they? Well, the name gives it away—they’re just intercepts…but with a twist! 🤔

If you recall your knowledge of linear models, you’ll remember that each model has one intercept—the point where the model crosses the y-axis (when x=0).

But what makes random intercepts special? They allow the model to have different intercepts for each grouping variable—in this case, the Ids. This means we’re letting the model assume that each subject may have a slightly different baseline performance.

Here’s the idea:

  • One person might naturally be a bit better.

  • Someone else could be slightly worse.

  • And me? Well, let’s just say I’m starting from rock bottom.

However, even though we’re starting from different baselines, the rate of improvement over trials can still be consistent across subjects.

This approach helps us capture variation in the starting performance, acknowledging that people are inherently different but might still follow a similar overall pattern of improvement. It’s a simple yet powerful way to model individual differences!

Now, let’s look at how to include this in our mixed model.

Model

To run a linear mixed-effects model, we’ll use the lmer function from the lme4 package. It works very similarly to the lm() function we used before: you pass a formula and a dataset, with one important addition, the random intercept.

The formula is nearly the same as a standard linear model, but we include (1|Id) to tell the model that each subject should have its own unique intercept. This accounts for variations in baseline performance across individuals.

Caution

When specifying random intercepts (like (1|Group)), your grouping variables must be factors! If a grouping variable is numeric, R will wrongly treat it as a continuous variable rather than discrete categories. Character variables are automatically fine, but numeric grouping variables must be explicitly converted using factor().
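If you’re not sure, a quick check before fitting never hurts (using the df we loaded above):

is.factor(df$Id)          # should return TRUE; we converted Id when reading the data
# df$Id <- factor(df$Id)  # only needed if the check returns FALSE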

mod_rintercept = lmer(LookingTime ~ StandardTrialN * Event + (1 | Id), data = df, na.action = na.exclude)
summary(mod_rintercept)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: LookingTime ~ StandardTrialN * Event + (1 | Id)
   Data: df

REML criterion at convergence: 7665.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.4289 -0.6265  0.0007  0.6061  4.1655 

Random effects:
 Groups   Name        Variance Std.Dev.
 Id       (Intercept) 61071    247.13  
 Residual              5669     75.29  
Number of obs: 660, groups:  Id, 20

Fixed effects:
                           Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)                1291.113     55.518   19.248  23.256 1.49e-15 ***
StandardTrialN              -61.378      5.321  637.281 -11.536  < 2e-16 ***
EventReward                 120.849      6.552  637.315  18.446  < 2e-16 ***
StandardTrialN:EventReward   18.901      6.535  637.176   2.892  0.00396 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) StndTN EvntRw
StandrdTrlN  0.043              
EventReward -0.079 -0.362       
StndrdTN:ER -0.035 -0.812  0.299

Wow! Now the model is showing us something new compared to the simple linear model. We observe an interaction between Event and StandardTrialN. By letting the intercept vary for each subject, the model is able to capture nuances in the data that a standard linear model might miss.

To understand this interaction, let’s plot it and see how performance changes across trials for each condition.

Code
i_pred = estimate_expectation(mod_rintercept, include_random=T)

ggplot(i_pred, aes(x= StandardTrialN, y= Predicted, color= Id, shape = Event))+
    geom_point(data = df, aes(y= LookingTime, color= Id), position= position_jitter(width=0.2))+
    geom_line()+
    geom_ribbon(aes(ymin=Predicted-SE, ymax=Predicted+SE, fill = Id),color= 'transparent', alpha=0.1)+
    labs(y='Looking time', x='# trial')+
    theme_modern(base_size = 20)+
    theme(legend.position = 'none')+
    facet_wrap(~Event)

As you can see here, each color represents a different subject, and we’ve divided the plot into two panels - one for each type of event - to make visualization simpler. Cool, isn’t it?

Now, you might be thinking, “This looks interesting, but my plot is going to be a mess with all these individual estimates!” Well, don’t worry! While what we’ve plotted is how the data is modeled by our mixed-effects model, the random effects are actually used to make more accurate estimates—but the model still returns an overall estimate.

Think of it like this: the random effects allow the model to account for individual differences between subjects. But instead of just showing all the individual estimates in the plot, the model takes these individual estimates for each subject and returns the average of these estimates to give you a cleaner, more generalizable result.

We can plot the actual estimate of the model:

Code
i_pred = estimate_expectation(mod_rintercept, include_random =F)

ggplot(i_pred, aes(x= StandardTrialN, y= Predicted))+
    geom_point(data = df, aes(y= LookingTime, color= Id, shape = Event), position= position_jitter(width=0.2))+
    geom_line(aes(group= Event),color= 'blue', lwd=1.4)+
    geom_ribbon(aes(ymin=Predicted-SE, ymax=Predicted+SE, group= Event),color= 'transparent', alpha=0.1)+
    labs(y='Looking time', x='# trial')+
    theme_bw(base_size = 20)+
  theme(legend.position = 'none')+
  facet_wrap(~Event)

Slope

Coool!!!!!!! So far, we’ve modeled a different intercept for each subject, which lets each subject have their own baseline level of performance. But here’s the catch: our model assumes that everyone improves over the trials in exactly the same way, with the same slope. That doesn’t sound quite right, does it? We know that some people may get better faster than others, or their learning might follow a different pattern.

Model

This is where we can model random slopes to capture these individual differences in learning rates. By adding (0 + StandardTrialN | Id), we’re telling the model that while the intercept (starting point) is the same for everyone, the rate at which each subject improves (the slope) can vary.

This way, we’re allowing each subject to have their own slope in addition to their own intercept, making the model more flexible and reflective of real-world variations in learning!

Caution

Any variable used as a random slope (before the |) must also be included as a fixed effect in your model. The fixed effect estimates the overall effect, while the random slope captures how that effect varies across groups. Without the fixed effect, you’re modeling deviations from zero instead of from an average, which rarely makes theoretical sense.

mod_rslope = lmer(LookingTime ~ StandardTrialN * Event + (0 + StandardTrialN | Id), data = df)
summary(mod_rslope)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: LookingTime ~ StandardTrialN * Event + (0 + StandardTrialN |      Id)
   Data: df

REML criterion at convergence: 9136.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.3355 -0.6449  0.0801  0.7321  2.7171 

Random effects:
 Groups   Name           Variance Std.Dev.
 Id       StandardTrialN  2407     49.06  
 Residual                61817    248.63  
Number of obs: 660, groups:  Id, 20

Fixed effects:
                            Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)                1310.8278    16.9264  655.9797  77.443  < 2e-16 ***
StandardTrialN              -43.8937    20.5097   65.7394  -2.140   0.0361 *  
EventReward                 100.7614    21.0343  652.7701   4.790 2.06e-06 ***
StandardTrialN:EventReward    0.7738    21.3998  655.1091   0.036   0.9712    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) StndTN EvntRw
StandrdTrlN  0.354              
EventReward -0.804 -0.284       
StndrdTN:ER -0.337 -0.683  0.277

The results aren’t too different from the intercept-only model, but let’s take a closer look at what we’ve actually modeled.

Code
s_pred = estimate_expectation(mod_rslope, include_random =T)

ggplot(s_pred, aes(x= StandardTrialN, y= Predicted, color= Id, shape = Event))+
    geom_point(data = df, aes(y= LookingTime, color= Id), position= position_jitter(width=0.2))+
    geom_line()+
    geom_ribbon(aes(ymin=Predicted-SE, ymax=Predicted+SE, fill = Id),color= 'transparent', alpha=0.1)+
    labs(y='Looking time', x='# trial')+
    theme_modern(base_size = 20)+
    theme(legend.position = 'none')+
    facet_wrap(~Event)

Intercept + Slope

That plot does look nuts, and it’s a clear signal that something is off. Why? Because by modeling only the random slopes while keeping the intercepts fixed, we’re essentially forcing all subjects to start from the same baseline. That’s clearly unrealistic for most real-world data.

In real life, the intercept and slope often go hand-in-hand for each subject.

Model

To make the model more realistic, we can model both the random intercept and the random slope together. We simply modify the random effects part of the formula to (1 + StandardTrialN | Id).

Now, we are telling the model to estimate both a random intercept (baseline performance) and a random slope (rate of improvement). This captures the full variability in how each subject learns over time!

mod_rinterraction = lmer(LookingTime ~ StandardTrialN * Event + (1 + StandardTrialN | Id), data = df)
summary(mod_rinterraction)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: LookingTime ~ StandardTrialN * Event + (1 + StandardTrialN |      Id)
   Data: df

REML criterion at convergence: 6948.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.10408 -0.69312 -0.01849  0.66571  2.84041 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 Id       (Intercept)    64540    254.05       
          StandardTrialN  4162     64.51   0.55
 Residual                 1631     40.39       
Number of obs: 660, groups:  Id, 20

Fixed effects:
                           Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)                1286.142     56.888   19.076  22.608 3.09e-15 ***
StandardTrialN              -67.118     14.741   20.203  -4.553 0.000189 ***
EventReward                 126.844      3.630  619.185  34.940  < 2e-16 ***
StandardTrialN:EventReward   26.037      3.639  619.368   7.156 2.36e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) StndTN EvntRw
StandrdTrlN  0.545              
EventReward -0.044 -0.085       
StndrdTN:ER -0.022 -0.170  0.346

Now, let’s visualize how the model captures the data:

Code
is_pred = estimate_expectation(mod_rinterraction, include_random =T)

ggplot(is_pred, aes(x= StandardTrialN, y= Predicted, color= Id, shape = Event))+
    geom_point(data = df, aes(y= LookingTime, color= Id), position= position_jitter(width=0.2))+
    geom_line()+
    geom_ribbon(aes(ymin=Predicted-SE, ymax=Predicted+SE, fill = Id),color= 'transparent', alpha=0.1)+
    labs(y='Looking time', x='# trial')+
    theme_modern(base_size = 20)+
    theme(legend.position = 'none')+
    facet_wrap(~Event)

While it is not super apparent from the data, you can see that different subjects have different slopes, meaning that their looking times do not all change at the same rate.

Summary of mixed models

Now that we’ve seen how to run mixed-effects models, it’s time to focus on interpreting the summary output. While we’ve been building models, we haven’t delved into what the summary actually tells us or which parts of it deserve our attention. Let’s fix that!

To start, we’ll use our final model and inspect its summary. This will give us a chance to break it down step by step and understand the key information it provides. Here’s how to check the summary:

summary(mod_rinterraction)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: LookingTime ~ StandardTrialN * Event + (1 + StandardTrialN |      Id)
   Data: df

REML criterion at convergence: 6948.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.10408 -0.69312 -0.01849  0.66571  2.84041 

Random effects:
 Groups   Name           Variance Std.Dev. Corr
 Id       (Intercept)    64540    254.05       
          StandardTrialN  4162     64.51   0.55
 Residual                 1631     40.39       
Number of obs: 660, groups:  Id, 20

Fixed effects:
                           Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)                1286.142     56.888   19.076  22.608 3.09e-15 ***
StandardTrialN              -67.118     14.741   20.203  -4.553 0.000189 ***
EventReward                 126.844      3.630  619.185  34.940  < 2e-16 ***
StandardTrialN:EventReward   26.037      3.639  619.368   7.156 2.36e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) StndTN EvntRw
StandrdTrlN  0.545              
EventReward -0.044 -0.085       
StndrdTN:ER -0.022 -0.170  0.346

The Random effects section in the model summary shows how variability is accounted for by the random effects. The Groups column indicates the grouping factor (e.g., subject), while the Name column lists the random effects (e.g., intercept and slope). The Variance column represents the variability for each random effect—higher values indicate greater variation in how the effect behaves across groups. The Std.Dev. column is simply the square root of the variance, showing the spread in the same units as the data.

The Corr column reflects the correlation between random effects, telling us whether different aspects of the data (e.g., intercepts and slopes) tend to move together. A negative correlation would suggest that higher intercepts (starting points) are associated with smaller slopes (slower learning rates), while a positive correlation would suggest the opposite.

The Residual section shows the unexplained variability after accounting for the fixed and random effects.

The key takeaway here is that random effects capture the variability in the data that can’t be explained by the fixed effects alone. If the variance for a random effect is low, it suggests the random effect isn’t adding much to the model and may be unnecessary. On the other hand, high variance indicates that the random effect is important for capturing group-level differences and improving the model’s accuracy.
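If you want to look at the subject-level estimates behind these variances, lme4 provides ranef() and coef(); this is just a quick way to peek under the hood:

ranef(mod_rinterraction)  # per-subject deviations from the fixed intercept and slope
coef(mod_rinterraction)   # per-subject coefficients (fixed effect + random deviation)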

Model comparison

But how can we be sure the random effects are helping our model? One of the easiest ways is to check the variance explained by the random effects. As we said, if the variance related to the random effects is too small, it probably isn’t contributing much to the model. If it’s high, it’s likely helping the model by capturing important variability in the data.

Another method is to compare the performance of different models. One of the best indices for this is the Akaike Information Criterion (AIC). AIC gives a relative measure of how well a model fits the data, while penalizing the number of parameters in the model. Lower AIC values indicate better models, as they balance goodness-of-fit with model complexity.

You can compare the AIC of different models using the following:

compare_performance(mod_lm, mod_rintercept, mod_rslope, mod_rinterraction, metrics='AIC')
# Comparison of Model Performance Indices

Name              |           Model |  AIC (weights)
----------------------------------------------------
mod_lm            |              lm | 9184.1 (<.001)
mod_rintercept    | lmerModLmerTest | 7702.4 (<.001)
mod_rslope        | lmerModLmerTest | 9178.3 (<.001)
mod_rinterraction | lmerModLmerTest | 6990.1 (>.999)

As you can see, the best model based on AIC is the one with both intercept and slope. This is a good way to check if and which random effect structure is necessary for our model.
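If you prefer base R, the AIC() function gives you the raw AIC values (without the weights shown above):

AIC(mod_lm, mod_rintercept, mod_rslope, mod_rinterraction)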

Warning

Never decide if your random effect structure is good by just looking at p-values! P-values are not necessarily related to how well the model fits your data. Always use model comparison and fit indices like AIC to guide your decision.

Formulary

In this tutorial, we introduced linear mixed-effects models. However, these models can be far more versatile and complex than what we’ve just explored. The lme4 package allows you to specify various models to suit diverse research scenarios. While we won’t dive into every possibility, here’s a handy reference for the different random effects structures you can specify:

Formula Description
(1|s) Random intercepts for each unique level of the factor s.
(1|s) + (1|i) Random intercepts for each unique level of s and for each unique level of i.
(1|s/i) Random intercepts for factors s and i, where the random effects for i are nested in s. This expands to (1|s) + (1|s:i), i.e., a random intercept for each level of s, and each unique combination of the levels of s and i. Nested random effects are used in so-called multilevel models. For example, s might refer to schools, and i to classrooms within those schools.
(a|s) Random intercepts and random slopes for a, for each level of s. Correlations between the intercept and slope effects are also estimated. (Identical to (1+a|s).)
(a*b|s) Random intercepts and slopes for a, b, and the a:b interaction, for each level of s. Correlations between all the random effects are estimated.
(0+a|s) Random slopes for a for each level of s, but no random intercepts.
(a||s) Random intercepts and random slopes for a, for each level of s, but no correlations between the random effects (i.e., they are set to 0). This expands to: (0+a|s) + (1|s).
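As a concrete example, the double-bar syntax from the last row applied to our tutorial model would look like the sketch below (mod_uncorr is just an illustrative name; whether dropping the intercept-slope correlation makes sense depends on your data):

# Same model as mod_rinterraction, but with the correlation between the
# random intercept and the random slope fixed to 0
mod_uncorr = lmer(LookingTime ~ StandardTrialN * Event + (StandardTrialN || Id), data = df)
summary(mod_uncorr)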