Additive models

If you have collected and pre-processed your pupil data, the long-awaited moment has finally arrived: it's time to analyse your data and get your results!!!

In this tutorial we will do two types of analysis. The first one is simpler, and the second is more advanced. For some research questions, simple analyses are enough: they are intuitive and easy to understand. However, pupil data is actually very rich and complex, and more sophisticated analyses can sometimes help you really get the most out of your data and let it shine!

Note

We have a tutorial on linear models! If you are a beginner when it comes to statistical analyses, we advise you to follow that one first. Although it is not about pupil data, it covers all the basics that won't be touched upon here.

Before starting any type of analysis, let’s import the data and take a quick look at it.

Import data

The data used in this tutorial comes from the pre-processing tutorial on pupil dilation. If you haven't run that tutorial yet, it's a good idea to check it out first to ensure your data is prepared and ready for analysis: Pre-processing pupil data. In case you did not save the results of the pre-processing, you can download them from here:

Now that you have the data, let's import it along with the necessary libraries. We'll also ensure that the Events and Subject columns are properly set as factors (categorical variables) for easier analysis. Here's how:

library(tidyverse)  # Data manipulation and visualization
library(easystats)  # Easy statistical modeling

data = read.csv("..\\..\\resources\\Pupillometry\\ProcessedPupilData.csv")

# Make sure Events and Subject are factors
data$Events = as.factor(data$Events)
data$Subject = as.factor(data$Subject)

head(data)
  X   Subject TrialN Events Timebin   mean_pupil  Time        .diff1 .diff2
1 1 Subject_1      2 Square       0 -0.009760421  0.00  0.0052149396   3.30
2 2 Subject_1      2 Square      66 -0.004545481  3.30  0.0003562377   3.35
3 3 Subject_1      2 Square     133 -0.004189244  6.65 -0.0003562377   3.35
4 4 Subject_1      2 Square     200 -0.004545481 10.00  0.0003562377   3.30
5 5 Subject_1      2 Square     266 -0.004189244 13.30  0.0000000000   3.35
6 6 Subject_1      2 Square     333 -0.004189244 16.65  0.0000000000   3.35
  pupil_in_blink
1              0
2              0
3              0
4              0
5              0
6              0

For a detailed description of the data, you can have a look at the tutorial on preprocessing pupil data. The key variables to focus on here are the following (a quick sanity check of their types follows the list):

  • mean_pupil indicates what the pupil size was at every moment in time (every 50 milliseconds, 20Hz). This is our dependent variable.

  • Time indicates the specific moment in time within each trial.

  • TrialN indicates the trial number.

  • Events indicates whether the trial contained a circle (followed by a reward) or a square (not followed by a reward). This variable is not numerical but categorical, so we set it to a factor with as.factor().

  • Subject contains a different ID number for each subject. This is also a categorical variable.
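
If you want to double-check that these columns have the expected types (for example, that Events and Subject really are factors), a quick sanity check like the one below can help. This is optional and only inspects the data; n_distinct() comes from the tidyverse we loaded earlier.

# Optional sanity check: confirm variable types and factor levels
str(data[, c("Subject", "Events", "TrialN", "Time", "mean_pupil")])

levels(data$Events)       # should list "Circle" and "Square"
n_distinct(data$Subject)  # number of participants in the dataset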

Comparing means

In many paradigms, you have two or more conditions and want to test whether your dependent variable (pupil size in this case!) is significantly different across conditions. In our example paradigm, we may want to test whether, on average, pupil size while looking at the rewarding cue (the circle) is greater than pupil size while looking at the non-rewarding cue (the square). This would mean that even before the reward is presented, infants have learned that a reward will be coming and dilate their pupils in anticipation! Pretty cool, huh?

If we want to compare conditions, we can use a t-test, an ANOVA or… a linear model! Here, we'll be using a special type of linear model, a mixed-effect model, which is a much better choice for many reasons [add link].

Adapt the data

We want to compare the means across conditions but… we don't have means yet! We have a much richer dataset that contains hundreds of datapoints with millisecond precision. For this first, simple analysis, we just want one average measure of pupil dilation for each trial. We can compute this using the tidyverse library, a powerful collection of packages for wrangling and visualizing dataframes in R.

Here, we group the data by Subject, Events, and TrialN, then summarize it within these groups by calculating the mean values.

averaged_data = data %>%
  group_by(Subject, Events, TrialN) %>%
  summarise(mean_pupil = mean(mean_pupil, na.rm = TRUE))

head(averaged_data)
# A tibble: 6 × 4
# Groups:   Subject, Events [2]
  Subject   Events TrialN mean_pupil
  <fct>     <fct>   <int>      <dbl>
1 Subject_1 Circle      3    0.548  
2 Subject_1 Circle      4   -0.00196
3 Subject_1 Circle      6    0.0199 
4 Subject_1 Circle      9   -0.119  
5 Subject_1 Square      2   -0.108  
6 Subject_1 Square      5    0.0259 

Linear mixed-effect model

With a single value for each participant, condition, and trial (averaged across time points), we are now ready to proceed with our analysis. Even if the term “linear mixed-effect model” might sound scary, the model is actually very simple. We take our experimental condition (Events) and check whether it affects pupil size (mean_pupil). To account for individual differences in pupil response intensity, we include participants as a random intercept.

Let’s give it a go!!!

library(lmerTest)   # Mixed-effect models library

# The actual model
model_avg = lmer(mean_pupil ~ Events + (1|Subject), data = averaged_data)

summary(model_avg) # summary of the model
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: mean_pupil ~ Events + (1 | Subject)
   Data: averaged_data

REML criterion at convergence: -2.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.18353 -0.55683 -0.03283  0.71236  2.45935 

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 0.002844 0.05333 
 Residual             0.047692 0.21838 
Number of obs: 58, groups:  Subject, 6

Fixed effects:
             Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)   0.10379    0.04605 13.18897   2.254 0.041818 *  
EventsSquare -0.21012    0.05739 51.41590  -3.661 0.000592 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
EventsSquar -0.623

We won’t dive into the detailed interpretation of the results here—this isn’t the right place for that. However, if you’re curious about how to interpret mixed-effects model outputs, you can check out this excellent guide: Linear mixed-effects models. It provides a great foundation. And keep an eye out for our upcoming page dedicated to mixed-effects modeling, where we’ll break everything down step by step!

The key takeaway here is that there’s a significant difference between the Events. Specifically, the Square cue appears to result in smaller pupil dilation compared to the Circle event (which serves as the reference level for the intercept). COOL!
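
If you would like a more compact view of these coefficients, the easystats packages we loaded earlier include model_parameters(), which re-displays the same estimates together with confidence intervals. This is optional and does not change the analysis:

# Optional: a tidier summary of the model's coefficients
model_parameters(model_avg)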

Let’s visualize the effect!!

Code
# Create a data grid for Events and Time
datagrid = get_datagrid(model_avg, by = c('Events'))

# Compute model-based expected values for each level of Events
pred = estimate_expectation(datagrid)

# 'pred' now contains predicted values and confidence intervals for each event condition.
# We can visualize these predictions and overlay them on the observed data.

ggplot() +
  # Observed data (jittered points to show distribution)
  geom_jitter(data = averaged_data, aes(x=Events, y=mean_pupil, color=Events), width=0.1, alpha=0.5, size = 5) +
  
  # Model-based predictions: points for Predicted values
  geom_point(data=pred, aes(x=Events, y=Predicted, fill=Events), 
             shape=21, size=10) +
  
  # Error bars (± 1 standard error around the predicted values)
  geom_errorbar(data=pred, aes(x=Events, ymin=Predicted - SE, ymax=Predicted + SE, color=Events), 
                width=0.2, lwd=1.5) +
  
  theme_bw(base_size = 45)+
  theme(legend.position = 'none') +
  labs(title="Predicted Means vs. Observed Data",
       x="Condition",
       y="Baseline-Corrected Pupil Size")

Analysing the time course of pupil size

Although we have seen how to compare mean values of pupil size, our original data was much richer. By taking averages, we made it simpler but we also lost precious information. Usually, it is better to keep the data as rich as possible, even if that might require more complex analyses. Here we’ll show you one example of a more complex analysis: generalised additive models. Fear not though!!! As usual, we will try to break it down in small digestible bites, and you might realise it’s not actually that complicated after all.

The key aspect here is that we will stop taking averages, and analyse the time course of pupil dilation instead. We will analyse how it changes over time with precision in the order of milliseconds!! This is exciting!!!

This is something that we cannot do with linear models. For example, in this case a linear model would assume that, over the course of a trial, pupil size changes only linearly over time. The model would be something like this:

linear_model = lmer(mean_pupil ~ Events * Time + (1|Subject), data = data) 

Note that, compared to the previous model, we have made two changes. First, we have changed the data: while before we were using averages, now we use the richer dataset. Second, we added Time as a predictor (interacting with Events). We are saying that mean_pupil might be changing linearly across time… But this is very silly!!! To understand how silly it is, let’s have a look at the data over time.

Code
# Let's first compute average pupil size at each time point by condition
data_avg_time = data %>%
  group_by(Events, Time) %>%
  summarise(mean_pupil = mean(mean_pupil, na.rm=TRUE))

# Now let's plot these averages over time
ggplot(data_avg_time, aes(x=Time, y=mean_pupil, color=Events)) +
  geom_line(lwd=1.5) +
  
  theme_bw(base_size = 45)+
  theme(legend.position = 'bottom',
        legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) +

  labs(x = "Time (ms)",
       y = "Baseline-Corrected Pupil Size") 

Here’s the data averaged by condition at each time point. As you can clearly see, pupil dilation doesn’t follow a simple linear increase or decrease; the pattern is much more complex. Let’s see how poorly a simple linear model fits this intricate pattern!

Code
# Create a data grid for Events and Time
datagrid = get_datagrid(linear_model, by = c('Events','Time'))

# Estimate expectation and uncertainty (Predicted and SE)
Est = estimate_expectation(linear_model, datagrid)

# Plot predictions with confidence intervals and the observed data
ggplot() +
  # Real data line
  geom_line(data = data_avg_time, aes(x=Time, y=mean_pupil, color=Events), lwd=1.5) +
  
  # Predicted ribbons
  geom_ribbon(data = Est, aes(x=Time, ymin = Predicted - SE, ymax = Predicted + SE,
                              fill = Events), alpha = 0.2) +
  
  # Predicted lines
  geom_line(data = Est, aes(x=Time, y=Predicted, color=Events), lwd=1.8,linetype = "dashed") +
  
  theme_bw(base_size = 45)+
  theme(legend.position = 'bottom',
        legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) +
  labs(title = "Linear Model Predictions vs. Data",
       x = "Time (ms)",
       y = "Baseline-corrected Pupil Size")

The estimates from our model don’t really resemble the actual data! To capture all those non-linear, smooth changes over time, we need a more sophisticated approach. Enter Generalized Additive Models (GAMs)—the perfect tool to save the day!

Generalized additive model

Here, we will not get into all the details of generalized additive models (from now on, GAMs). We will just show one example of how they can be used to model pupil size. To do this, we have to move beyond linear models and install a new package instead, mgcv (install.packages("mgcv")). This package is similar to the one we used before for linear models but offers greater flexibility, particularly for modeling time-series data and capturing non-linear relationships.

What are GAMs?

Ok, cool! GAMs sound awesome… but you might still be wondering what they actually do. Let’s look at an example with some figures—that always helps make things clearer!

Code
library(patchwork)

# Parameters
amp <- 1; freq <- 1; phase <- 0; rate <- 100; dur <- 2
time <- seq(0, dur, by = 1 / rate)

# Sinusoidal wave with noise
wave <- amp * sin(2 * pi * freq * time + phase) + rnorm(length(time), mean = 0, sd = 0.2)


# Plot
one = ggplot()+
  geom_point(aes(y=wave, x= time),size=3)+
  theme_bw(base_size = 45)+
  labs(y='Data')


two = ggplot()+
  geom_point(aes(y=wave, x= time),size=3)+
  geom_smooth(aes(y=wave, x= time), method = 'lm', color='black', lwd=1.5)+
  theme_bw(base_size = 45)+
   theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank()
  )

three = ggplot()+
  geom_point(aes(y=wave, x= time),size=3)+
  geom_smooth(aes(y=wave, x= time), method = 'gam', color='black', lwd=1.5)+
  theme_bw(base_size = 45)+
   theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank()
  )

one + two + three

When modeling data with many fluctuations, a simple linear model often falls short. In the left plot, we see the raw data with its complex, non-linear pattern. The middle plot illustrates a linear model’s attempt to capture these fluctuations, but it oversimplifies the relationships and fails to reflect the true data structure. Finally, the right plot showcases an additive model, which adapts to the data’s variability by following its fluctuations and accurately capturing the underlying pattern. This demonstrates the strength of additive models in modeling non-linear, smooth changes.

Well….. This sounds like exactly the same problem we have with our pupil data!!! Let’s find out!

Note

Linear models can be extended to capture fluctuations using polynomial terms, but this approach has limitations. Higher-order polynomials can overfit the data, capturing noise instead of meaningful patterns. Additive models, however, use smooth functions like splines to flexibly adapt to data fluctuations without the instability of polynomials, making them a more robust and interpretable choice.
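
To make that difference concrete, here is a small sketch comparing a polynomial fit with a spline-based smooth. It reuses the simulated wave and time vectors from the figure above, so it assumes you have run that code chunk and have mgcv installed; the polynomial degree of 5 is an arbitrary illustrative choice.

# Polynomial regression: flexibility comes from adding powers of time
poly_fit = lm(wave ~ poly(time, 5))

# GAM: flexibility comes from a penalized spline, s(time)
library(mgcv)
gam_fit = gam(wave ~ s(time))

# Overlay both fits on the simulated data
ggplot() +
  geom_point(aes(x = time, y = wave), color = "grey60", size = 2) +
  geom_line(aes(x = time, y = as.numeric(predict(poly_fit))), linetype = "dashed", lwd = 1.2) +
  geom_line(aes(x = time, y = as.numeric(predict(gam_fit))), lwd = 1.2) +
  theme_bw(base_size = 20) +
  labs(x = "time", y = "Data")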

Run our GAM

To run a GAM, the syntax is relatively similar to what we used in the linear model section.

library("mgcv")

# Additive model
additive_model = bam(mean_pupil ~ Events
                     + s(Time, by=Events, k=20)
                     + s(Time, Subject, bs='fs', m=1),
                     data=data)

Let’s break the formula down:

  • mean_pupil ~ Events: Here, we treat Events as a main effect, just like we did before.

  • s(Time, by=Events, k=20): This is where the magic happens. By wrapping Time in s(), we are telling the model: “Don’t assume that changes in pupil size over time are linear. Instead, estimate a smooth, wiggly function.” The argument by=Events means: “Do this separately for each condition, so that each condition gets its own smooth curve over time.” Finally, k=20 controls how wiggly the curve can be (technically, how many ‘knots’ or flexibility points the smoothing function is allowed to have). In practice, we are allowing the model to capture complex, non-linear patterns of pupil size changes over time for each condition.

  • s(Time, Subject, bs='fs', m=1): Here, we go one step further and acknowledge that each participant might have their own unique shape of the time course. By using bs='fs', we are specifying a ‘factor smooth’, which means: “For each subject, estimate their own smooth function over time.” Setting m=1 is a specific parameter choice that defines how we penalize wiggliness. Essentially, this term allows us to capture individual differences in how pupil size changes over time, over and above the general pattern captured by the main smooth. It’s similar to the random effect we saw before in the linear mixed-effect model. (A quick way to check whether these smooths have enough flexibility is shown right after this list.)
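
As promised, here is one quick way to check whether k=20 gives the smooths enough flexibility. mgcv ships the diagnostic gam.check(), which reports, for each smooth, the estimated degrees of freedom and a k-index (values well below 1 together with low p-values suggest k may be too small). This is an optional check on the model we just fitted, and the exact output will depend on your data.

# Optional diagnostic: is k large enough for each smooth term?
gam.check(additive_model)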

Now that we have run our first GAM, we can see how well it predicts the data!

Code
# Data grid
datagrid = get_datagrid(additive_model, length = 100, include_random = TRUE)

# Estimate expectation and uncertainty (Predicted and SE)
Est = estimate_expectation(additive_model, datagrid, exclude=c("s(Time,Subject)"))


# Plot predictions with confidence intervals and the observed data
ggplot() +
  # Real data line
  geom_line(data = data_avg_time, aes(x=Time, y=mean_pupil, color=Events), size=1.5) +
  
  # Predicted ribbons
  geom_ribbon(data = Est, aes(x=Time, ymin = CI_low, ymax = CI_high,
                              fill = Events), alpha = 0.2) +
  
  # Predicted lines
  geom_line(data = Est, aes(x=Time, y=Predicted, color=Events), size=1.8, linetype = "dashed") +
  
  theme_bw(base_size = 45)+
  theme(legend.position = 'bottom',
        legend.title = element_blank()) +
  guides(color = guide_legend(override.aes = list(lwd = 20))) +
  labs(title = "Additive model Predictions vs. Data",
       x = "Time (ms)",
       y = "Baseline-corrected Pupil Size")

This looks so much better!!! The lines fit the data much more closely!! We can also have a look at whether the effect of our experimental condition is significant:

summary(additive_model) 

Family: gaussian 
Link function: identity 

Formula:
mean_pupil ~ Events + s(Time, by = Events, k = 20) + s(Time, 
    Subject, bs = "fs", m = 1)

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.10429    0.03527   2.957  0.00311 ** 
EventsSquare -0.21085    0.00259 -81.405  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                        edf Ref.df     F p-value    
s(Time):EventsCircle  6.481  8.016 26.87  <2e-16 ***
s(Time):EventsSquare  1.544  1.693 42.96  <2e-16 ***
s(Time,Subject)      33.083 53.000 83.73  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.281   Deviance explained = 28.2%
fREML = -86.298  Scale est. = 0.058015  n = 34742

The fixed effects (Parametric coefficients) show a strong negative effect for EventsSquare, indicating that pupil size for Square is significantly lower than for Circle. This suggests that pupil size is greater when expecting a rewarding stimulus compared to a non-rewarding one.

The smooth terms indicate whether the non-linear relationships modeled by s() explain significant variance in the data. A significant smooth term confirms that the function captures meaningful, non-linear patterns beyond random noise or simpler terms. While fixed effects are typically more important for hypothesis testing, it’s crucial to ensure the model specification captures the data’s fluctuations accurately.
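
If you also want to look at the shape of the estimated smooths themselves (rather than the model predictions we plotted earlier), mgcv’s built-in plot method is a quick option. This is just an extra inspection step, and the exact appearance of the plots will depend on your data.

# Optional: draw every estimated smooth term on a single page
plot(additive_model, pages = 1)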

You did it!!! You started from a simpler model and little by little you built a very complex Generalized Additive Model!! Amazing work!!!

Warning

This is just a very basic tutorial!!! We owe much of what we know to this paper, which goes much more in depth into GAMs (still using the mgcv package), reviewing different ways of fitting the models, handling auto-correlation, and making sure that the GAMs you run are actually correct and robust. Please read the paper!!!!
