Pupil data pre-processing
Welcome to your first step into the world of pupillometry! In this tutorial, we’ll walk you through how to preprocess pupil data using a handy R package called PupillometryR. This package makes it simple to clean and even analyze your pupil data with just a few lines of R code.
To keep things straightforward, we’ll be working with a simulated dataset that you can download right here:
Download and unzip this folder. This dataset is based on the experimental design we introduced earlier in our eye-tracking series. If you’re not familiar with it or need a quick refresher, we recommend checking out the “Introduction to eye-tracking” guide before diving in.
This tutorial serves as a foundation for understanding how to preprocess pupil data. Once you’ve grasped the essentials, we encourage you to explore the full range of functions and features PupillometryR has to offer.
In this and upcoming tutorials, we’ll use dplyr, a package included in the tidyverse collection. dplyr is designed for efficient manipulation of dataframes and makes it easy to chain operations using the magrittr pipe (%>%). The pipe passes a dataframe from one operation to the next in a seamless flow, avoiding intermediate steps and temporary variables.
We’ll focus on basic dplyr functions for now, including:
filter(): subsets rows based on specific conditions. Example: data %>% filter(Event == "Circle") keeps only the rows where Event is "Circle".
mutate(): adds or modifies columns in the dataframe. Example: data %>% mutate(mean_pupil_norm = mean_pupil / 100) creates a new normalized column.
group_by(): groups the dataframe by one or more columns for grouped operations. Example: data %>% group_by(Subject, Event) groups the data by subject and event.
summarize(): aggregates data within groups to compute summary statistics. Example: data %>% summarize(mean_pupil = mean(mean_pupil)) calculates the mean pupil size for each group.
Basic Syntax
The syntax begins by passing a dataframe into the pipe (%>%) and then applying one or more operations. These functions manipulate the dataframe directly, without requiring it (or its columns) to be explicitly referenced at each step.
Here’s an example:
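Below is a minimal sketch, assuming a dataframe data with the columns Event, Subject, and mean_pupil used in the snippets above:
data2 = data %>%
  filter(Event == "Circle") %>% # keep only the Circle events
  group_by(Subject) %>% # one group per participant
  summarize(mean_pupil = mean(mean_pupil, na.rm = TRUE)) # mean pupil size per subject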
This code filters rows where Event == "Circle", groups the data by Subject, and calculates the mean pupil size for each subject, creating a new dataframe, data2, in a concise and readable flow. 🚀
This is just the tip of the iceberg ❄️—dplyr is an incredible tool! If you want to explore its full potential, check out this tutorial: R dplyr Tutorial | Learn with Examples
Read the data
Let’s begin by importing the necessary libraries and loading the downloaded dataframe.
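library(PupillometryR) # Library to process pupil signal
library(tidyverse) # Library to wrangle dataframes
library(patchwork) # Library to combine plots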
Great! Now, let’s locate and load all our data files. We’ll use list.files() to identify all the .csv files in our folder. Make sure to update the file path to match the location where your data is stored.
csv_files = list.files(
path = "..\\..\\resources\\Pupillometry\\RAW",
pattern = "\\.csv$", # regex pattern to match .csv files
full.names = TRUE # returns the full file paths
)
csv_files is now a character vector containing the paths of all the .csv files we’ve identified. To better understand our dataset, let’s start by focusing on the first file, representing the first subject, and inspect its structure. This will give us a clear overview before we proceed further.
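Raw_data = read.csv(csv_files[1]) # read the first subject's file
head(Raw_data) # peek at the first rows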
X Subject time L_P R_P Event TrialN
1 1 Subject_1 1.000000 3.187428 3.228510 Fixation 1
2 2 Subject_1 4.333526 3.153315 3.193957 <NA> NA
3 3 Subject_1 7.667052 3.102050 3.142032 <NA> NA
4 4 Subject_1 11.000578 3.163670 3.204446 <NA> NA
5 5 Subject_1 14.334104 3.152682 3.193316 <NA> NA
6 6 Subject_1 17.667630 3.086508 3.126289 <NA> NA
Our dataframe consists of several easily interpretable columns. time represents elapsed time in milliseconds, Subject identifies the participant, and Event indicates when and which stimuli were presented. TrialN tracks the trial number, while L_P and R_P measure pupil dilation for the left and right eyes, respectively, in millimeters.
Let’s plot the data! Visualizing it first is always a crucial step as it provides an initial understanding of its structure and key patterns.
ggplot(Raw_data, aes(x = time, y = R_P)) +
geom_line(aes(y = R_P, color = 'Pupil Right'), lwd = 1.2) +
geom_line(aes(y = L_P, color = 'Pupil Left'), lwd = 1.2) +
geom_vline(data = Raw_data |> dplyr::filter(!is.na(Event)), aes(xintercept = time, linetype = Event), lwd = 1.3) +
theme_bw(base_size = 35) +
ylim(1, 6) +
labs(color= 'Signal', y='Pupil size')+
scale_color_manual(
values = c('Pupil Right' = '#4A6274', 'Pupil Left' = '#E2725A') ) +
theme(
legend.position = 'bottom' ) +
guides(
color = guide_legend(override.aes = list(lwd = 20)),
linetype = guide_legend(override.aes = list(lwd = 1.2))
)
Prepare the data
Nice!! Now we have some sense of our data!! And… you’ve probably noticed two things:
So many events! That’s intentional — it’s better to have too many triggers than miss something important. When we recorded the data, we saved all possible events to ensure nothing was overlooked. But don’t worry, for our pupil dilation analysis, we only care about two key events: Circle and Square (check the paradigm intro if you need a refresher on why this is)
Single-sample events! Like in most studies, events are marked at a single time point (when the stimulus is presented). But PupillometryR needs a different structure: it expects the event value to be repeated for every row while the event is happening.
So, how do we fix this? First, let’s isolate the rows in our dataframe where the events are Circle or Square. We start by creating a list of the events we care about, then use it to filter our dataframe and keep only the rows related to those events in a new dataframe called Events:
Events_to_keep = c('Circle','Square')
Events = filter(Raw_data, Event %in% Events_to_keep) # filter data
head(Events) # dataframe peek
X Subject time L_P R_P Event TrialN
1 221 Subject_1 734.3757 NA NA Circle 1
2 2552 Subject_1 8504.8248 3.596057 3.642405 Square 2
3 4883 Subject_1 16275.2739 3.543367 NA Circle 3
4 7215 Subject_1 24049.0565 3.164419 3.205205 Circle 4
5 9546 Subject_1 31819.5055 3.147494 3.188061 Square 5
6 11877 Subject_1 39589.9546 3.343493 3.386587 Circle 6
Perfect! Now onto the second point: we need to repeat the events we just selected for the entire duration we want to analyze. But what’s this duration? We want to cover the full cue presentation (2 seconds), plus an extra 0.1 seconds before the stimulus appears. Why? This pre-stimulus period will serve as our baseline, which we’ll use later in the analysis.
So, let’s define how much time to include before and after the stimulus. We’ll also set the framerate of our data (300Hz) and create a time vector that starts at the beginning of the pre-stimulus period and advances in steps of 1000/Fs milliseconds, with a total length of Pre_stim_samples + Post_stim_samples samples.
# Settings to cut events
Fs = 300 # framerate
Step = 1000/Fs
Pre_stim = 100 # pre stimulus time (100ms)
Post_stim = 2000 # post stimulus time (2000ms)
Pre_stim_samples = Pre_stim/Step # pre stimulus in samples
Post_stim_samples = Post_stim/Step # post stimulus in samples
# Time vector based on the event duration
Time = seq(from = -Pre_stim, by=Step, length.out = Pre_stim_samples+Post_stim_samples) # time vector
Here’s where the magic happens. We loop through each event listed in our Events dataframe. Each row in Events corresponds to a specific event (like a “Circle” or “Square” cue) that occurred for a specific subject during a specific trial.
For each event, we extract 2 key details:
Event (to know if it’s a Circle or Square cue)
TrialN (to know which trial this event is part of)
Next, we identify the rows of interest in our main dataframe. First, we locate the row where the time is closest to the onset of the event. Then, we select a range of rows that fall within the Pre_stim and Post_stim window around the event.
Finally, we use these identified rows to add the event information. The Time, Event, and TrialN values are repeated across all the rows in this window, ensuring every row in the event window is properly labeled.
# Loop for each event
for (trial in 1:nrow(Events)){
# Extract the information
Event = Events[trial,]$Event
TrialN = Events[trial,]$TrialN
# Event onset information
Onset = Events[trial,]$time
Onset_index = which.min(abs(Raw_data$time - Onset))
# Find the rows to update based on pre post samples
rows_to_update = seq(Onset_index - Pre_stim_samples,
Onset_index + Post_stim_samples-1)
# Repeat the values of interest for all the rows
Raw_data[rows_to_update, 'time'] = Time
Raw_data[rows_to_update, 'Event'] = Event
Raw_data[rows_to_update, 'TrialN'] = TrialN
}
Perfect! We’ve successfully extended the event information backward and forward based on our Pre_stim and Post_stim windows. Now, it’s time to clean things up.
Since we only care about the rows that are part of our trial of interest —and because the event information is now repeated for each row during its duration— we’ll remove all the rows that don’t belong to these event windows. This will leave us with a clean, focused dataset that only contains the data relevant to our analysis.
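# Keep only the rows inside the labeled event windows
Trial_data = Raw_data %>%
  filter(Event %in% Events_to_keep)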
For all subjects
Great job making it this far! Fixing the data to make it usable in PupillometryR is definitely one of the trickiest parts. But… we’ve only done this for one subject so far—oops! 😅 No worries, though. Let’s automate this process by putting everything into a loop for each subject. In this loop, we’ll fix the event structure for each subject, store each subject’s processed dataframe in a list, and finally combine all these dataframes into one single dataframe for further analysis. Let’s make it happen!
# Libraries and files --------------------------------------------------------------------
library(PupillometryR) # Library to process pupil signal
library(tidyverse) # Library to wrangle dataframes
library(patchwork) # Library to combine plots
csv_files = list.files(
path = "..\\..\\resources\\Pupillometry\\RAW",
pattern = "\\.csv$", # regex pattern to match .csv files
full.names = TRUE # returns the full file paths
)
# Event settings --------------------------------------------------------------------
Events_to_keep = c('Circle','Square') # events of interest
Fs = 300 # framerate
Step = 1000/Fs
Pre_stim = 100 # pre stimulus time (100ms)
Post_stim = 2000 # post stimulus time (2000ms)
Pre_stim_samples = Pre_stim/Step # pre stimulus in samples
Post_stim_samples = Post_stim/Step # post stimulus in samples
# Time vector based on the event duration
Time = seq(from = -Pre_stim, by=Step, length.out = Pre_stim_samples+Post_stim_samples) # time vector
# Event fixes --------------------------------------------------------------------
List_of_subject_dataframes = list() # Empty list to be filled with dataframes
# Loop for each subject
for (sub in 1:length(csv_files)) {
Raw_data = read.csv(csv_files[sub]) # Raw data
Events = filter(Raw_data, Event %in% Events_to_keep) # Events
# Loop for each event
for (trial in 1:nrow(Events)){
# Extract the information
Event = Events[trial,]$Event
TrialN = Events[trial,]$TrialN
# Event onset information
Onset = Events[trial,]$time
Onset_index = which.min(abs(Raw_data$time - Onset))
# Find the rows to update based on pre post samples
rows_to_update = seq(Onset_index - Pre_stim_samples,
Onset_index + Post_stim_samples-1)
# Repeat the values of interest for all the rows
Raw_data[rows_to_update, 'time'] = Time
Raw_data[rows_to_update, 'Event'] = Event
Raw_data[rows_to_update, 'TrialN'] = TrialN
}
# Filter only events of interest
Trial_data = Raw_data %>%
filter(Event %in% Events_to_keep)
# Add dataframe to list
List_of_subject_dataframes[[sub]] = Trial_data
}
# Combine the list of dataframes into 1 dataframe
Trial_data = bind_rows(List_of_subject_dataframes)
Now we have our dataset all fixed and organized for each subject. Let’s take a look!
Code
ggplot(Trial_data, aes(x = time, y = R_P, group = TrialN)) +
geom_line(aes(y = R_P, color = 'Pupil Right'), lwd = 1.2) +
geom_line(aes(y = L_P, color = 'Pupil Left'), lwd = 1.2) +
geom_vline(aes(xintercept = 0), linetype = 'dashed', color = 'black', lwd = 1.2) +
facet_wrap(~Subject) +
ylim(1, 6) +
scale_color_manual(values = c('Pupil Right' = '#4A6274', 'Pupil Left' = '#E2725A')) +
theme_bw(base_size = 35) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
As you can see, the data structure is now completely transformed. We’ve segmented the data into distinct time windows, with each segment starting at -0.1 seconds (-100 ms) and extending to 2 seconds (2000 ms). This new structure ensures consistency across all segments, making the data ready for further analysis.
Make PupillometryR data
Ok, now it’s time to start working with PupillometryR! 🎉
In the previous steps, we changed our event structure, and you might be wondering: why all that effort? Well, it’s because PupillometryR needs the data in this specific format to do its magic. To get started, we’ll pass our dataframe to the make_pupillometryr_data() function. If you’re already thinking, “Oh no, not another weird object type that’s hard to work with!”, don’t worry! The good news is that the main object it creates is just a regular dataframe. That means we can still interact with it like we’re used to. This makes the pre-processing steps much less frustrating. Let’s get started!
PupilR_data = make_pupillometryr_data(data = Trial_data,
subject = Subject,
trial = TrialN,
time = time,
condition = Event)
Here, we’re simply using the make_pupillometryr_data()
function to pass in our data and specify which columns contain the key information. This tells PupillometryR where to find the crucial details, like subject IDs, events, and pupil measurements, so it knows how to structure and process the data properly.
If you have extra columns that you want to keep in your PupillometryR data during preprocessing, you can pass them as a list using the other = c(OtherColumn1, OtherColumn2)
argument. This allows you to keep these additional columns alongside your main data throughout most of the preprocessing steps.
But here’s a heads-up — not all functions can keep these extra columns every time. For example, downsampling may not retain them since it reduces the number of rows, and it’s not always clear how to summarize extra columns. So, keep that in mind as you plan your analysis!
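For instance, a hypothetical call keeping two extra columns might look like this (Age and Sex are not part of our dataset; they are purely illustrative):
PupilR_data_extra = make_pupillometryr_data(data = Trial_data,
                                            subject = Subject,
                                            trial = TrialN,
                                            time = time,
                                            condition = Event,
                                            other = c(Age, Sex)) # hypothetical extra columns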
Plot
One cool feature of the data created using make_pupillometryr_data() is that it comes with a simple, built-in plot function. This makes it super easy to visualize your data without needing to write tons of code. The plot function works by averaging the data over the group variable, so we can group by subject or by condition. Here we set group to condition, averaging over subjects.
In this example, we’re plotting the L_P (left pupil) data, grouped by condition. The plot() function is actually just a ggplot2 wrapper, which means you can customize it, to a certain extent, like any other ggplot. That’s why we can add elements to it, like theme_bw(), which gives the plot a cleaner, black-and-white look. Give it a go without adding anything, and then learn to customize it!!
Pro tip! If you want more control over your plots, you can always use ggplot2. Remember, the Pupil data is just a regular dataframe, so you can plot it in any way you like!
plot(PupilR_data, pupil = L_P, group = 'condition', geom = 'line') +
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
In this tutorial, we’ll use two methods to plot our data. We’ll use the PupillometryR plot to visualize the average pupil response by condition, and we’ll also use ggplot to manually plot our data. Both approaches are valid and offer unique benefits.
The PupillometryR plot provides a quick overview by automatically averaging pupil responses across condition levels, making it ideal for high-level trend visualization. On the other hand, ggplot gives you full control to visualize specific details or customize every aspect of the plot, allowing for deeper insights and flexibility.
Pre-processing
Now that we have our pupillometry data in the required format, we can actually start the pre-processing!!
Regress
The first step is to regress L_P against R_P (and vice versa) using a simple linear regression. This corrects small inconsistencies in the pupil data caused by noise. The regression is done separately for each participant and trial, producing smoother and more consistent pupil dilation measurements.
Regressed_data = regress_data(data = PupilR_data,
pupil1 = L_P,
pupil2 = R_P)
Error in `mutate()`:
ℹ In argument: `pupil1newkk = .predict_right(L_P, R_P)`.
ℹ In group 1: `Subject = "Subject_1"`, `TrialN = 1`, `Event = "Circle"`.
Caused by error in `lm.fit()`:
! 0 (non-NA) cases
Pwa pwa pwaaaaa…!!🤦♂️ We got an error!
What’s it saying? It’s telling us that one of the trials is completely full of NAs, and since there’s no data to work with, the function fails. This happens a lot when testing infants — they don’t always do what we expect, like watching the screen. Instead, they move around or look away.
We’ll deal with missing data properly later, but for now, we need a quick fix. What can we do? We can simply drop any trials where both pupils (L_P and R_P) are entirely NA. This way, we avoid errors and keep the analysis moving.
So let’s filter our data and then redo the last two steps (make the PupillometryR data, then regress it).
# Filter the trial data
Trial_data = Trial_data %>%
group_by(Subject, TrialN) %>% # group by Subject and TrialN
filter(!all(is.na(L_P) & is.na(R_P))) %>% # filter out if both R_P and L_P are all NA
ungroup() # Remove grouping
# Make pupillometryR data
PupilR_data = make_pupillometryr_data(data = Trial_data,
subject = Subject,
trial = TrialN,
time = time,
condition = Event)
# Regress data
Regressed_data = regress_data(data = PupilR_data,
pupil1 = L_P,
pupil2 = R_P)
And now everything worked!! Perfect!
Mean pupil
As the next step, we will average the two pupil signals. This creates a new variable called mean_pupil:
Mean_data = calculate_mean_pupil_size(data = Regressed_data,
pupil1 = L_P,
pupil2 = R_P)
plot(Mean_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
Lowpass
Now that we have a single pupil signal, we can move on to filtering it. The goal here is to remove fast noise and fluctuations that aren’t relevant to our analysis. Why? Because we know that pupil dilation is a slow physiological signal, and those rapid changes are likely just noise from blinks, eye movements, or measurement errors. By filtering out these fast fluctuations, we can focus on the meaningful changes in pupil dilation that matter for our analysis.
Filtered_data = filter_data(data = Mean_data,
pupil = mean_pupil,
filter = 'median',
degree = 11)
plot(Filtered_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
There are different ways to filter the data in PupillometryR; we suggest you check the package website and make a decision based on your data (see filter_data). Here we use a median filter with an 11-sample window.
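For instance, assuming your installed version offers the 'hanning' option (check ?filter_data to confirm), a moving-average-style alternative would be:
Filtered_data_hanning = filter_data(data = Mean_data,
                                    pupil = mean_pupil,
                                    filter = 'hanning', # assumed available; see ?filter_data
                                    degree = 11)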
Trial Rejection
Now that our data is smoother, it’s a good time to take a look at it. It doesn’t make sense to keep trials that are mostly missing values, nor does it make sense to keep participants with very few good trials.
While you might already have info on trial counts and participant performance from other sources (like video coding), PupillometryR has a super handy function to check this directly. This way, you can quickly see how many valid trials each participant has and decide which ones to keep or drop.
Missing_data = calculate_missing_data(Filtered_data,
mean_pupil)
head(Missing_data, n=20)
# A tibble: 20 × 3
Subject TrialN Missing
<chr> <int> <dbl>
1 Subject_1 2 0.0603
2 Subject_1 3 0.125
3 Subject_1 4 0.0746
4 Subject_1 5 0.106
5 Subject_1 6 0.0746
6 Subject_1 7 0.0635
7 Subject_1 8 0.0587
8 Subject_1 9 0.0619
9 Subject_1 10 0.130
10 Subject_2 1 0.143
11 Subject_2 2 0.121
12 Subject_2 3 0.0619
13 Subject_2 4 0
14 Subject_2 5 0.132
15 Subject_2 6 0.137
16 Subject_2 7 0.0841
17 Subject_2 8 0.0968
18 Subject_2 9 0.0841
19 Subject_2 10 0.0508
20 Subject_3 1 0.0587
This gives us a new dataframe that shows the amount of missing data for each subject and each trial. While we could manually decide which trials and subjects to keep or remove, PupillometryR makes it easier with the clean_missing_data()
function.
This function lets you set two proportion thresholds, one for trials and one for subjects. Here we set both to .75: trials with more than 75% missing data are rejected, and subjects missing more than 75% of their trials are rejected (you can see these thresholds echoed in the messages below). This way, we ensure our analysis is based on clean, high-quality data.
Clean_data = clean_missing_data(Filtered_data,
pupil = mean_pupil,
trial_threshold = .75,
subject_trial_threshold = .75)
Removing trials with a proportion missing > 0.75
...removed 3 trials
Removing subjects with a proportion of missing trials > 0.75
...removed 0 subjects
See?! PupillometryR shows us exactly how many trials and subjects are being excluded from our dataframe based on our thresholds. Cool!
Note that this function calculates the percentage of missing trials based only on the trials present in the dataframe. For example, if a participant only completed one trial (and watched it perfectly) before the session had to stop, the percentage would be calculated on that single trial, and the participant wouldn’t be rejected.
If you have more complex conditions for excluding participants (e.g., based on total expected trials or additional criteria), you’ll need to handle this manually to ensure subjects are dropped appropriately.
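As a hypothetical sketch (the 25% missing criterion and the minimum of 8 valid trials are made up for illustration), you could count valid trials per subject from Missing_data and drop subjects below your own cutoff:
# Count trials with less than 25% missing data per subject (criterion assumed)
valid_counts = Missing_data %>%
  filter(Missing < .25) %>%
  count(Subject, name = 'n_valid')
# Keep only subjects with at least 8 valid trials (cutoff assumed)
keep_subjects = valid_counts %>% filter(n_valid >= 8) %>% pull(Subject)
Clean_data = Clean_data %>% filter(Subject %in% keep_subjects)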
Fill the signal
Now our data is clean, but… while the average signal for each condition looks smooth (as seen in our plots), the data for each individual participant is still noisy. We can still spot blinks and missing data in the signal.
To handle this, we’ll use interpolation to fill in the missing points. Interpolation “connects the dots” between gaps, creating a more continuous and cleaner signal. This step is crucial because large chunks of missing data can distort our analysis, and interpolation allows us to retain more usable data from each participant.
ggplot(Clean_data, aes(x = time, y = mean_pupil, group = TrialN, color= Event))+
geom_line( lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
ylim(1,6)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
labs(y = 'Pupil Size')+
guides(color = guide_legend(override.aes = list(lwd = 20)))
So, to remove these missing values, we can interpolate our data. Interpolation is easy with PupillometryR; we can simply:
Int_data = interpolate_data(data = Clean_data,
pupil = mean_pupil,
type = 'linear')
ggplot(Int_data, aes(x = time, y = mean_pupil, group = TrialN, color = Event))+
geom_line(lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
ylim(1,6)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
labs(y = 'Pupil Size')+
guides(color = guide_legend(override.aes = list(lwd = 20)))
Done!! Well, you’ve probably noticed something strange… When there’s a blink, the pupil signal can rapidly decrease until it’s completely missing. Right now, this drop gets interpolated, and the result is a weird, unrealistic curve where the signal dips sharply and then smoothly recovers. This makes our data look horrible! 😩
Let’s fix it!
To do this, we’ll use PupillometryR’s blink-detection functions. There are two main ways to detect blinks:
Based on size: flags samples where the pupil size itself is implausible.
Based on velocity: flags rapid changes in pupil size (which happen during blinks).
Here, we’ll use detection by velocity. We set a velocity threshold to detect when the pupil size changes too quickly. To ensure we capture the full blink, we use extend_forward and extend_back to expand the blink window, including the fast decrease in pupil size. The key idea is to mark the entire blink period as NA, not just the moment the pupil disappears. This prevents interpolation from creating unrealistic artifacts: when we interpolate, the process skips over the entire blink period, resulting in a cleaner, more natural signal.
Blink_data = detect_blinks_by_velocity(
Clean_data,
mean_pupil,
threshold = 0.1,
extend_forward = 70,
extend_back = 70)
ggplot(Blink_data, aes(x = time, y = mean_pupil, group = TrialN, color=Event))+
geom_line(lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
ylim(1,6)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
labs(y = 'Pupil Size')+
guides(color = guide_legend(override.aes = list(lwd = 20)))
See?! Now the rapid shrinking has disappeared, and we can interpolate:
Int_data = interpolate_data(data = Blink_data,
pupil = mean_pupil,
type = 'linear')
ggplot(Int_data, aes(x = time, y = mean_pupil, group = TrialN, color=Event))+
geom_line(lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
ylim(1,6)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
labs(y = 'Pupil Size')+
guides(color = guide_legend(override.aes = list(lwd = 20)))
Look how beautiful our signal is now!! 😍 Good job!!!
You won’t always run into blink issues like this. Downsampling and filtering usually handle rapid changes during earlier preprocessing steps. Whether this happens can depend on the tracker, sampling rate, or even the population you’re testing. In this simulated data, we exaggerated the blink effects on purpose to show you how to spot and fix them! Thus, blink detection may not always be necessary. The best approach is to check your data before deciding. And how do you check it? Plotting! Plotting your signal is the best way to see if blinks are causing rapid drops or if you’re just dealing with missing data. Let the data guide your decisions.
Downsample
As mentioned before, pupil dilation is a slow signal, so 20Hz is enough; no need for 300Hz. Downsampling reduces file size, speeds up processing, and even naturally smooths the signal, all while preserving the key information we need for analysis. To downsample to 20Hz, we’ll set the timebin size to 50 ms (since 1/20 s = 0.05 s = 50 ms) and calculate the median for each time bin.
NewHz = 20
timebinSize = 1000/NewHz
Downsampled_data = downsample_time_data(data = Int_data,
pupil = mean_pupil,
timebin_size = timebinSize,
option = 'median')
plot(Downsampled_data, pupil = mean_pupil, group = 'condition', geom = 'line') +
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
Baseline Correction
Good job getting this far!! We’re now at the final step of our pre-processing: baseline correction.
Baseline correction helps remove variability between trials and participants, like differences in baseline pupil size caused by individual differences, fatigue, or random fluctuations. By doing this, we can focus only on the variability caused by our paradigm. This step ensures that any changes we see in pupil size are truly driven by the experimental conditions, not irrelevant noise. After subtracting the calculated baseline, we’ll also drop the pre-stimulus window, so the time axis starts at 0 instead of -100 ms. Finally, to make the next steps of analysis easier, we’ll select only the columns of interest, dropping any irrelevant ones.
Let’s get it done!
Base_data = baseline_data(data = Downsampled_data,
pupil = mean_pupil,
start = -100,
stop = 0)
# Remove the baseline period (time < 0) and keep only the columns of interest
Final_data = subset_data(Base_data, start = 0) %>%
select(Subject, Event, TrialN, mean_pupil, time)
Let’s plot it to see what the baseline correction and removal are actually doing!! We will plot the average signal using the plot function (with some additional information about color and theme), and we will use ggplot to plot the data for each subject separately.
One = plot(Final_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
theme_bw(base_size = 45) +
theme(legend.position = 'none')
Two = ggplot(Final_data, aes(x = time, y = mean_pupil, group = TrialN, color = Event))+
geom_line(lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
labs(y = 'Pupil Size')+
guides(color = guide_legend(override.aes = list(lwd = 20)))
# Using patchwork to put the plot together
One / Two
Save and analysis
This tutorial will not cover the analysis of pupil dilation. We’ll stop here since, after baseline correction, the data is ready to be explored and analyzed. From this point on, we’ll shift from pre-processing to analysis, so it’s a good idea to save the data as a simple .csv file for easy access and future use.
write.csv(Final_data, "..\\..\\resources\\Pupillometry\\Processed\\Processed_PupilData.csv")
There are multiple ways to analyze pupil data, and we’ll show you some of our favorite methods in a dedicated tutorial: Analyze Pupil Dilation.
Cite PupillometryR
If you decide to use PupillometryR in your analysis, don’t forget to cite it! Proper citation acknowledges the authors’ work and supports the development of such valuable tools.
Code
citation("PupillometryR")
To cite PupillometryR in publications use:
Forbes, S. H. (2020). PupillometryR: An R package for preparing and
analysing pupillometry data. Journal of Open Source Software, 5(50),
2285. https://doi.org/10.21105/joss.02285
A BibTeX entry for LaTeX users is
@Article{,
title = {PupillometryR: An R package for preparing and analysing pupillometry data},
author = {Samuel H. Forbes},
journal = {Journal of Open Source Software},
year = {2020},
volume = {5},
number = {50},
pages = {2285},
doi = {10.21105/joss.02285},
}
All code
Below we report all the code we went through in this tutorial as a single script, to make it easier for you to copy and explore it in its entirety.
# Libraries and files --------------------------------------------------------------------
library(PupillometryR) # Library to process pupil signal
library(tidyverse) # Library to wrangle dataframes
library(patchwork) # Library to combine plots
csv_files = list.files(
path = "..\\..\\resources\\Pupillometry\\RAW",
pattern = "\\.csv$", # regex pattern to match .csv files
full.names = TRUE # returns the full file paths
)
# Prepare data --------------------------------------------------------------------
## Event settings --------------------------------------------------------------------
# Settings to cut events
Events_to_keep = c('Circle','Square') # events of interest
Fs = 300 # framerate
Step = 1000/Fs
Pre_stim = 100 # pre stimulus time (100ms)
Post_stim = 2000 # post stimulus time (2000ms)
Pre_stim_samples = Pre_stim/Step # pre stimulus in samples
Post_stim_samples = Post_stim/Step # post stimulus in samples
# Time vector based on the event duration
Time = seq(from = -Pre_stim, by=Step, length.out = (Pre_stim+Post_stim)/Step) # time vector
## Event fixes --------------------------------------------------------------------
List_of_subject_dataframes = list() # Empty list to be filled with dataframes
# Loop for each subject
for (sub in 1:length(csv_files)) {
Raw_data = read.csv(csv_files[sub]) # Raw data
Events = filter(Raw_data, Event %in% Events_to_keep) # Events
# Loop for each event
for (trial in 1:nrow(Events)){
# Extract the information
Event = Events[trial,]$Event
TrialN = Events[trial,]$TrialN
# Event onset information
Onset = Events[trial,]$time
Onset_index = which.min(abs(Raw_data$time - Onset))
# Find the rows to update based on pre post samples
rows_to_update = seq(Onset_index - Pre_stim_samples,
Onset_index + Post_stim_samples-1)
# Repeat the values of interest for all the rows
Raw_data[rows_to_update, 'time'] = Time
Raw_data[rows_to_update, 'Event'] = Event
Raw_data[rows_to_update, 'TrialN'] = TrialN
}
# Filter only events of interest
Trial_data = Raw_data %>%
filter(Event %in% Events_to_keep)
# Add dataframe to list
List_of_subject_dataframes[[sub]] = Trial_data
}
# Combine the list of dataframes into 1 dataframe
Trial_data = bind_rows(List_of_subject_dataframes)
### Plot Raw Data -----------------------------------------------------------------
ggplot(Raw_data, aes(x = time, y = R_P)) +
geom_line(aes(y = R_P, color = 'Pupil Right'), lwd = 1.2) +
geom_line(aes(y = L_P, color = 'Pupil Left'), lwd = 1.2) +
geom_vline(data = Raw_data |> dplyr::filter(!is.na(Event)), aes(xintercept = time, linetype = Event), lwd = 1.3) +
theme_bw(base_size = 45) +
ylim(1, 6) +
labs(color= 'Signal', y='Pupil size')+
scale_color_manual(
values = c('Pupil Right' = '#4A6274', 'Pupil Left' = '#E2725A') ) +
theme(
legend.position = 'bottom' ) +
guides(
color = guide_legend(override.aes = list(lwd = 20)),
linetype = guide_legend(override.aes = list(lwd = 1.2))
)
# Pre-processing -----------------------------------------------------------------
## Filter Out Trials with all NA -----------------------------------------------------------------
Trial_data = Trial_data %>%
group_by(Subject, TrialN) %>%
filter(!all(is.na(L_P) & is.na(R_P))) %>%
ungroup()
## Make PupillometryR Data -----------------------------------------------------------------
PupilR_data = make_pupillometryr_data(data = Trial_data,
subject = Subject,
trial = TrialN,
time = time,
condition = Event)
### Plot ------------------------------------------------------------------
plot(PupilR_data, pupil = L_P, group = 'condition', geom = 'line') +
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
## Regress Data -----------------------------------------------------------------
Regressed_data = regress_data(data = PupilR_data,
pupil1 = L_P,
pupil2 = R_P)
## Calculate Mean Pupil -----------------------------------------------------------------
Mean_data = calculate_mean_pupil_size(data = Regressed_data,
pupil1 = L_P,
pupil2 = R_P)
### Plot --------------------------------------------------------------------
plot(Mean_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
## Lowpass Filter -----------------------------------------------------------------
filtered_data = filter_data(data = Mean_data,
pupil = mean_pupil,
filter = 'median',
degree = 11)
### Plot --------------------------------------------------------------------
plot(filtered_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
## Downsample -----------------------------------------------------------------
NewHz = 20
timebinSize = 1000 / NewHz # timebin size in ms (50 ms for 20 Hz)
Downsampled_data = downsample_time_data(data = filtered_data,
pupil = mean_pupil,
timebin_size = timebinSize,
option = 'median')
### Plot --------------------------------------------------------------------
plot(Downsampled_data, pupil = mean_pupil, group = 'condition', geom = 'line') +
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
## Calculate Missing Data -----------------------------------------------------------------
Missing_data = calculate_missing_data(Downsampled_data, mean_pupil)
## Clean Missing Data -----------------------------------------------------------------
Clean_data = clean_missing_data(Downsampled_data,
pupil = mean_pupil,
trial_threshold = .75,
subject_trial_threshold = .75)
### Plot --------------------------------------------------------------------
ggplot(Clean_data, aes(x = time, y = mean_pupil, group = TrialN, color= Event))+
geom_line( lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
ylim(1,6)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
## Detect Blinks -----------------------------------------------------------------
Blink_data = detect_blinks_by_velocity(
Clean_data,
mean_pupil,
threshold = 0.1,
extend_forward = 70,
extend_back = 70)
### Plot --------------------------------------------------------------------
ggplot(Blink_data, aes(x = time, y = mean_pupil, group = TrialN, color=Event))+
geom_line(lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
ylim(1,6)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
## Interpolate Data -----------------------------------------------------------------
Int_data = interpolate_data(data = Blink_data,
pupil = mean_pupil,
type = 'linear')
### Plot --------------------------------------------------------------------
ggplot(Int_data, aes(x = time, y = mean_pupil, group = TrialN, color = Event))+
geom_line(lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
ylim(1,6)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
# Baseline correction -----------------------------------------------------
Base_data = baseline_data(data = Int_data,
pupil = mean_pupil,
start = -100,
stop = 0)
# Remove the baseline period (time < 0) and keep only the columns of interest
Final_data = subset_data(Base_data, start = 0) %>%
  select(Subject, Event, TrialN, mean_pupil, time)
### Final plot --------------------------------------------------------------
One = plot(Final_data, pupil = mean_pupil, group = 'condition', geom = 'line')+
theme_bw(base_size = 45) +
theme(legend.position = 'none')
Two = ggplot(Final_data, aes(x = time, y = mean_pupil, group = TrialN, color = Event))+
geom_line(lwd =1.2)+
geom_vline(aes(xintercept = 0), linetype= 'dashed', color = 'black', lwd =1.2)+
facet_wrap(~Subject)+
theme_bw(base_size = 45) +
theme(
legend.position = 'bottom',
legend.title = element_blank()) +
guides(color = guide_legend(override.aes = list(lwd = 20)))
One / Two
# Save data ---------------------------------------------------------------
write.csv(Final_data, "..\\..\\resources\\Pupillometry\\Processed\\Processed_PupilData.csv")