A method for detecting pairwise interaction based on feature-based adaption of information theoretic metrics
This repository contains the code for the paper "A feature-based information-theoretic approach for detecting interpretable, long-timescale pairwise interactions from time series". Here we provide the code to reproduce the figures in the paper, as well as the R code to apply this method on a pair of time series data to infer pairwise feature-driven dependency.
This repository is written in R and requires a small number of dependencies.
All required R packages are listed in requirements.txt
. They will be installed automatically when you run main_simulate.R
or main_evaluate.R
with the parameters required.
If you want to install them manually, run:
packages <- readLines("requirements.txt")
for (p in packages) {
if (!requireNamespace(p, quietly = TRUE)) {
install.packages(p, repos = "http://cran.us.r-project.org")
}
}
The code here requires rJvava
and JIDT. - The infofynamics.jar
file needed for JIDT is already included in this repository.
rJava
is listed inrequirements.txt
and will be installed automatically.- No additional setup is required for reproducing the results in the manuscript. If you want to use JIDT independently of this repo (for custom experiments), see the official JIDT documentation.
This repository provides everything needed to:
- Run simulations to generate the dynamical processes studied in the manuscript.
- Compute mutual information (MI) using both the traditional signal-based approach and the feature-based approach.
- Evaluate simulation results and reproduce the plots shown in the manuscript.
The repository is structured as follows:
* requirements.txt
--- lists all required R packages.
* setup.R
--- loads libraries and initializes JIDT.
* src.R
--- core functions for simulations and analyses.
* main_simulate.R
--- runs simulations, including:
* Generating time-series data for the three dynamical source processes studied.
* Extracting their features.
* Producing target time-series data.
* Computing both signal-based MI and feature-based MI.
* main_evaluate.R
--- evaluates simulation results and generates the manuscript plots.
Simulating the random noise process requires intensive computation due to the large number of features and time-series lengths considered.
We provide a job script, PBS_simulations.pbs
, to run parallelized simulations on a distributed cluster.
You can adapt this script to match your own computing environment.
For non-stationary processes, simulation data can be generated directly by running:
Rscript main_simulate.R --source_type <AR3|bimodal_spiking> --ts_length <1000> --driving_feature_timescale <50|100|150|200>
For example:
Rscript main_simulate.R --source_type AR3 --ts_length 1000 --driving_feature_timescale 100
If you receive the following warning:
Warning in install.packages :
'lib = "/usr/local/lib/R/site-library"' is not writable
then you can run the following code and that should resolve the issue:
mkdir -p <some_writeable_folder>
export R_LIBS_USER="<some_writeable_folder>"
If you run the scripts above to generate simulation data, the results will be saved to results/simulation_studies/new_results.csv
.
Note: The exact simulation dataset used in the manuscript is also provided here: results/simulation_studies/featureBasedDependency_simulation_results.csv
.
To reproduce the plots from the manuscript, run:
Rscript main_evaluate.R <path/to/output>
Similar to running the simulation script, if you receive the following warning while running the code above:
Warning in install.packages :
'lib = "/usr/local/lib/R/site-library"' is not writable
then running the following code should resolve the issue:
mkdir -p <some_writeable_folder>
export R_LIBS_USER="<some_writeable_folder>"
All plots will be saved into the specified <path/to/output>
. If the directory does not exist, it will be created automatically.
- Figure 4a →
RandomResultsMatrix.pdf
\ - Figures 4b--e →
RandomResultsLinePlots.pdf
\ - Figure 5a →
AR3_Noise.pdf
\ - Figure 5b →
bimodal_spiking_Noise.pdf
\ - Figure 6a →
AR3_DrivingTimescale.pdf
\ - Figure 6b →
bimodal_DrivingTimescale.pdf
\ - Figure 7a →
AR3_CapturingTimescale.pdf
\ - Figure 7b →
bimodal_CapturingTimescale.pdf
You can use the function detect_dependency_with_catch22()
from src.R
to infer feature-driven dependencies between a source time series and a target time series using the feature-based Transfer Entropy approach. The function computes Catch22 features for the source series and evaluates the transfer entropy (TE) from each feature to the target. Note that you will need to run setup.R
before running this function to call all the libraries required and initilize a JIDT object needed for TE calculation.
detect_dependency_with_catch22(
source,
target,
feature_window,
target_hist,
delay = 1,
number_surrogates = 1000,
theiler_window_multiplier = 1,
rotating_surrogates = TRUE
)
Parameter | Type | Description |
---|---|---|
source |
numeric vector | The source time series. |
target |
numeric vector | The target time series. |
feature_window |
integer | Window size (in samples) for computing Catch22 features on the source series. |
target_hist |
integer | History length of the target series used in TE calculation. For mutual information, this is set to 0. |
delay |
integer | Time delay between source and target for TE calculation. Default is 1. |
number_surrogates |
integer | Number of surrogate time-series to generate for statistical testing of TE. Default is 1000. |
theiler_window_multiplier |
numeric | Multiplier for the Theiler window for dynamic correlation exclusion. Default is 1. |
rotating_surrogates |
logical | If TRUE, use rotating surrogates for TE significance testing. Default is TRUE. |
A data frame (features_results
) with the following columns:
Column | Type | Description |
---|---|---|
Feature | string | Name of the feature extracted from the source time series. |
TE | numeric | Transfer entropy value between the feature and the target time series. |
pValue | numeric | p-Value from surrogate testing for the TE measurement. |
holm-pValue | numeric | p-Value adjusted with Holm-Bonferroni method |
significance | boolean | Whether the Holm-adjusted pValue is statistically significant ! |
The pvalue are then adjusted using Holm-Bonferroni method, and if any feature has a Holm-adjusted p-value < 0.05, the function prints "Statistical dependence detected" along with the list of the features with signficant Holm-adjusted pvalues. Otherwise, it prints "No significant statistical dependence detected."