This tutorial describes how to run GSFA on LUHMES CROP-seq data after preprocessing.
R script: /project2/xinhe/yifan/Factor_analysis/LUHMES/run_gsfa_negctrl.R
sbatch job script: /project2/xinhe/yifan/Factor_analysis/LUHMES/gsfa_negctrl_job.sbatch
library(data.table)
library(tidyverse)
library(GSFA)
Processed gene expression matrix: /project2/xinhe/yifan/Factor_analysis/LUHMES/processed_data/deviance_residual.merged_top_6k.corrected_4.scaled.rds
Binarized perturbation matrix: /project2/xinhe/yifan/Factor_analysis/LUHMES/processed_data/merged_metadata.rds
In fit_gsfa_multivar(), we specify 20 factors initialized from truncated SVD and run Gibbs sampling for 3000 iterations, with the posterior mean estimates computed over the last 1000 iterations of samples.
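The settings described above might translate into a call along the following lines. This is a minimal sketch, not the contents of run_gsfa_negctrl.R: the argument names K, init.method, niter and used_niter are assumptions about the fit_gsfa_multivar() interface in the installed GSFA version, the wrapper name fit_gsfa_sketch is hypothetical, and Y and G are assumed to be a sample-by-gene expression matrix and a sample-by-perturbation matrix. Check ?fit_gsfa_multivar before relying on it.

```r
# Hypothetical wrapper: defining it runs nothing, so it is safe to source
# even on a machine where GSFA is not installed.
fit_gsfa_sketch <- function(Y, G, ...) {
  if (!requireNamespace("GSFA", quietly = TRUE)) {
    stop("The GSFA package is required to run this sketch.")
  }
  GSFA::fit_gsfa_multivar(
    Y = Y,                 # assumed: sample-by-gene expression matrix
    G = G,                 # assumed: sample-by-perturbation matrix
    K = 20,                # number of factors
    init.method = "svd",   # assumed name for truncated SVD initialization
    niter = 3000,          # total Gibbs sampling iterations
    used_niter = 1000,     # last iterations used for posterior means
    ...                    # any further options are forwarded unchanged
  )
}
```

Wrapping the call in a function keeps the fixed settings in one place, while any additional options can be supplied through `...` at call time.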
We further calibrate the differential effects of gRNA targets by subtracting the beta of the negative control group ("Nontargeting") from the raw coefficients (betas) of the gRNAs. This is done by setting the options use_neg_control = T and neg_control_index = negctrl_index in fit_gsfa_multivar().
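The arithmetic behind this calibration can be illustrated with a toy example in base R. The matrix below is made up for illustration (it is not GSFA output), and the layout with perturbation groups in rows and factors in columns is an assumption. How GSFA applies the subtraction internally may differ.

```r
# Toy raw effect sizes (made-up numbers): rows are perturbation groups,
# columns are factors. This layout is an assumption for illustration.
beta_raw <- matrix(
  c(0.8, -0.2,
    0.1,  0.5,
    0.3,  0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "Nontargeting"),
                  c("Factor1", "Factor2"))
)

# Locate the negative control group by name.
negctrl_index <- which(rownames(beta_raw) == "Nontargeting")

# Subtract the negative control beta from every other group, factor by factor.
beta_calibrated <- sweep(
  beta_raw[-negctrl_index, , drop = FALSE],
  MARGIN = 2,
  STATS = beta_raw[negctrl_index, ],
  FUN = "-"
)

print(beta_calibrated)
```

In this toy case the calibrated effects of GeneA on the two factors are 0.5 and -0.3, that is, the raw values 0.8 and -0.2 minus the control values 0.3 and 0.1.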
This process is both time- and memory-intensive. We recommend submitting the R script as an sbatch job to a high-performance computing cluster that can guarantee 50 GB of memory and 3.5 hours of uninterrupted runtime.