This tutorial describes how to run GSFA on LUHMES CROP-seq data after preprocessing.

0.1 Analysis scripts

  • R script: /project2/xinhe/yifan/Factor_analysis/LUHMES/run_gsfa_negctrl.R
  • sbatch script: /project2/xinhe/yifan/Factor_analysis/LUHMES/gsfa_negctrl_job.sbatch

0.2 Necessary packages

library(data.table)
library(tidyverse)
library(GSFA)

0.3 GSFA inputs

Processed gene expression matrix: /project2/xinhe/yifan/Factor_analysis/LUHMES/processed_data/deviance_residual.merged_top_6k.corrected_4.scaled.rds

Binarized perturbation matrix: /project2/xinhe/yifan/Factor_analysis/LUHMES/processed_data/merged_metadata.rds

0.4 GSFA settings

In fit_gsfa_multivar(), we specify 20 factors initialized from truncated SVD and run Gibbs sampling for 3000 iterations, with the posterior mean estimates computed over the last 1000 iterations of samples.

We further calibrate the differential effects of gRNA targets by subtracting the raw coefficients (beta's) of gRNAs by the beta of the negative control group ("Nontargeting"). This is done by setting the options use_neg_control = T and neg_control_index = negctrl_index in fit_gsfa_multivar().

0.5 Computational requirements

This process is both time and memory intensive. We recommend submitting the R script as an sbatch job to a high performance computing cluster that can guarantee 50GB memory and 3.5 hours of runtime without interruption.