1 GSFA on LUHMES CROP-seq Data

1.1 Data Processing

Cells:
LUHMES cells from 3 batches were merged into a single analysis. Every cell has only a single type of gRNA readout. After quality control, 8708 cells remained.

Genes:
Only genes detected in > 10% of cells were kept, resulting in 6213 genes.
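The gene filter can be sketched as follows. This is a minimal illustration in plain Python, not the code used for this analysis; the function name `filter_genes`, the toy gene names, and the assumption that "detected" means a count greater than zero are all hypothetical.

```python
def filter_genes(counts, min_fraction=0.10):
    """Return genes detected in more than `min_fraction` of cells.

    `counts` maps a gene name to its list of per-cell counts.
    A gene is treated as "detected" in a cell when its count is > 0
    (an assumption; the report does not spell out the definition).
    """
    kept = []
    for gene, row in counts.items():
        n_detected = sum(1 for c in row if c > 0)
        if n_detected / len(row) > min_fraction:
            kept.append(gene)
    return kept


# Toy data: 3 hypothetical genes measured across 10 cells.
toy_counts = {
    "GENE_A": [1, 0, 2, 0, 0, 3, 0, 0, 0, 0],  # detected in 3/10 cells
    "GENE_B": [0, 0, 0, 0, 0, 0, 0, 0, 0, 5],  # detected in exactly 10%
    "GENE_C": [0] * 10,                        # never detected
}
kept_genes = filter_genes(toy_counts)
print(kept_genes)
```

Because the threshold is strict (> 10%), a gene detected in exactly 10% of cells is dropped in this sketch.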

Normalization:
Seurat "LogNormalize": log(count per 10K + 1).
Batch effect, unique UMI count, library size, and mitochondrial percentage were all corrected for. The corrected and scaled expression data were used as input for the subsequent factor analysis.
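For reference, the "LogNormalize" transformation named above can be written out for a single cell. This is a plain-Python sketch of the formula log(count per 10K + 1) only; it does not reproduce the covariate correction or scaling steps, and the function name `log_normalize` is hypothetical.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Apply log(count per 10K + 1) to the counts of one cell.

    Each count is divided by the cell's total count, multiplied by
    `scale_factor`, and passed through the natural log1p.
    """
    total = sum(cell_counts)
    if total == 0:
        # A cell with no counts stays all zeros.
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


# Toy cell with 100 total counts spread over 4 hypothetical genes.
toy_cell = [0, 5, 15, 80]
normalized = log_normalize(toy_cell)
print(normalized)
```

Zero counts stay at zero after the transformation, which preserves the sparsity of the expression matrix.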

1.2 GSFA Results (SVD Initialization)

Here, the "guide" matrix \(G\) consists of 15 types of gene-level knock-down conditions (14 target genes + NTC) across cells.

Gibbs sampling was initialized from SVD and run for 2000 iterations; posterior means were estimated by averaging over the last 500 iterations.
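The averaging step can be illustrated with a small sketch. It assumes the sampler's output for one parameter is available as a list of per-iteration values; the function name `posterior_mean` and the toy chain are hypothetical and are not part of GSFA.

```python
def posterior_mean(samples, n_keep=500):
    """Average the last `n_keep` samples of a chain.

    Earlier iterations are treated as burn-in and discarded.
    """
    kept = samples[-n_keep:]
    return sum(kept) / len(kept)


# Toy chain of 2000 iterations: the first 1500 sit at a poor starting
# value, the last 500 at the value the chain settles on.
toy_chain = [10.0] * 1500 + [2.0] * 500
estimate = posterior_mean(toy_chain, n_keep=500)
print(estimate)
```

Discarding the early iterations keeps the estimate from being pulled toward the initialization.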

1.2.1 Estimate of Factor ~ Perturbation Associations (\(\beta\))

Examples of factor ~ perturbation associations:

1.2.2 DEGs (LFSR < 0.05) under Each Perturbation

Number of genes that passed GSFA LFSR < 0.05 under each perturbation:

Overlap of GSFA DEGs between perturbations:

Venn diagram for DEGs found under perturbations ADNP, ARID1B, ASH1L, CHD2:

SFARI high risk (score 1, 2) genes among the DEGs:

2 Gene Set Enrichment Analysis

2.1 Enrichment by factor

Target genes: genes with non-zero loadings in each factor (PIP cutoff at 0.95);
Background genes: all 6213 genes included in the factor analysis;
Statistical test: hypergeometric test (over-representation test);
Only GO terms/pathways that contain 10 to 500 genes and have an enrichment fold change (FC) > 2 and FDR < 0.05 are kept.
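The test and filters described above can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline used here: the function names are hypothetical, the FDR is assumed to be Benjamini-Hochberg (the report only says "FDR"), and the gene identifiers are toy values.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the set, n drawn)."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def over_representation(target, gene_set, background):
    """Return (fold change, p-value, gene-set size within the background)."""
    target = set(target) & background
    gene_set = set(gene_set) & background
    N, K, n = len(background), len(gene_set), len(target)
    k = len(target & gene_set)
    fold_change = (k / n) / (K / N)
    return fold_change, hypergeom_upper_tail(k, N, K, n), K


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passes_filters(size, fold_change, fdr):
    """Size within 10 to 500, FC > 2, FDR < 0.05."""
    return 10 <= size <= 500 and fold_change > 2 and fdr < 0.05


# Toy example: 100 background genes, a 10-gene set, and 10 target genes,
# 6 of which fall in the set.
background = {f"g{i}" for i in range(100)}
gene_set = [f"g{i}" for i in range(10)]
target = [f"g{i}" for i in range(6)] + [f"g{i}" for i in range(50, 54)]

fc, p, size = over_representation(target, gene_set, background)
fdr = benjamini_hochberg([p])[0]
keep = passes_filters(size, fc, fdr)
print(fc, p, keep)
```

Restricting both the target list and the gene set to the background before counting keeps the fold change and the p-value defined over the same universe of genes.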

2.1.1 GO Slim Over-Representation Analysis

Gene sets: Gene ontology "Biological Process" (non-redundant).

We used the "Wang" method in GOSemSim to measure the similarity between GO BP terms, and all the significant terms in factors of interest were further grouped into 9 clusters using hierarchical clustering with the "ward.D" agglomeration method. The clustering results of all these GO BP terms are stored here.

Terms of interest

Factors of interest

  • Factor 10: nerve development, cell fate commitment;
  • Factor 11: chromatin remodeling, DNA recombination;
  • Factor 13: regulation of metal ion transport, axon development, ...;
  • Factor 15: neuron migration, response to metal ion, CNS neuron development, ...;
  • Factor 18: CNS neuron differentiation, xxx tissue development, ...;
  • Factor 19: axon development;
  • Factor 20: nerve development, neurotransmitter transport, ....

2.1.2 Reactome Pathway Over-Representation Analysis

Gene sets: The Reactome pathway database.

Factors of interest

  • Factor 8: p53-Independent DNA Damage Response, G2/M Transition
  • Factor 9: Neuronal System, Transmission across Chemical Synapses
  • Factor 12: Gap junction trafficking
  • Factor 13: Neurotransmitter receptors and postsynaptic signal transmission, Ion channel transport, Neuronal System
  • Factor 15: Transmission across Chemical Synapses, Ca2+ pathway, Gap junction trafficking
  • Factor 17: Axon guidance
  • Factor 19: Activation of NMDA receptors and postsynaptic events, Mitotic phases, Cell Cycle Checkpoints
  • Factor 20: Neurotransmitter receptors and postsynaptic signal transmission, Ion channel transport

3 Inspection of Signature Genes

3.1 Genes targeted by CRISPR knock-down

PTEN was not detected in > 10% of cells, so it did not pass the gene filter.

3.2 Neuron projection development genes

According to Figure 4E of the reference paper, among these marker genes, mature neuron marker genes are down-regulated and negative regulator genes are up-regulated in the ADNP, CHD2 and ASH1L knock-downs, whereas the opposite happens in the PTEN knock-down.

Here we show all the marker genes that passed LFSR < 0.05 under each knock-down. In general, their effect sizes are consistent with the findings of the paper.