Introduction

Motivation

Genetic perturbations often regulate the expression of a network of genes through trans effects.

Current computational approaches to detect trans genetic effects include:

  • Per-gene association analysis, such as trans-eQTL analysis and differential expression analysis, which bears a heavy multiple-testing burden;
  • Sparse factor analysis, which takes advantage of "gene modules" but requires subsequent analyses to interpret the biological meaning of the factors.

Our approach to detecting the effects of genetic perturbations:

  • Identify genetically controlled factors that are correlated with the perturbation in a joint statistical framework.

We developed GSFA (Guided Sparse Factor Analysis), a factor analysis model that infers unobserved intermediate factors from observed gene expression levels. Its advantages are that the inferred factors are sparse and that their correlations with given sample-level conditions (e.g. genotype, CRISPR perturbation) are estimated within the same model.

GSFA Model

Given a matrix \(Y \in \mathbb{R}^{N \times P}\) that holds the normalized expression levels of \(P\) genes in \(N\) samples, and a guide matrix \(G \in \mathbb{R}^{N \times M}\) that holds \(M\) types of sample-level conditions:

\(Y = ZW^T+E\), where \(Z \in \mathbb{R}^{N \times K}\), \(W \in \mathbb{R}^{P \times K}\), \(E_{ij} \sim N(0,\psi_j)\),

\(Z = G \beta + \Phi\), where \(\beta \in \mathbb{R}^{M \times K}\), \(\Phi_{ik} \overset{i.i.d.}{\sim} N(0,1)\).
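To make the two equations concrete, here is a minimal sketch that draws \(Z\) and \(Y\) from the model exactly as written, using plain Python lists and the standard library. It is not the GSFA software; the function name simulate_gsfa_data and the toy dimensions and values are illustrative assumptions. Simulating data with known \(Z\), \(W\), and \(\beta\) like this is a common way to check whether an inference routine recovers the truth.

```python
import random

def simulate_gsfa_data(G, beta, W, psi, seed=0):
    """Draw (Z, Y) from Z = G beta + Phi and Y = Z W^T + E.

    G: N x M guide matrix, beta: M x K effects, W: P x K loadings
    (all lists of lists); psi: length-P list of gene-specific noise variances.
    Phi_ik ~ N(0, 1) and E_ij ~ N(0, psi_j), all independent.
    """
    rng = random.Random(seed)
    N, M = len(G), len(G[0])
    K, P = len(beta[0]), len(W)
    # Factor scores: condition effects G beta plus standard-normal noise Phi
    Z = [[sum(G[i][m] * beta[m][k] for m in range(M)) + rng.gauss(0.0, 1.0)
          for k in range(K)]
         for i in range(N)]
    # Expression: Z W^T plus noise with gene-specific variance psi_j
    Y = [[sum(Z[i][k] * W[j][k] for k in range(K)) + rng.gauss(0.0, psi[j] ** 0.5)
          for j in range(P)]
         for i in range(N)]
    return Z, Y

# Toy example (hypothetical values): 4 samples, 1 condition, 2 factors, 3 genes
G = [[1], [1], [0], [0]]                  # samples 1 and 2 are perturbed
beta = [[2.0, 0.0]]                       # the condition shifts factor 1 only
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0]]  # genes 1 and 3 load on factor 1
psi = [0.1, 0.1, 0.1]
Z, Y = simulate_gsfa_data(G, beta, W, psi)
```

In this toy setting, genes 1 and 3 respond to the perturbation only through factor 1, which is the kind of structure the model is meant to recover.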

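Both \(W\) and \(\beta\) have spike-and-slab priors, which shrink most gene loadings and most factor-condition effects toward zero, so that each factor involves a limited set of genes and is associated with only a few conditions.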
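A spike-and-slab prior is a two-component mixture: each entry is zero with some probability (the spike) and otherwise drawn from a diffuse distribution (the slab). The sketch below draws a matrix from the simplest version, a point mass at zero plus a normal slab, with a fixed inclusion probability pi and slab standard deviation slab_sd. These fixed values and the function name draw_spike_and_slab are assumptions for illustration; the exact spike, slab, and hyperpriors used by GSFA are not specified here.

```python
import random

def draw_spike_and_slab(n_rows, n_cols, pi, slab_sd, seed=0):
    """Draw an n_rows x n_cols matrix whose entries are, independently,
    exactly 0 with probability 1 - pi (the spike) and N(0, slab_sd^2)
    with probability pi (the slab)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, slab_sd) if rng.random() < pi else 0.0
             for _ in range(n_cols)]
            for _ in range(n_rows)]

# e.g. loadings of 1000 genes on 5 factors, with about 10% of entries non-zero
W = draw_spike_and_slab(1000, 5, pi=0.1, slab_sd=1.0)
frac_nonzero = sum(w != 0.0 for row in W for w in row) / (1000 * 5)
```

In a full Bayesian treatment, pi and slab_sd would usually get priors of their own and be sampled along with everything else; that part is omitted here.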

Gibbs sampling is used to infer the model parameters from data.
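As an illustration of what one Gibbs step involves, the full conditional of the factor scores follows from standard Gaussian conjugacy under the model above. Write \(z_i\), \(y_i\), and \(g_i\) for the \(i\)-th rows of \(Z\), \(Y\), and \(G\) (as column vectors) and \(\Psi = \mathrm{diag}(\psi_1, \dots, \psi_P)\). This is derived from the stated model and is not necessarily the exact parameterization or update order used in the GSFA software:

\[
\begin{aligned}
z_i \mid \beta &\sim N\!\left(\beta^T g_i,\; I_K\right), \qquad
y_i \mid z_i, W, \Psi \sim N\!\left(W z_i,\; \Psi\right),\\
z_i \mid y_i, W, \Psi, \beta &\sim N\!\left(\mu_i,\; \Sigma\right), \qquad
\Sigma = \left(I_K + W^T \Psi^{-1} W\right)^{-1}, \qquad
\mu_i = \Sigma\left(W^T \Psi^{-1} y_i + \beta^T g_i\right).
\end{aligned}
\]

Because \(\Sigma\) does not depend on \(i\), it needs to be computed only once per sweep; the remaining parameters (\(W\), \(\beta\), \(\psi_j\), and the spike-and-slab indicators) are then drawn in turn from their own full conditionals.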

Applications

We applied GSFA to several published large-scale gene expression data sets with sample-level perturbations.

LUHMES CROP-seq Study

Cells

Lund human mesencephalic (LUHMES) neural progenitor cells, sequenced in 3 batches.

Perturbations

CRISPR knock-down of 14 autism spectrum disorder (ASD)–associated genes (3 gRNAs per gene) + 5 non-targeting gRNAs.
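The text does not state how these perturbations enter the guide matrix \(G\). One natural encoding, assumed here, is one binary indicator column per target gene that pools its 3 gRNAs (giving \(M = 14\)), with cells carrying only non-targeting gRNAs forming the all-zero baseline. The sketch below uses a hypothetical function name and hypothetical gene and gRNA IDs, not the study's.

```python
def build_guide_matrix(cell_guides, guide_to_target, targets):
    """Build a binary N x M guide matrix G (hypothetical helper).

    cell_guides: list with one entry per cell, the gRNA IDs detected in it.
    guide_to_target: maps each gRNA ID to its target gene, or None for
        non-targeting gRNAs.
    targets: ordered list of the M target genes (the columns of G).
    G[i][m] = 1 if cell i carries any gRNA against targets[m], else 0, so
    cells with only non-targeting gRNAs get an all-zero row.
    """
    col = {gene: m for m, gene in enumerate(targets)}
    G = []
    for guides in cell_guides:
        row = [0] * len(targets)
        for g in guides:
            target = guide_to_target[g]
            if target is not None:  # non-targeting gRNAs set no column
                row[col[target]] = 1
        G.append(row)
    return G

# Toy example: 2 targets x 2 gRNAs each + 1 non-targeting gRNA (all IDs hypothetical)
guide_to_target = {"A_g1": "GENE_A", "A_g2": "GENE_A",
                   "B_g1": "GENE_B", "B_g2": "GENE_B", "NT_g1": None}
cells = [["A_g2"], ["B_g1"], ["NT_g1"], ["A_g1", "B_g2"]]
G = build_guide_matrix(cells, guide_to_target, ["GENE_A", "GENE_B"])
# G == [[1, 0], [0, 1], [0, 0], [1, 1]]
```

A cell that carries gRNAs against two targets gets two 1s in its row; whether to keep such cells or filter them out is a separate design choice.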

Primary Human T Cell CROP-seq Study

Cells

Primary human CD8+ T cells from two healthy donors, with T cell receptor (TCR) stimulation.

Perturbations

CRISPR knock-out of 20 genes (2 gRNAs per gene) + 8 non-targeting gRNAs. Target genes were either found to regulate T cell responses in genome-wide screens or are known checkpoint genes.

MCF10A CROP-seq Study

Source and Reference

Reference: "On the design of CRISPR-based single cell molecular screens". Data: GEO accession GSE108699.

Cells

MCF10A cells (normal human breast epithelial cells) exposed to the DNA-damaging agent doxorubicin.

Perturbations

CRISPR knock-out of 29 tumor-suppressor genes (TP53, ...) plus 1 non-targeting control; guide RNA readout measured at the single-cell level.

TCGA BRCA Somatic Mutation Study

Data sources

FireBrowse TCGA BRCA Archives:

  • mRNA-seq file "illuminahiseq_rnaseqv2-RSEM_genes_normalized";
  • Mutation annotation file "Mutation_Packager_Oncotated_Calls";
  • Clinical file "Clinical_Pick_Tier1".

Samples

TCGA breast invasive carcinoma (BRCA) tumor samples, restricted to female, Caucasian patients with somatic mutation annotation.

Perturbation

Somatic mutation status of selected frequently mutated driver genes (PIK3CA, TP53, TTN, GATA3, CDH1, MAP3K1, MAP2K4).