The goal is to model the relationship between cis-regulatory elements and their target genes from ATAC-seq and RNA-seq data.
Suppose we have a gene of interest G, and let \(\boldsymbol y\) be its expression (or promoter activity) across \(N\) samples.
We also have \(P\) cis-regulatory elements (CREs) in the vicinity of gene G that we consider as the candidate set of regulatory elements controlling the expression of G in addition to its promoter. For example, these candidate CREs can be all the ATAC-seq peaks in the ~500 kb region around gene G.
Let \(\boldsymbol x_j\) be the level of chromatin accessibility (or ATAC-seq peak strength) at CRE j across \(N\) samples, and \(\beta_j\) be the effect of CRE j on gene G.
Then we can describe the regulatory relationship between the CREs and the target gene as a regression model: \[\boldsymbol y=\sum_{j=1}^P \beta_j \boldsymbol x_j + \boldsymbol \epsilon , \hspace{5mm} \boldsymbol \epsilon \sim N(0,\sigma^2_e I_N)\] Since CREs with different genomic features are likely to have different regulatory effects, we can incorporate this information into the model by imposing a prior on \(\boldsymbol \beta\):
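To make the setup concrete, here is a minimal simulation from this generative model (a sketch in Python with NumPy; the dimensions and parameter values below are arbitrary illustrations, not part of the method):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 50                                   # samples, candidate CREs (arbitrary)

X = rng.normal(size=(N, P))                      # accessibility of each CRE across samples
beta = np.zeros(P)
active = rng.choice(P, size=5, replace=False)    # a few truly regulatory CREs
beta[active] = rng.normal(0.0, 1.0, size=5)      # their effects on gene G

sigma_e = 0.5
y = X @ beta + rng.normal(0.0, sigma_e, size=N)  # y = sum_j beta_j x_j + eps
```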
\(\beta_j \sim f(\boldsymbol A_j)\), where \(\boldsymbol A_j\) is one or more features of CRE j (e.g. Hi-C contact with the gene's promoter, evidence of transcription factor binding, H3K27ac marks, …).
One specific choice is a spike-and-slab prior on \(\beta_j\) with logistic inclusion probability: \[\beta_j \sim \pi_j N(0,\sigma^2_{\beta}) + (1-\pi_j)\delta_0 , \hspace{5mm} \text{log }\frac{\pi_j}{1-\pi_j}=\alpha_1 A_j + \alpha_0\] where \(\delta_0\) is a point mass at zero. Here, the hyperparameter \(\boldsymbol \alpha:=(\alpha_0,\alpha_1)\) characterizes the enrichment level of the genomic feature in active CREs in general, so it might be shared across all genes. (In the case where we need to incorporate \(m\) classes of CRE-specific genomic features into our model, it would be \(\boldsymbol A_j=(A_{j1},...,A_{jm})\) and \(\boldsymbol \alpha=(\alpha_0,\alpha_1,...,\alpha_m)\).)
If we introduce \(\gamma_j\) as a latent indicator variable that takes on the value 1 when \(\beta_j\) is non-zero, and 0 when \(\beta_j\) is zero, then the prior on \(\beta\) is equivalent to: \[\beta_j \big | \gamma_j=0 \sim \delta_0, \hspace{5mm} \beta_j \big | \gamma_j=1 \sim N(0,\sigma^2_{\beta}), \hspace{5mm} \text{and} \hspace{5mm} \text{log }\frac{p(\gamma_j=1)}{p(\gamma_j=0)}=\alpha_1 A_j + \alpha_0\]
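As a sketch, drawing \(\boldsymbol \beta\) from this prior given a single feature per CRE looks like the following (feature values and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
P = 50
A = rng.normal(size=P)                    # one genomic feature per CRE (illustrative)
alpha0, alpha1 = -2.0, 1.5                # enrichment hyperparameters (illustrative)
sigma_beta = 1.0

pi = 1.0 / (1.0 + np.exp(-(alpha1 * A + alpha0)))               # inclusion probabilities
gamma = rng.random(P) < pi                                      # gamma_j ~ Bernoulli(pi_j)
beta = np.where(gamma, rng.normal(0.0, sigma_beta, size=P), 0.0)  # slab if included, else spike
```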
If we specify inverse-gamma priors on the two variances, the joint probability density function (conditional on the hyperparameter \(\boldsymbol \alpha\), which is given no prior) is: \[ \begin{align*} &\text{P}(\boldsymbol y, \boldsymbol \beta, \boldsymbol\gamma, \sigma^2_e, \sigma^2_{\beta} \big | \boldsymbol X, \boldsymbol A, \boldsymbol \alpha )\\ &= \prod_{i=1}^N \text{P}(y_i \big |X_{i\cdot},\boldsymbol \beta,\sigma^2_e) \prod_{j=1}^P \text{P}(\beta_j \big |\gamma_j,\sigma^2_{\beta}) \prod_{j=1}^P \text{P}(\gamma_j \big |\boldsymbol\alpha,A_j) \text{ P}(\sigma_e^2) \text{ P}(\sigma_{\beta}^2)\\ &= \prod_{i=1}^N N(y_i;\sum_{j=1}^P \beta_j x_{ij},\sigma^2_e) \prod_{j=1}^P [N(\beta_j;0,\sigma^2_{\beta})^{\gamma_j } \delta_0(\beta_j)^{1-\gamma_j}] \prod_{j=1}^P (1+e^{-\alpha_1 A_j - \alpha_0})^{-\gamma_j}(1+e^{\alpha_1 A_j + \alpha_0})^{\gamma_j-1}\\ &\cdot IG(\sigma_e^2; g_0,h_0) \cdot IG(\sigma_{\beta}^2; g_1,h_1) \end{align*} \]
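For reference, the log of this joint density can be evaluated directly, which is useful for debugging a sampler implementation (a sketch; the \(\delta_0\) factors for excluded CREs contribute zero on the log scale, since \(\beta_j\) is exactly zero there):

```python
import numpy as np
from scipy.stats import invgamma, norm

def log_joint(y, X, beta, gamma, A, alpha0, alpha1,
              sigma2_e, sigma2_b, g0, h0, g1, h1):
    """Log joint density; gamma is a 0/1 array and beta[gamma == 0] must be 0."""
    eta = alpha0 + alpha1 * A
    ll = norm.logpdf(y, loc=X @ beta, scale=np.sqrt(sigma2_e)).sum()
    lp_beta = norm.logpdf(beta[gamma == 1], scale=np.sqrt(sigma2_b)).sum()
    lp_gamma = (gamma * eta - np.logaddexp(0.0, eta)).sum()  # gamma_j*eta_j - log(1+e^eta_j)
    lp_var = (invgamma.logpdf(sigma2_e, g0, scale=h0)
              + invgamma.logpdf(sigma2_b, g1, scale=h1))
    return ll + lp_beta + lp_gamma + lp_var
```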
We adopt a Gibbs sampling scheme to estimate the model parameters: we sample each parameter other than \(\boldsymbol \alpha\) from its conditional posterior distribution, estimate \(\boldsymbol \alpha\) with an EM-like update, and iterate over these steps until convergence:
Sample \(\boldsymbol \gamma, \boldsymbol \beta\) given \(\boldsymbol y, \boldsymbol X, \boldsymbol \alpha, \boldsymbol A, \sigma^2_e, \sigma^2_{\beta}\) :
For \(j=1,2, ... ,P\):
First, sample \(\gamma_j\) according to: \[\frac{p(\gamma_j=1|\cdot)}{p(\gamma_j=0|\cdot)} =\sqrt{\frac{\lambda}{\sigma^2_{\beta}}} \text{ exp}\Bigl(\frac{\nu^2}{2\lambda}\Bigr) \text{ exp}(\alpha_1 A_j + \alpha_0).\] Next, if \(\gamma_j=1\), sample \(\beta_j \sim N(\nu, \lambda)\); otherwise, \(\beta_j=0\).
Here, \(\nu = \lambda \cdot \frac{1}{\sigma^2_e} \sum_{i=1}^N x_{ij}(y_i-\sum_{k\neq j}x_{ik}\beta_k)\), \(\lambda = \Bigl(\frac{\sum_{i=1}^N x_{ij}^2}{\sigma^2_e}+\frac{1}{\sigma^2_{\beta}}\Bigr)^{-1}\).
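Putting the two sub-steps together, one full sweep over the CREs can be sketched as follows (Python with NumPy/SciPy; the residual bookkeeping avoids recomputing \(\sum_{k\neq j}x_{ik}\beta_k\) from scratch for every \(j\)):

```python
import numpy as np
from scipy.special import expit   # numerically stable logistic function

def sample_gamma_beta(y, X, beta, gamma, A, alpha0, alpha1, sigma2_e, sigma2_b, rng):
    """One Gibbs sweep over (gamma_j, beta_j), updating beta and gamma in place."""
    P = X.shape[1]
    r = y - X @ beta                          # current full residual
    for j in range(P):
        xj = X[:, j]
        r_j = r + xj * beta[j]                # residual excluding CRE j
        lam = 1.0 / (xj @ xj / sigma2_e + 1.0 / sigma2_b)
        nu = lam * (xj @ r_j) / sigma2_e
        # log posterior odds: log sqrt(lam/sigma2_b) + nu^2/(2*lam) + alpha1*A_j + alpha0
        log_odds = (0.5 * np.log(lam / sigma2_b) + nu**2 / (2.0 * lam)
                    + alpha1 * A[j] + alpha0)
        gamma[j] = rng.random() < expit(log_odds)
        beta[j] = rng.normal(nu, np.sqrt(lam)) if gamma[j] else 0.0
        r = r_j - xj * beta[j]                # restore residual with new beta_j
    return beta, gamma
```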
Sample \(\sigma^2_e\) given \(\boldsymbol y, \boldsymbol \beta, \boldsymbol X\) :
\[\sigma^2_e \big |\cdot \sim \text{InverseGamma}\Bigl(g_0+\frac{N}{2},\; h_0+\frac{1}{2}\sum_{i=1}^N\epsilon_i^2\Bigr), \hspace{5mm} \text{where } \epsilon_i = y_i-\sum_{j=1}^P x_{ij}\beta_j\]
Sample \(\sigma^2_{\beta}\) given \(\boldsymbol \beta, \boldsymbol \gamma\) :
\[\sigma^2_{\beta} \big |\cdot \sim \text{InverseGamma}\Bigl(g_1+\frac{1}{2}\sum_{j=1}^P \gamma_j,\; h_1+\frac{1}{2}\sum_{j=1}^P \gamma_j\beta_j^2\Bigr)\]
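Both variance updates are one-line draws; a sketch using `scipy.stats.invgamma` (shape/scale parameterization):

```python
import numpy as np
from scipy.stats import invgamma

def sample_variances(y, X, beta, gamma, g0, h0, g1, h1, rng):
    """Draw sigma2_e and sigma2_beta from their inverse-gamma conditionals."""
    resid = y - X @ beta
    sigma2_e = invgamma.rvs(g0 + len(y) / 2.0,
                            scale=h0 + 0.5 * resid @ resid, random_state=rng)
    k = gamma.sum()                           # number of included CREs
    sigma2_b = invgamma.rvs(g1 + k / 2.0,
                            scale=h1 + 0.5 * (gamma * beta**2).sum(),
                            random_state=rng)
    return sigma2_e, sigma2_b
```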
Update \(\boldsymbol \alpha\) using an EM-like approach:
Suppose that at iteration \((t)\) the posterior probability of \(\boldsymbol \gamma\) is \(p(\gamma|\cdot)=q^{(t)}(\gamma)\) (in practice, estimated from the Gibbs draws of \(\boldsymbol \gamma\)); then \(\boldsymbol \alpha^{(t)}\) can be estimated by maximizing the expected log-likelihood: \[
\begin{align*}
\boldsymbol \alpha^{(t)} &= \underset{\boldsymbol \alpha}{\text{argmax }} \mathbb{E}_{\gamma \sim q^{(t)}} \text{ log } p(\gamma \big |\boldsymbol \alpha,A)\\
&= \underset{(\alpha_0,\alpha_1)}{\text{argmax }} \mathbb{E}_{\gamma \sim q^{(t)}} \Bigl[ \sum_{j=1}^P -\gamma_j \text{ log}(1+e^{-\alpha_0-\alpha_1 A_j}) - (1-\gamma_j) \text{ log}(1+e^{\alpha_0+\alpha_1 A_j})\Bigr]\\
&= \underset{(\alpha_0,\alpha_1)}{\text{argmax }} \sum_{j=1}^P \Bigl[q^{(t)}(\gamma_j=1)(\alpha_0+\alpha_1 A_j) - \text{log}(1+e^{\alpha_0+\alpha_1 A_j})\Bigr]
\end{align*}
\]
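The final expression is exactly the log-likelihood of a logistic regression with soft labels \(q^{(t)}(\gamma_j=1)\), so the M-step can be solved with any off-the-shelf optimizer. A sketch, where `q` holds the estimated posterior inclusion probabilities:

```python
import numpy as np
from scipy.optimize import minimize

def update_alpha(q, A, alpha_init=(0.0, 0.0)):
    """M-step: maximize sum_j [q_j*eta_j - log(1+e^eta_j)], eta_j = alpha0 + alpha1*A_j."""
    def neg_expected_ll(alpha):
        eta = alpha[0] + alpha[1] * A
        return -(q * eta - np.logaddexp(0.0, eta)).sum()  # logaddexp(0, x) = log(1+e^x)
    res = minimize(neg_expected_ll, x0=np.asarray(alpha_init), method="BFGS")
    return res.x                              # updated (alpha0, alpha1)
```

Equivalently, each CRE can be viewed as contributing a label-1 observation with weight \(q^{(t)}(\gamma_j=1)\) and a label-0 observation with weight \(1-q^{(t)}(\gamma_j=1)\), so a standard weighted logistic regression routine would also work here.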