Statistical Models in Computational Biology
Niko Beerenwinkel
David Dreifuss
Pedro Ferreira
Xiang Ge Luo
Problem 1: Conditional independence and BNs (3 points)
Consider the following graphical structures, corresponding to (different) Bayesian networks. For which network does the statement A ⊥ B | C hold? For which does the statement A ⊥ B hold? Prove your answers by the laws of probability.
[Figure: network structures a) and b)]
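The network figures are not reproduced in this text, so the following is only a numerical sanity check on two assumed three-node structures: a fork A ← C → B and a collider A → C ← B. All probability tables are made-up illustration values, and the helper names (`marginal`, `a_indep_b`) are hypothetical. The check complements, but does not replace, the proof by the laws of probability that the problem asks for.

```python
from itertools import product

def marginal(joint, keep):
    """Sum a joint table {(a, b, c): p} down to the axes listed in `keep`."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def a_indep_b(joint, given_c=False, tol=1e-12):
    """Check A ⊥ B (given_c=False) or A ⊥ B | C (given_c=True) numerically."""
    if given_c:
        # A ⊥ B | C  iff  P(a,b,c) * P(c) == P(a,c) * P(b,c) for all a, b, c
        p_c = marginal(joint, [2])
        p_ac = marginal(joint, [0, 2])
        p_bc = marginal(joint, [1, 2])
        return all(abs(p * p_c[(c,)] - p_ac[(a, c)] * p_bc[(b, c)]) < tol
                   for (a, b, c), p in joint.items())
    # A ⊥ B  iff  P(a,b) == P(a) * P(b) for all a, b
    p_ab = marginal(joint, [0, 1])
    p_a = marginal(joint, [0])
    p_b = marginal(joint, [1])
    return all(abs(p - p_a[(a,)] * p_b[(b,)]) < tol
               for (a, b), p in p_ab.items())

states = list(product([0, 1], repeat=3))  # binary (a, b, c)

# Assumed fork A <- C -> B: P(A,B,C) = P(C) P(A|C) P(B|C)
p_c = [0.3, 0.7]
p_a_given_c = [[0.9, 0.1], [0.2, 0.8]]  # indexed [c][a]
p_b_given_c = [[0.6, 0.4], [0.3, 0.7]]  # indexed [c][b]
fork = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
        for a, b, c in states}

# Assumed collider A -> C <- B: P(A,B,C) = P(A) P(B) P(C|A,B)
p_a = [0.4, 0.6]
p_b = [0.7, 0.3]
p_c1_given_ab = [[0.1, 0.5], [0.6, 0.9]]  # P(C=1 | a, b), indexed [a][b]
collider = {(a, b, c): p_a[a] * p_b[b] *
            (p_c1_given_ab[a][b] if c == 1 else 1 - p_c1_given_ab[a][b])
            for a, b, c in states}

for name, joint in [("fork", fork), ("collider", collider)]:
    print(name,
          "| A indep B:", a_indep_b(joint),
          "| A indep B given C:", a_indep_b(joint, given_c=True))
```

With these particular tables, the fork satisfies only the conditional statement and the collider only the marginal one. The structures in a) and b) may differ from the two assumed here, and a numerical check on one parameter choice is not a proof.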
Problem 2: Markov blanket (2 points)
Consider the following graphical structure of a Bayesian network:
Determine the Markov blanket MB(D) of node D and show that the conditional probability P(D | A,B,C,E,F,G) is the same as P(D | MB(D)).
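The graph for this problem is not reproduced in this text. As a minimal sketch of how a Markov blanket can be read off a DAG, the code below collects the parents, the children, and the children's other parents of a node. The edge list is a hypothetical example on the nodes A to G, not the graph from the exercise, and `markov_blanket` is an illustrative helper name.

```python
def markov_blanket(edges, node):
    """Return the Markov blanket of `node` in a DAG given as (parent, child) pairs."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # Co-parents: other parents of the node's children
    co_parents = {u for u, v in edges if v in children and u != node}
    return parents | children | co_parents

# Hypothetical DAG (an assumption for illustration only)
example_edges = [
    ("A", "C"), ("B", "D"), ("C", "D"),
    ("D", "F"), ("E", "F"), ("F", "G"),
]

print(sorted(markov_blanket(example_edges, "D")))
```

In this hypothetical graph, A and G fall outside the blanket of D: A reaches D only through the parent C, and G only through the child F. The same reasoning underlies the proof: in the factorised joint distribution, only the factors P(D | parents of D) and P(child | parents of child) contain D, so all other factors cancel when conditioning on the remaining nodes.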
Problem 3: Learning Bayesian networks from protein data (5 points)
In this exercise, we will use the R package BiDAG to learn Bayesian networks from protein data. The data provided in sachs.data.txt consists of the measurements of 11 phosphorylated proteins and phospholipids derived from primary immune system cells, subjected to both general and specific molecular interventions [2]. (Hint: read the help files of the package and use default parameters unless otherwise stated.)
[3, 4]. (1 point) [Note: The BGe score is a fully decomposable marginal likelihood function P(D | G) for scoring Bayesian networks. The main underlying assumption is that the data are normally distributed as N(μ, W^{-1}). The precision matrix W follows a Wishart prior W_n(T^{-1}, α_w), where α_w > n − 1 is the degrees of freedom and T is the positive definite parametric matrix. The mean vector μ follows a normal prior N(ν, α_μ W) with α_μ > 0.]
(b) Learn a Bayesian network using the order MCMC algorithm. Plot the directed acyclic graph (DAG). Evaluate the log BGe score of the test data against the estimated DAG. (Hint: one can use the R package graph for the plot.) (1 point)
(c) One of the arguments in the scoreparameters function is bgepar = list(am = 1, aw = NULL), which corresponds to the hyper-parameters α_μ and α_w for the BGe score. By default, α_μ = 1 and α_w = n + α_μ + 1.
Parameter am                        | 10^-5 | 10^-3 | 10^-1 | 10 | 10^2
Average number of edges             |       |       |       |    |
Average BGe score of the test data  |       |       |       |    |
What do you observe? Choose the value of am corresponding to the highest test BGe score and plot the DAG re-learned from the whole dataset. (3 points)
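As a small arithmetic aid for part (c), the sketch below tabulates the default α_w = n + α_μ + 1 for each value of am in the table. It assumes that n denotes the number of variables (11 for this dataset) and that leaving aw = NULL makes the package apply this default with the chosen am; both points should be confirmed in the package help files. The function name `default_aw` is hypothetical.

```python
def default_aw(n, am):
    """Default degrees of freedom as stated in the text: aw = n + am + 1."""
    return n + am + 1

n = 11  # assumed: number of measured proteins and phospholipids
am_grid = [1e-5, 1e-3, 1e-1, 10, 1e2]

aw_grid = {am: default_aw(n, am) for am in am_grid}
for am, aw in aw_grid.items():
    # The Wishart prior requires aw > n - 1; the default always satisfies it for am > 0
    assert aw > n - 1
    print(f"am = {am:g}  ->  default aw = {aw:g}")
```

Under these assumptions, changing am shifts the default degrees of freedom as well, so the comparison in the table varies both hyper-parameters together unless aw is set explicitly.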
References
[2] Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D. A., & Nolan, G. P. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721), 523-529.
[3] Geiger, D., & Heckerman, D. (2002). Parameter priors for directed acyclic graphical models and the characterization of several probability distributions. The Annals of Statistics, 30(5), 1412-1440.
[4] Kuipers, J., Moffa, G., & Heckerman, D. (2014). Addendum on the scoring of Gaussian directed acyclic graphical models. The Annals of Statistics, 42(4), 1689-1691.


