Dynamic Path Analysis And Model Based Clustering Of Microarray Data
Part I

We consider a situation in which continuous and binary data are observed for different subjects at discrete time points. At each time point, the binary responses are modeled with probit equations and the continuous responses with linear regression equations. The resulting model is a Gaussian system of equations with a directed acyclic graph structure, in which both the variables and the parameters are time-dependent. The functional parameters are further modeled with a mixed-model representation of splines, and estimation is carried out within a Bayesian framework. We establish a connection with dynamic graphical models, and through a simple Gibbs sampler we obtain posterior estimates of the direct, indirect and total effects in our model. These estimates allow us to describe how the effects of fixed covariates act partly directly and partly indirectly through endogenous time-dependent covariates. We show how our methodology can be applied in certain situations arising in survival analysis, and we illustrate our methods on a simple data set.

Part II

A recent study of a cohort of 344 well-characterized patients with acute myeloid leukemia (AML) suggests that patients can be segregated into distinct groups by unsupervised clustering based on their DNA methylation profiles. We propose a model-based approach in which we introduce latent, cluster-specific methylation indicators for each gene. These indicators, together with some standard assumptions, impose a specific mixture distribution on each cluster, and the parameters of the induced likelihood are estimated with the EM algorithm. We also introduce latent gene importance indicators, which tell us which genes discriminate between patients. By calculating the posterior expectations of these indicators, we can predict genome-wide methylation patterns across different subtypes of AML, which facilitates the classification of new patients based on their methylation profiles.
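The abstract does not spell out the emission distributions, so the following is only a minimal sketch of how EM can handle two layers of latent variables (cluster membership and methylation state), under assumptions of our own: a patient's methylation value for gene g is Gaussian around a fixed `MU1` if the gene is methylated in that patient and around `MU0` otherwise; within cluster k, gene g is methylated with probability `p[k, g]`; and genes are conditionally independent given the cluster. The function `em_methylation_clusters` and the constants are hypothetical, and the gene importance indicators are omitted.

```python
import numpy as np

# Hypothetical fixed emission parameters (an assumption, not from the source):
# values are Gaussian around MU0 for an unmethylated gene, around MU1 for a methylated one.
MU0, MU1, SD = 0.2, 0.8, 0.1


def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def em_methylation_clusters(X, n_clusters, n_iter=200, tol=1e-8):
    """EM for a two-level mixture: patients fall into latent clusters, and in
    cluster k gene g is methylated with (assumed) probability p[k, g].

    X: (n_patients, n_genes) array of methylation values.
    Returns (pi, p, resp): cluster weights, methylation probabilities and
    posterior cluster probabilities for each patient.
    """
    n, _ = X.shape
    f1 = normal_pdf(X, MU1, SD)  # density if the gene is methylated
    f0 = normal_pdf(X, MU0, SD)  # density if the gene is unmethylated

    # Maximin initialisation: seed each cluster with a patient far from earlier seeds.
    seeds = [0]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((X - X[s]) ** 2, axis=1) for s in seeds], axis=0)
        seeds.append(int(np.argmax(d)))
    p = np.clip(f1[seeds] / (f1[seeds] + f0[seeds]), 0.1, 0.9)
    pi = np.full(n_clusters, 1.0 / n_clusters)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step, cluster level: log f_k(x_i) = sum_g log(p_kg f1 + (1 - p_kg) f0).
        mix = p[None] * f1[:, None, :] + (1 - p[None]) * f0[:, None, :]  # (n, K, G)
        log_lik = np.log(pi)[None, :] + np.log(mix).sum(axis=2)  # (n, K)
        top = log_lik.max(axis=1, keepdims=True)
        log_norm = top + np.log(np.exp(log_lik - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_lik - log_norm)
        # E-step, gene level: posterior expectation of the methylation indicator
        # for gene g, given that patient i belongs to cluster k.
        w = p[None] * f1[:, None, :] / mix
        # M-step: closed-form updates of the cluster weights and probabilities.
        nk = resp.sum(axis=0)
        pi = nk / n
        p = np.clip(np.einsum("ik,ikg->kg", resp, w) / nk[:, None], 1e-6, 1 - 1e-6)
        ll = log_norm.sum()  # observed-data log-likelihood before this M-step
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, p, resp
```

Under these assumptions, `p[k, g]` acts as a cluster-level methylation probability, and the rows of `resp` are posterior cluster probabilities. A new patient could therefore be classified by running a single E-step with the fitted `pi` and `p`.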
The methods we develop extend naturally to other data types of a similar nature, such as expression data. This allows a joint analysis across multiple data platforms, resulting in higher discriminating power.
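One simple way to picture such a joint analysis, assuming (our assumption, not stated in the abstract) that the data platforms are conditionally independent given the cluster, is that the per-platform log-likelihoods add before the posterior cluster probabilities are normalised. The function `joint_cluster_posterior` below is hypothetical, and the likelihood values in the usage lines are toy numbers rather than results from any data set.

```python
import numpy as np


def joint_cluster_posterior(log_pi, platform_loglik):
    """Posterior cluster probabilities from several data platforms.

    log_pi: (K,) log prior cluster weights.
    platform_loglik: list of (n, K) arrays; entry [i, k] is the log-likelihood
        of patient i's data on one platform under cluster k.
    Assumes the platforms are conditionally independent given the cluster,
    so their log-likelihoods simply add.
    """
    total = log_pi[None, :] + np.sum(platform_loglik, axis=0)
    total = total - total.max(axis=1, keepdims=True)  # stabilise before exponentiating
    post = np.exp(total)
    return post / post.sum(axis=1, keepdims=True)


# Toy usage: methylation mildly favours cluster 0, and expression agrees.
log_pi = np.log(np.array([0.5, 0.5]))
meth = np.log(np.array([[0.7, 0.3]]))
expr = np.log(np.array([[0.8, 0.2]]))
post_meth = joint_cluster_posterior(log_pi, [meth])
post_joint = joint_cluster_posterior(log_pi, [meth, expr])
```

When the platforms point the same way, the combined evidence sharpens the posterior (here from 0.7 to roughly 0.90 for cluster 0); when they disagree, it can instead pull the posterior back toward uncertainty.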
dissertation or thesis