--- title: "Model and sampler details" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Model and sampler details} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This section connects the notation in Lu, Li, and Love (2021) to the `bpgmm` interface. It describes the fitted model, the covariance labels, and the RJMCMC switches used by the sampler. Full fitted examples are given in the getting-started and worked-example vignettes. ## Observation model Let \(X = (x_1, \ldots, x_n)\) be a matrix of continuous observations with \(p\) variables and \(n\) observations. `bpgmm` expects this same orientation: variables in rows and observations in columns. For cluster \(k\), the mixture of factor analyzers model writes \[ x_i = \mu_k + \Lambda_k y_{ki} + \epsilon_{ki}, \] where \[ y_{ki} \sim N(0, I_{q_k}), \qquad \epsilon_{ki} \sim N(0, \Psi_k). \] After integrating out the latent factor \(y_{ki}\), the cluster-specific covariance is \[ \Sigma_k = \Lambda_k \Lambda_k^\top + \Psi_k. \] The plot below shows the geometric role of this covariance. The loading matrix \(\Lambda_k\) controls the dominant low-dimensional direction, while \(\Psi_k\) adds cluster-specific noise around that subspace. ```{r, fig.width = 5.5, fig.height = 4.5, fig.alt = "Two covariance ellipses illustrating cluster-specific covariance matrices."} ellipse_points <- function(center, sigma, level = 0.90, n = 120) { eig <- eigen(sigma, symmetric = TRUE) angles <- seq(0, 2 * pi, length.out = n) circle <- rbind(cos(angles), sin(angles)) radius <- sqrt(qchisq(level, df = 2)) points <- t(center + radius * eig$vectors %*% diag(sqrt(eig$values), 2) %*% circle) colnames(points) <- c("x", "y") points } lambda_1 <- matrix(c(1.1, 0.4), ncol = 1) lambda_2 <- matrix(c(0.2, 1.0), ncol = 1) psi_1 <- diag(c(0.20, 0.08)) psi_2 <- diag(c(0.10, 0.25)) sigma_1 <- lambda_1 %*% t(lambda_1) + psi_1 sigma_2 <- lambda_2 %*% t(lambda_2) + psi_2 ell_1 <- ellipse_points(c(-1.2, -0.5), sigma_1) ell_2 <- ellipse_points(c(1.1, 0.6), sigma_2) plot( ell_1, type = "l", lwd = 2, col = "#0072B2", xlim = c(-3.5, 3.5), ylim = c(-2.5, 3), xlab = "Variable 1", ylab = "Variable 2", main = "Mixture-of-factor-analyzers covariance" ) lines(ell_2, lwd = 2, col = "#D55E00") points(rbind(c(-1.2, -0.5), c(1.1, 0.6)), pch = 19, col = c("#0072B2", "#D55E00")) arrows(-1.2, -0.5, -1.2 + lambda_1[1], -0.5 + lambda_1[2], col = "#0072B2", lwd = 2, length = 0.08) arrows(1.1, 0.6, 1.1 + lambda_2[1], 0.6 + lambda_2[2], col = "#D55E00", lwd = 2, length = 0.08) legend( "topleft", legend = c("Cluster 1 covariance", "Cluster 2 covariance", "Loading direction"), col = c("#0072B2", "#D55E00", "gray30"), lwd = c(2, 2, 2), bty = "n" ) ``` The mixture density is \[ f(x_i) = \sum_{k = 1}^m \tau_k N(x_i \mid \mu_k, \Lambda_k \Lambda_k^\top + \Psi_k), \] where \(\tau_k\) is the mixture weight and \(m\) is the number of clusters. Equivalently, with allocation indicator \(z_i \in \{1,\ldots,m\}\), \[ x_i \mid z_i = k, \Theta \sim N_p(\mu_k, \Sigma_k), \qquad \Pr(z_i = k \mid \tau) = \tau_k . \] In `pgmm_rjmcmc()`: - `X` is the \(p \times n\) data matrix. - `m_init` is the starting value of \(m\). - `m_range` is the allowed range of \(m\). - `q_new` is the factor dimension assigned to a newly proposed cluster. - `constraint` is the starting covariance model. ## PGMM covariance constraints The paper uses three letters to describe the covariance structure. Each letter is either `C` for constrained or `U` for unconstrained. | Letter | Meaning when `C` | Meaning when `U` | |---|---|---| | 1 | all clusters share one loading matrix \(\Lambda\) | each cluster has \(\Lambda_k\) | | 2 | all clusters share one noise covariance \(\Psi\) | each cluster has \(\Psi_k\) | | 3 | noise covariance is isotropic, \(\Psi_k = \psi_k I_p\) | noise covariance is diagonal | The eight PGMM models are: ```{r} library(bpgmm) models <- c("CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU") data.frame( model = models, constraint = vapply( models, function(x) paste(model_to_constraint(x), collapse = ","), character(1) ) ) ``` The package uses the numeric encoding internally: `1` means constrained and `0` means unconstrained. ```{r} model_to_constraint("UUU") constraint_to_model(c(1, 1, 0)) ``` ## Priors and posterior updates The supplement gives the natural conjugate priors used by the MCMC updates. The main priors are: \[ \tau \sim \mathrm{Dirichlet}(\gamma, \ldots, \gamma), \] \[ y_{ki} \sim N(0, I_{q_k}), \] \[ \mu_k \sim N(\bar{x}, \alpha_1^{-1} \Psi_k), \] \[ \lambda_{kj} \sim N(0, \alpha_2^{-1} \Psi_k), \] and \[ \psi_{kj} \sim \mathrm{IG}(\delta, \beta). \] The complete parameter state can be read as \[ \Theta = \{\tau_k, \mu_k, \Lambda_k, \Psi_k, y_{ki}, z_i: k = 1,\ldots,m;\ i = 1,\ldots,n\}, \] with hyperparameters \[ H = (\alpha_1, \alpha_2, \beta, \delta, \gamma). \] The package samples the hyperparameters \(\alpha_1\), \(\alpha_2\), and \(\beta\), with gamma hyperpriors controlled by: - `d_vec`: shape parameters. - `s_vec`: rate parameters. - `delta`: inverse-gamma shape for the noise covariance. - `ggamma`: Dirichlet concentration for mixture weights. The allocation update uses the posterior probability \[ p_{ki} = \frac{ \tau_k N(x_i \mid \mu_k, \Psi_k + \Lambda_k\Lambda_k^\top) }{ \sum_{\ell = 1}^{m} \tau_\ell N(x_i \mid \mu_\ell, \Psi_\ell + \Lambda_\ell\Lambda_\ell^\top) }. \] Then \[ (z_{i1}, \ldots, z_{im}) \mid X, \Theta, H \sim \mathrm{Multinomial}(1, p_{1i}, \ldots, p_{mi}). \] The allocation update uses this probability, and the joint posterior includes the product of allocated mixture weights: \[ p(Z \mid \tau) = \prod_{i = 1}^{n} \tau_{z_i}. \] ## Augmented representation and sufficient statistics The sampler augments each observation with latent factors and uses the stacked notation from the supplement. For cluster \(k\), define \[ \tilde{y}_{ki} = \begin{pmatrix} 1 \\ y_{ki} \end{pmatrix}, \qquad \tilde{\Lambda}_k = (\mu_k, \Lambda_k), \qquad A_k = \mathrm{diag}(\alpha_1, \alpha_2 I_{q_k}). \] Then the cluster likelihood can be written as \[ x_i \mid z_i = k, \tilde{\Lambda}_k, \Psi_k \sim N_p(\tilde{\Lambda}_k \tilde{y}_{ki}, \Psi_k). \] For allocated observations in cluster \(k\), the package accumulates \[ C_{XXk} = \sum_{i: z_i = k} x_i x_i', \quad C_{X\tilde{Y}k} = \sum_{i: z_i = k} x_i \tilde{y}_{ki}', \quad C_{\tilde{Y}\tilde{Y}k} = \sum_{i: z_i = k} \tilde{y}_{ki} \tilde{y}_{ki}', \] \[ C_{XYk} = \sum_{i: z_i = k} x_i y_{ki}', \quad C_{YYk} = \sum_{i: z_i = k} y_{ki} y_{ki}', \quad n_k = \sum_{i=1}^n \mathbf{1}(z_i = k). \] Completing the square gives the quadratic form used in every \(\Psi_k\) and \(\Lambda_k\) update: \[ \sum_{i: z_i = k} (x_i - \tilde{\Lambda}_k \tilde{y}_{ki})(x_i - \tilde{\Lambda}_k \tilde{y}_{ki})' = C_{XXk} - 2 C_{X\tilde{Y}k} \tilde{\Lambda}_k' + \tilde{\Lambda}_k C_{\tilde{Y}\tilde{Y}k} \tilde{\Lambda}_k'. \] ## Latent scores, means, and mixture weights Given allocations, the supplement gives the following Gibbs updates. Latent scores for allocated observations: \[ y_{ki} \mid x_i, z_i = k, \Theta \sim N_{q_k}\!\big(D_k (x_i - \mu_k), \Sigma_k\big), \qquad D_k = \Lambda_k' \Psi_k^{-1}, \qquad \Sigma_k = (\Lambda_k' \Psi_k^{-1} \Lambda_k + I_{q_k})^{-1}. \] Unallocated observations draw from the prior: \[ y_i \mid z_i = 0 \sim N(0, I_{q_{\mathrm{new}}}). \] Cluster means: \[ \mu_k \mid \cdots \sim N_p\!\big(\bar{x}, \alpha_1^{-1} \Psi_k\big), \] where \(\bar{x}\) is the prior mean on the working scale. `pgmm_rjmcmc()` centers \(X\) internally so \(\bar{x} = 0\) on the centered scale, matching the augmented loading posterior in Supplement A.1; sampled means are returned on the original data scale. Mixture weights: \[ \tau \mid Z \sim \mathrm{Dirichlet}(n_1 + \gamma, \ldots, n_m + \gamma). \] ## Hyperparameter updates The package samples \((\alpha_1, \alpha_2, \beta)\) with gamma hyperpriors controlled by `d_vec` and `s_vec`. With total latent dimension \(q = \sum_{k=1}^m q_k\), \[ \alpha_1 \mid \cdots \sim \mathrm{Gamma}\!\left( \tfrac{mp}{2} + d_{\alpha_1}, \tfrac{1}{2}\sum_k \mathrm{tr}(\Lambda_k' \Psi_k^{-1} \Lambda_k) + s_{\alpha_1} \right), \] \[ \alpha_2 \mid \cdots \sim \mathrm{Gamma}\!\left( \tfrac{qp}{2} + d_{\alpha_2}, \tfrac{1}{2}\sum_k \mathrm{tr}(\Lambda_k' \Psi_k^{-1} \Lambda_k) + s_{\alpha_2} \right), \] \[ \beta \mid \cdots \sim \mathrm{Gamma}\!\left( mp\delta + d_\beta, \sum_k \mathrm{tr}(\Psi_k^{-1}) + s_\beta \right). \] ## PGMM conditional posteriors (corrected forms) The eight covariance labels differ in whether \(\Lambda_k\), \(\Psi_k\), and the diagonal of \(\Psi_k\) are shared across clusters. The table below records the prior scaling that enters the \(\Psi_k^{-1}\) rate after completing the square. All models use the \(-2\) cross term and \(+1\) quadratic term on \((C_{\tilde{Y}\tilde{Y}k} + \text{prior adjustment})\); model 7 (UUC) is implemented with this corrected form rather than the misprinted coefficients in Appendix A.2. | Model | Shared \(\Lambda\)? | Shared \(\Psi\)? | Isotropic \(\Psi_k\)? | Prior adjustment in \(\Psi_k^{-1}\) rate | |---|---|---|---|---| | CCC | yes | yes | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/(mp)\) | | CCU | yes | yes | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/m\) | | CUC | yes | no | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/p\) | | CUU | yes | no | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta\) | | UCC | no | yes | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/(mp)\) | | UCU | no | yes | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/m\) | | UUC | no | no | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/p\) | | UUU | no | no | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta\) | For shared \(\Lambda\), the loading update pools information across clusters, for example in model 1 (CCC): \[ \Lambda \mid \cdots \sim N\!\left( \Big(\sum_k C_{X\tilde{Y}k}\Big) \Big(\sum_k C_{\tilde{Y}\tilde{Y}k} + \tfrac{\alpha_2}{m} I\Big)^{-1}, \;\; \Big(\sum_k C_{\tilde{Y}\tilde{Y}k} + \tfrac{\alpha_2}{m} I\Big)^{-1} \otimes \Psi_k \right), \] with the same \(\Lambda\) copied to every cluster. For cluster-specific loadings (models 5–8), \[ \Lambda_k \mid \cdots \sim N\!\left( C_{X\tilde{Y}k} \big(C_{\tilde{Y}\tilde{Y}k} + \alpha_2 I\big)^{-1}, \; \big(C_{\tilde{Y}\tilde{Y}k} + \alpha_2 I\big)^{-1} \otimes \Psi_k \right). \] Isotropic and shared-\(\Psi\) cases replace matrix Gamma draws by scalar or diagonal Gamma draws with the shape and rate formulas in Appendix A.2; the stay-step sampler in `calculate_post_lambda_psi()` implements these forms. ## RJMCMC moves RJMCMC is used because the number of parameters changes when the number of clusters or the covariance constraint changes. The cluster-number moves are: | Move | Purpose | |---|---| | stay | update parameters without changing \(m\) | | birth | add a new empty cluster | | death | remove an empty cluster | | split | split one occupied cluster into two clusters | | combine | merge two occupied clusters | The covariance-structure moves toggle one constraint at a time: - toggle whether \(\Lambda_k\) is shared across clusters; - toggle whether \(\Psi_k\) is shared across clusters; - toggle whether \(\Psi_k\) is isotropic or diagonal. In `pgmm_rjmcmc()`: - `m_step = 1` enables birth/death moves for \(m\). - `split_combine = 1` adds split/combine moves when `m_step = 1`. - `v_step = 1` enables covariance-constraint moves. For a proposed move from model \(M\) to \(M'\), the RJMCMC acceptance probability has the standard form \[ a = \min\left\{ 1, \frac{ p(X, Z, \Theta', H' \mid M')\,p(M')\,q(M \mid M') }{ p(X, Z, \Theta, H \mid M)\,p(M)\,q(M' \mid M) } \left|J\right| \right\}, \] where \(q(\cdot)\) is the proposal density and \(|J|\) is the Jacobian term for dimension-changing moves such as split/combine. Birth/death and split/combine change \(m\); covariance moves change the constraint label \(v \in \{\mathrm{CCC}, \ldots, \mathrm{UUU}\}\). ### Birth and death Birth adds an empty cluster with weight \(\tau_{m+1} \sim \mathrm{Beta}(1, m)\) and rescales existing weights by \(1 - \tau_{m+1}\). Death removes one empty cluster and reverses the weight transformation. The acceptance ratio includes the Jacobian exponent \(m\) for birth and \(m-1\) for death, as in Green (1995). ### Split and combine Split proposes two new clusters from one occupied component. Let \(\tau_{j^*}\) and \(\mu_{j^*}\) denote the merged weight and mean. With \(\tau_{j_1} = w \tau_{j^*}\), \(\tau_{j_2} = (1-w)\tau_{j^*}\), \(w \sim \mathrm{Beta}(2,2)\), and per-coordinate signs \(s_i \in \{-1,+1\}\), \[ \mu_{j_1,i} = \mu_{j^*,i} + s_i a_{2i} \Sigma_{j^*,i} \sqrt{\tau_{j_2}/\tau_{j_1}}, \qquad \mu_{j_2,i} = \mu_{j^*,i} - s_i a_{2i} \Sigma_{j^*,i} \sqrt{\tau_{j_1}/\tau_{j_2}}, \] with \(a_{2i} \sim \mathrm{Gamma}(1,2)\). The \(p\) independent signs contribute a factor \(2^{-p}\) to the split proposal density. Combine merges two occupied components, runs intermediate Gibbs sweeps, and uses the reverse proposal in the acceptance ratio. ### Covariance-structure moves The `v_step` toggles one letter in the covariance label at a time. Extension proposes new loadings and noise matrices under the less constrained model; contraction proposes under the more constrained model. When the proposal distribution is symmetric in the auxiliary variables, the acceptance ratio reduces to a ratio of posterior ordinals, as noted in Section 4.4 of the paper. ## Interface mapping | Paper notation | Package argument or output | |---|---| | \(X = (x_1,\ldots,x_n)\) | `X`, a \(p \times n\) numeric matrix | | \(m\) | `m_init`, `m_range`, `active_cluster_samples` | | \(q_k\) | `q_new` for newly proposed clusters | | \(z_i\) | `allocation_samples`, `summary$allocation` | | \(\tau_k\) | `tau_samples` | | \(\mu_k\) | `mean_samples` | | \(\Lambda_k\) | `lambda_samples` | | \(\Psi_k\) | `psi_samples` | | covariance model \(v\) | `constraint`, `constraint_samples` | An applied model-selection call usually has the following structure: ```{r, eval = FALSE} fit <- pgmm_rjmcmc( X = your_data_matrix, m_init = 5, m_range = c(1, 10), q_new = 4, burn = 5000, niter = 15000, constraint = model_to_constraint("UUU"), m_step = 1, v_step = 1, split_combine = 1, verbose = FALSE ) ``` ## Citation If you use these methods, cite: Lu, X., Li, Y., & Love, T. (2021). On Bayesian Analysis of Parsimonious Gaussian Mixture Models. *Journal of Classification*, 38, 576-593. https://doi.org/10.1007/s00357-021-09391-8 In R: ```{r} citation("bpgmm") ```