This section connects the notation
in Li, Lu, and Love (2021) to the bpgmm interface. It
describes the fitted model, the covariance labels, and the RJMCMC
switches used by the sampler. Full fitted examples are given in the
getting-started and worked-example vignettes.
Let \(X = (x_1, \ldots, x_n)\) be a
matrix of continuous observations with \(p\) variables and \(n\) observations. bpgmm
expects this same orientation: variables in rows and observations in
columns.
For cluster \(k\), the mixture of factor analyzers model writes
\[ x_i = \mu_k + \Lambda_k y_{ki} + \epsilon_{ki}, \]
where
\[ y_{ki} \sim N(0, I_{q_k}), \qquad \epsilon_{ki} \sim N(0, \Psi_k). \]
After integrating out the latent factor \(y_{ki}\), the cluster-specific covariance is
\[ \Sigma_k = \Lambda_k \Lambda_k^\top + \Psi_k. \]
The plot below shows the geometric role of this covariance. The loading matrix \(\Lambda_k\) controls the dominant low-dimensional direction, while \(\Psi_k\) adds cluster-specific noise around that subspace.
ellipse_points <- function(center, sigma, level = 0.90, n = 120) {
eig <- eigen(sigma, symmetric = TRUE)
angles <- seq(0, 2 * pi, length.out = n)
circle <- rbind(cos(angles), sin(angles))
radius <- sqrt(qchisq(level, df = 2))
points <- t(center + radius * eig$vectors %*% diag(sqrt(eig$values), 2) %*% circle)
colnames(points) <- c("x", "y")
points
}
lambda_1 <- matrix(c(1.1, 0.4), ncol = 1)
lambda_2 <- matrix(c(0.2, 1.0), ncol = 1)
psi_1 <- diag(c(0.20, 0.08))
psi_2 <- diag(c(0.10, 0.25))
sigma_1 <- lambda_1 %*% t(lambda_1) + psi_1
sigma_2 <- lambda_2 %*% t(lambda_2) + psi_2
ell_1 <- ellipse_points(c(-1.2, -0.5), sigma_1)
ell_2 <- ellipse_points(c(1.1, 0.6), sigma_2)
plot(
ell_1,
type = "l",
lwd = 2,
col = "#0072B2",
xlim = c(-3.5, 3.5),
ylim = c(-2.5, 3),
xlab = "Variable 1",
ylab = "Variable 2",
main = "Mixture-of-factor-analyzers covariance"
)
lines(ell_2, lwd = 2, col = "#D55E00")
points(rbind(c(-1.2, -0.5), c(1.1, 0.6)), pch = 19, col = c("#0072B2", "#D55E00"))
arrows(-1.2, -0.5, -1.2 + lambda_1[1], -0.5 + lambda_1[2], col = "#0072B2", lwd = 2, length = 0.08)
arrows(1.1, 0.6, 1.1 + lambda_2[1], 0.6 + lambda_2[2], col = "#D55E00", lwd = 2, length = 0.08)
legend(
"topleft",
legend = c("Cluster 1 covariance", "Cluster 2 covariance", "Loading direction"),
col = c("#0072B2", "#D55E00", "gray30"),
lwd = c(2, 2, 2),
bty = "n"
)The mixture density is
\[ f(x_i) = \sum_{k = 1}^m \tau_k N(x_i \mid \mu_k, \Lambda_k \Lambda_k^\top + \Psi_k), \]
where \(\tau_k\) is the mixture weight and \(m\) is the number of clusters. Equivalently, with allocation indicator \(z_i \in \{1,\ldots,m\}\),
\[ x_i \mid z_i = k, \Theta \sim N_p(\mu_k, \Sigma_k), \qquad \Pr(z_i = k \mid \tau) = \tau_k . \]
In pgmm_rjmcmc():
X is the \(p \times
n\) data matrix.m_init is the starting value of \(m\).m_range is the allowed range of \(m\).q_new is the factor dimension assigned to a newly
proposed cluster.constraint is the starting covariance model.The paper uses three letters to describe the covariance structure.
Each letter is either C for constrained or U
for unconstrained.
| Letter | Meaning when C |
Meaning when U |
|---|---|---|
| 1 | all clusters share one loading matrix \(\Lambda\) | each cluster has \(\Lambda_k\) |
| 2 | all clusters share one noise covariance \(\Psi\) | each cluster has \(\Psi_k\) |
| 3 | noise covariance is isotropic, \(\Psi_k = \psi_k I_p\) | noise covariance is diagonal |
The eight PGMM models are:
library(bpgmm)
models <- c("CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU")
data.frame(
model = models,
constraint = vapply(
models,
function(x) paste(model_to_constraint(x), collapse = ","),
character(1)
)
)
#> model constraint
#> CCC CCC 1,1,1
#> CCU CCU 1,1,0
#> CUC CUC 1,0,1
#> CUU CUU 1,0,0
#> UCC UCC 0,1,1
#> UCU UCU 0,1,0
#> UUC UUC 0,0,1
#> UUU UUU 0,0,0The package uses the numeric encoding internally: 1
means constrained and 0 means unconstrained.
The supplement gives the natural conjugate priors used by the MCMC updates. The main priors are:
\[ \tau \sim \mathrm{Dirichlet}(\gamma, \ldots, \gamma), \]
\[ y_{ki} \sim N(0, I_{q_k}), \]
\[ \mu_k \sim N(\bar{x}, \alpha_1^{-1} \Psi_k), \]
\[ \lambda_{kj} \sim N(0, \alpha_2^{-1} \Psi_k), \]
and
\[ \psi_{kj} \sim \mathrm{IG}(\delta, \beta). \]
The complete parameter state can be read as
\[ \Theta = \{\tau_k, \mu_k, \Lambda_k, \Psi_k, y_{ki}, z_i: k = 1,\ldots,m;\ i = 1,\ldots,n\}, \]
with hyperparameters
\[ H = (\alpha_1, \alpha_2, \beta, \delta, \gamma). \]
The package samples the hyperparameters \(\alpha_1\), \(\alpha_2\), and \(\beta\), with gamma hyperpriors controlled by:
d_vec: shape parameters.s_vec: rate parameters.delta: inverse-gamma shape for the noise
covariance.ggamma: Dirichlet concentration for mixture
weights.The allocation update uses the posterior probability
\[ p_{ki} = \frac{ \tau_k N(x_i \mid \mu_k, \Psi_k + \Lambda_k\Lambda_k^\top) }{ \sum_{\ell = 1}^{m} \tau_\ell N(x_i \mid \mu_\ell, \Psi_\ell + \Lambda_\ell\Lambda_\ell^\top) }. \]
Then
\[ (z_{i1}, \ldots, z_{im}) \mid X, \Theta, H \sim \mathrm{Multinomial}(1, p_{1i}, \ldots, p_{mi}). \]
The allocation update uses this probability, and the joint posterior includes the product of allocated mixture weights:
\[ p(Z \mid \tau) = \prod_{i = 1}^{n} \tau_{z_i}. \]
The sampler augments each observation with latent factors and uses the stacked notation from the supplement. For cluster \(k\), define
\[ \tilde{y}_{ki} = \begin{pmatrix} 1 \\ y_{ki} \end{pmatrix}, \qquad \tilde{\Lambda}_k = (\mu_k, \Lambda_k), \qquad A_k = \mathrm{diag}(\alpha_1, \alpha_2 I_{q_k}). \]
Then the cluster likelihood can be written as
\[ x_i \mid z_i = k, \tilde{\Lambda}_k, \Psi_k \sim N_p(\tilde{\Lambda}_k \tilde{y}_{ki}, \Psi_k). \]
For allocated observations in cluster \(k\), the package accumulates
\[ C_{XXk} = \sum_{i: z_i = k} x_i x_i', \quad C_{X\tilde{Y}k} = \sum_{i: z_i = k} x_i \tilde{y}_{ki}', \quad C_{\tilde{Y}\tilde{Y}k} = \sum_{i: z_i = k} \tilde{y}_{ki} \tilde{y}_{ki}', \]
\[ C_{XYk} = \sum_{i: z_i = k} x_i y_{ki}', \quad C_{YYk} = \sum_{i: z_i = k} y_{ki} y_{ki}', \quad n_k = \sum_{i=1}^n \mathbf{1}(z_i = k). \]
Completing the square gives the quadratic form used in every \(\Psi_k\) and \(\Lambda_k\) update:
\[ \sum_{i: z_i = k} (x_i - \tilde{\Lambda}_k \tilde{y}_{ki})(x_i - \tilde{\Lambda}_k \tilde{y}_{ki})' = C_{XXk} - 2 C_{X\tilde{Y}k} \tilde{\Lambda}_k' + \tilde{\Lambda}_k C_{\tilde{Y}\tilde{Y}k} \tilde{\Lambda}_k'. \]
Given allocations, the supplement gives the following Gibbs updates.
Latent scores for allocated observations:
\[ y_{ki} \mid x_i, z_i = k, \Theta \sim N_{q_k}\!\big(D_k (x_i - \mu_k), \Sigma_k\big), \qquad D_k = \Lambda_k' \Psi_k^{-1}, \qquad \Sigma_k = (\Lambda_k' \Psi_k^{-1} \Lambda_k + I_{q_k})^{-1}. \]
Unallocated observations draw from the prior:
\[ y_i \mid z_i = 0 \sim N(0, I_{q_{\mathrm{new}}}). \]
Cluster means:
\[ \mu_k \mid \cdots \sim N_p\!\big(\bar{x}, \alpha_1^{-1} \Psi_k\big), \]
where \(\bar{x}\) is the prior mean
on the working scale. pgmm_rjmcmc() centers \(X\) internally so \(\bar{x} = 0\) on the centered scale,
matching the augmented loading posterior in Supplement A.1; sampled
means are returned on the original data scale.
Mixture weights:
\[ \tau \mid Z \sim \mathrm{Dirichlet}(n_1 + \gamma, \ldots, n_m + \gamma). \]
The package samples \((\alpha_1, \alpha_2,
\beta)\) with gamma hyperpriors controlled by d_vec
and s_vec. With total latent dimension \(q = \sum_{k=1}^m q_k\),
\[ \alpha_1 \mid \cdots \sim \mathrm{Gamma}\!\left( \tfrac{mp}{2} + d_{\alpha_1}, \tfrac{1}{2}\sum_k \mathrm{tr}(\Lambda_k' \Psi_k^{-1} \Lambda_k) + s_{\alpha_1} \right), \]
\[ \alpha_2 \mid \cdots \sim \mathrm{Gamma}\!\left( \tfrac{qp}{2} + d_{\alpha_2}, \tfrac{1}{2}\sum_k \mathrm{tr}(\Lambda_k' \Psi_k^{-1} \Lambda_k) + s_{\alpha_2} \right), \]
\[ \beta \mid \cdots \sim \mathrm{Gamma}\!\left( mp\delta + d_\beta, \sum_k \mathrm{tr}(\Psi_k^{-1}) + s_\beta \right). \]
The eight covariance labels differ in whether \(\Lambda_k\), \(\Psi_k\), and the diagonal of \(\Psi_k\) are shared across clusters. The table below records the prior scaling that enters the \(\Psi_k^{-1}\) rate after completing the square. All models use the \(-2\) cross term and \(+1\) quadratic term on \((C_{\tilde{Y}\tilde{Y}k} + \text{prior adjustment})\); model 7 (UUC) is implemented with this corrected form rather than the misprinted coefficients in Appendix A.2.
| Model | Shared \(\Lambda\)? | Shared \(\Psi\)? | Isotropic \(\Psi_k\)? | Prior adjustment in \(\Psi_k^{-1}\) rate |
|---|---|---|---|---|
| CCC | yes | yes | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/(mp)\) |
| CCU | yes | yes | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/m\) |
| CUC | yes | no | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/p\) |
| CUU | yes | no | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta\) |
| UCC | no | yes | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/(mp)\) |
| UCU | no | yes | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/m\) |
| UUC | no | no | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/p\) |
| UUU | no | no | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta\) |
For shared \(\Lambda\), the loading update pools information across clusters, for example in model 1 (CCC):
\[ \Lambda \mid \cdots \sim N\!\left( \Big(\sum_k C_{X\tilde{Y}k}\Big) \Big(\sum_k C_{\tilde{Y}\tilde{Y}k} + \tfrac{\alpha_2}{m} I\Big)^{-1}, \;\; \Big(\sum_k C_{\tilde{Y}\tilde{Y}k} + \tfrac{\alpha_2}{m} I\Big)^{-1} \otimes \Psi_k \right), \]
with the same \(\Lambda\) copied to every cluster. For cluster-specific loadings (models 5–8),
\[ \Lambda_k \mid \cdots \sim N\!\left( C_{X\tilde{Y}k} \big(C_{\tilde{Y}\tilde{Y}k} + \alpha_2 I\big)^{-1}, \; \big(C_{\tilde{Y}\tilde{Y}k} + \alpha_2 I\big)^{-1} \otimes \Psi_k \right). \]
Isotropic and shared-\(\Psi\) cases
replace matrix Gamma draws by scalar or diagonal Gamma draws with the
shape and rate formulas in Appendix A.2; the stay-step sampler in
calculate_post_lambda_psi() implements these forms.
RJMCMC is used because the number of parameters changes when the number of clusters or the covariance constraint changes.
The cluster-number moves are:
| Move | Purpose |
|---|---|
| stay | update parameters without changing \(m\) |
| birth | add a new empty cluster |
| death | remove an empty cluster |
| split | split one occupied cluster into two clusters |
| combine | merge two occupied clusters |
The covariance-structure moves toggle one constraint at a time:
In pgmm_rjmcmc():
m_step = 1 enables birth/death moves for \(m\).split_combine = 1 adds split/combine moves when
m_step = 1.v_step = 1 enables covariance-constraint moves.For a proposed move from model \(M\) to \(M'\), the RJMCMC acceptance probability has the standard form
\[ a = \min\left\{ 1, \frac{ p(X, Z, \Theta', H' \mid M')\,p(M')\,q(M \mid M') }{ p(X, Z, \Theta, H \mid M)\,p(M)\,q(M' \mid M) } \left|J\right| \right\}, \]
where \(q(\cdot)\) is the proposal density and \(|J|\) is the Jacobian term for dimension-changing moves such as split/combine. Birth/death and split/combine change \(m\); covariance moves change the constraint label \(v \in \{\mathrm{CCC}, \ldots, \mathrm{UUU}\}\).
Birth adds an empty cluster with weight \(\tau_{m+1} \sim \mathrm{Beta}(1, m)\) and rescales existing weights by \(1 - \tau_{m+1}\). Death removes one empty cluster and reverses the weight transformation. The acceptance ratio includes the Jacobian exponent \(m\) for birth and \(m-1\) for death, as in Green (1995).
Split proposes two new clusters from one occupied component. Let \(\tau_{j^*}\) and \(\mu_{j^*}\) denote the merged weight and mean. With \(\tau_{j_1} = w \tau_{j^*}\), \(\tau_{j_2} = (1-w)\tau_{j^*}\), \(w \sim \mathrm{Beta}(2,2)\), and per-coordinate signs \(s_i \in \{-1,+1\}\),
\[ \mu_{j_1,i} = \mu_{j^*,i} + s_i a_{2i} \Sigma_{j^*,i} \sqrt{\tau_{j_2}/\tau_{j_1}}, \qquad \mu_{j_2,i} = \mu_{j^*,i} - s_i a_{2i} \Sigma_{j^*,i} \sqrt{\tau_{j_1}/\tau_{j_2}}, \]
with \(a_{2i} \sim \mathrm{Gamma}(1,2)\). The \(p\) independent signs contribute a factor \(2^{-p}\) to the split proposal density. Combine merges two occupied components, runs intermediate Gibbs sweeps, and uses the reverse proposal in the acceptance ratio.
The v_step toggles one letter in the covariance label at
a time. Extension proposes new loadings and noise matrices under the
less constrained model; contraction proposes under the more constrained
model. When the proposal distribution is symmetric in the auxiliary
variables, the acceptance ratio reduces to a ratio of posterior
ordinals, as noted in Section 4.4 of the paper.
| Paper notation | Package argument or output |
|---|---|
| \(X = (x_1,\ldots,x_n)\) | X, a \(p \times n\)
numeric matrix |
| \(m\) | m_init, m_range,
active_cluster_samples |
| \(q_k\) | q_new for newly proposed clusters |
| \(z_i\) | allocation_samples,
summary$allocation |
| \(\tau_k\) | tau_samples |
| \(\mu_k\) | mean_samples |
| \(\Lambda_k\) | lambda_samples |
| \(\Psi_k\) | psi_samples |
| covariance model \(v\) | constraint, constraint_samples |
An applied model-selection call usually has the following structure:
If you use these methods, cite:
Li, Y., Lu, X., & Love, T. (2021). On Bayesian Analysis of Parsimonious Gaussian Mixture Models. Journal of Classification, 38, 576-593. https://doi.org/10.1007/s00357-021-09391-8
In R:
citation("bpgmm")
#> To cite package 'bpgmm' in publications use:
#>
#> Li Y, Lu X, Love T (2021). On Bayesian Analysis of Parsimonious
#> Gaussian Mixture Models. Journal of Classification, 38, 576-593.
#> doi:10.1007/s00357-021-09391-8
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Article{,
#> title = {On Bayesian Analysis of Parsimonious Gaussian Mixture Models},
#> author = {Yaoxiang Li and Xiang Lu and Tanzy Love},
#> journal = {Journal of Classification},
#> year = {2021},
#> volume = {38},
#> pages = {576--593},
#> doi = {10.1007/s00357-021-09391-8},
#> }