Model and sampler details

This section connects the notation in Li, Lu, and Love (2021) to the bpgmm interface. It describes the fitted model, the covariance labels, and the RJMCMC switches used by the sampler. Full fitted examples are given in the getting-started and worked-example vignettes.

Observation model

Let $X = (x_1, \ldots, x_n)$ be a matrix of continuous observations with $p$ variables and $n$ observations. bpgmm expects this same orientation: variables in rows and observations in columns.

For cluster $k$, the mixture of factor analyzers model writes

\[ x_i = \mu_k + \Lambda_k y_{ki} + \epsilon_{ki}, \]

where

\[ y_{ki} \sim N(0, I_{q_k}), \qquad \epsilon_{ki} \sim N(0, \Psi_k). \]

After integrating out the latent factor $y_{ki}$, the cluster-specific covariance is

\[ \Sigma_k = \Lambda_k \Lambda_k^\top + \Psi_k. \]

The plot below shows the geometric role of this covariance. The loading matrix $\Lambda_k$ controls the dominant low-dimensional direction, while $\Psi_k$ adds cluster-specific noise around that subspace.

ellipse_points <- function(center, sigma, level = 0.90, n = 120) {
  eig <- eigen(sigma, symmetric = TRUE)
  angles <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(angles), sin(angles))
  radius <- sqrt(qchisq(level, df = 2))
  points <- t(center + radius * eig$vectors %*% diag(sqrt(eig$values), 2) %*% circle)
  colnames(points) <- c("x", "y")
  points
}

lambda_1 <- matrix(c(1.1, 0.4), ncol = 1)
lambda_2 <- matrix(c(0.2, 1.0), ncol = 1)
psi_1 <- diag(c(0.20, 0.08))
psi_2 <- diag(c(0.10, 0.25))
sigma_1 <- lambda_1 %*% t(lambda_1) + psi_1
sigma_2 <- lambda_2 %*% t(lambda_2) + psi_2

ell_1 <- ellipse_points(c(-1.2, -0.5), sigma_1)
ell_2 <- ellipse_points(c(1.1, 0.6), sigma_2)

plot(
  ell_1,
  type = "l",
  lwd = 2,
  col = "#0072B2",
  xlim = c(-3.5, 3.5),
  ylim = c(-2.5, 3),
  xlab = "Variable 1",
  ylab = "Variable 2",
  main = "Mixture-of-factor-analyzers covariance"
)
lines(ell_2, lwd = 2, col = "#D55E00")
points(rbind(c(-1.2, -0.5), c(1.1, 0.6)), pch = 19, col = c("#0072B2", "#D55E00"))
arrows(-1.2, -0.5, -1.2 + lambda_1[1], -0.5 + lambda_1[2], col = "#0072B2", lwd = 2, length = 0.08)
arrows(1.1, 0.6, 1.1 + lambda_2[1], 0.6 + lambda_2[2], col = "#D55E00", lwd = 2, length = 0.08)
legend(
  "topleft",
  legend = c("Cluster 1 covariance", "Cluster 2 covariance", "Loading direction"),
  col = c("#0072B2", "#D55E00", "gray30"),
  lwd = c(2, 2, 2),
  bty = "n"
)

Two covariance ellipses illustrating cluster-specific covariance matrices.

The mixture density is

\[ f(x_i) = \sum_{k = 1}^m \tau_k N(x_i \mid \mu_k, \Lambda_k \Lambda_k^\top + \Psi_k), \]

where $\tau_k$ is the mixture weight and $m$ is the number of clusters. Equivalently, with allocation indicator $z_i \in \{1,\ldots,m\}$,

\[ x_i \mid z_i = k, \Theta \sim N_p(\mu_k, \Sigma_k), \qquad \Pr(z_i = k \mid \tau) = \tau_k . \]

In pgmm_rjmcmc():

X is the $p \times n$ data matrix.
m_init is the starting value of $m$.
m_range is the allowed range of $m$.
q_new is the factor dimension assigned to a newly proposed cluster.
constraint is the starting covariance model.

PGMM covariance constraints

The paper uses three letters to describe the covariance structure. Each letter is either C for constrained or U for unconstrained.

Letter	Meaning when `C`	Meaning when `U`
1	all clusters share one loading matrix $\Lambda$	each cluster has $\Lambda_k$
2	all clusters share one noise covariance $\Psi$	each cluster has $\Psi_k$
3	noise covariance is isotropic, $\Psi_k = \psi_k I_p$	noise covariance is diagonal

The eight PGMM models are:

library(bpgmm)

models <- c("CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU")
data.frame(
  model = models,
  constraint = vapply(
    models,
    function(x) paste(model_to_constraint(x), collapse = ","),
    character(1)
  )
)
#>     model constraint
#> CCC   CCC      1,1,1
#> CCU   CCU      1,1,0
#> CUC   CUC      1,0,1
#> CUU   CUU      1,0,0
#> UCC   UCC      0,1,1
#> UCU   UCU      0,1,0
#> UUC   UUC      0,0,1
#> UUU   UUU      0,0,0

The package uses the numeric encoding internally: 1 means constrained and 0 means unconstrained.

model_to_constraint("UUU")
#> [1] 0 0 0
constraint_to_model(c(1, 1, 0))
#> [1] "CCU"

Priors and posterior updates

The supplement gives the natural conjugate priors used by the MCMC updates. The main priors are:

\[ \tau \sim \mathrm{Dirichlet}(\gamma, \ldots, \gamma), \]

\[ y_{ki} \sim N(0, I_{q_k}), \]

\[ \mu_k \sim N(\bar{x}, \alpha_1^{-1} \Psi_k), \]

\[ \lambda_{kj} \sim N(0, \alpha_2^{-1} \Psi_k), \]

and

\[ \psi_{kj} \sim \mathrm{IG}(\delta, \beta). \]

The complete parameter state can be read as

\[ \Theta = \{\tau_k, \mu_k, \Lambda_k, \Psi_k, y_{ki}, z_i: k = 1,\ldots,m;\ i = 1,\ldots,n\}, \]

with hyperparameters

\[ H = (\alpha_1, \alpha_2, \beta, \delta, \gamma). \]

The package samples the hyperparameters $\alpha_1$, $\alpha_2$, and $\beta$, with gamma hyperpriors controlled by:

d_vec: shape parameters.
s_vec: rate parameters.
delta: inverse-gamma shape for the noise covariance.
ggamma: Dirichlet concentration for mixture weights.

The allocation update uses the posterior probability

\[ p_{ki} = \frac{ \tau_k N(x_i \mid \mu_k, \Psi_k + \Lambda_k\Lambda_k^\top) }{ \sum_{\ell = 1}^{m} \tau_\ell N(x_i \mid \mu_\ell, \Psi_\ell + \Lambda_\ell\Lambda_\ell^\top) }. \]

Then

\[ (z_{i1}, \ldots, z_{im}) \mid X, \Theta, H \sim \mathrm{Multinomial}(1, p_{1i}, \ldots, p_{mi}). \]

The allocation update uses this probability, and the joint posterior includes the product of allocated mixture weights:

\[ p(Z \mid \tau) = \prod_{i = 1}^{n} \tau_{z_i}. \]

Augmented representation and sufficient statistics

The sampler augments each observation with latent factors and uses the stacked notation from the supplement. For cluster $k$, define

\[ \tilde{y}_{ki} = \begin{pmatrix} 1 \\ y_{ki} \end{pmatrix}, \qquad \tilde{\Lambda}_k = (\mu_k, \Lambda_k), \qquad A_k = \mathrm{diag}(\alpha_1, \alpha_2 I_{q_k}). \]

Then the cluster likelihood can be written as

\[ x_i \mid z_i = k, \tilde{\Lambda}_k, \Psi_k \sim N_p(\tilde{\Lambda}_k \tilde{y}_{ki}, \Psi_k). \]

For allocated observations in cluster $k$, the package accumulates

\[ C_{XXk} = \sum_{i: z_i = k} x_i x_i', \quad C_{X\tilde{Y}k} = \sum_{i: z_i = k} x_i \tilde{y}_{ki}', \quad C_{\tilde{Y}\tilde{Y}k} = \sum_{i: z_i = k} \tilde{y}_{ki} \tilde{y}_{ki}', \]

\[ C_{XYk} = \sum_{i: z_i = k} x_i y_{ki}', \quad C_{YYk} = \sum_{i: z_i = k} y_{ki} y_{ki}', \quad n_k = \sum_{i=1}^n \mathbf{1}(z_i = k). \]

Completing the square gives the quadratic form used in every $\Psi_k$ and $\Lambda_k$ update:

\[ \sum_{i: z_i = k} (x_i - \tilde{\Lambda}_k \tilde{y}_{ki})(x_i - \tilde{\Lambda}_k \tilde{y}_{ki})' = C_{XXk} - 2 C_{X\tilde{Y}k} \tilde{\Lambda}_k' + \tilde{\Lambda}_k C_{\tilde{Y}\tilde{Y}k} \tilde{\Lambda}_k'. \]

Latent scores, means, and mixture weights

Given allocations, the supplement gives the following Gibbs updates.

Latent scores for allocated observations:

\[ y_{ki} \mid x_i, z_i = k, \Theta \sim N_{q_k}\!\big(D_k (x_i - \mu_k), \Sigma_k\big), \qquad D_k = \Lambda_k' \Psi_k^{-1}, \qquad \Sigma_k = (\Lambda_k' \Psi_k^{-1} \Lambda_k + I_{q_k})^{-1}. \]

Unallocated observations draw from the prior:

\[ y_i \mid z_i = 0 \sim N(0, I_{q_{\mathrm{new}}}). \]

Cluster means:

\[ \mu_k \mid \cdots \sim N_p\!\big(\bar{x}, \alpha_1^{-1} \Psi_k\big), \]

where $\bar{x}$ is the prior mean on the working scale. pgmm_rjmcmc() centers $X$ internally so $\bar{x} = 0$ on the centered scale, matching the augmented loading posterior in Supplement A.1; sampled means are returned on the original data scale.

Mixture weights:

\[ \tau \mid Z \sim \mathrm{Dirichlet}(n_1 + \gamma, \ldots, n_m + \gamma). \]

Hyperparameter updates

The package samples $(\alpha_1, \alpha_2, \beta)$ with gamma hyperpriors controlled by d_vec and s_vec. With total latent dimension $q = \sum_{k=1}^m q_k$,

\[ \alpha_1 \mid \cdots \sim \mathrm{Gamma}\!\left( \tfrac{mp}{2} + d_{\alpha_1}, \tfrac{1}{2}\sum_k \mathrm{tr}(\Lambda_k' \Psi_k^{-1} \Lambda_k) + s_{\alpha_1} \right), \]

\[ \alpha_2 \mid \cdots \sim \mathrm{Gamma}\!\left( \tfrac{qp}{2} + d_{\alpha_2}, \tfrac{1}{2}\sum_k \mathrm{tr}(\Lambda_k' \Psi_k^{-1} \Lambda_k) + s_{\alpha_2} \right), \]

\[ \beta \mid \cdots \sim \mathrm{Gamma}\!\left( mp\delta + d_\beta, \sum_k \mathrm{tr}(\Psi_k^{-1}) + s_\beta \right). \]

PGMM conditional posteriors (corrected forms)

The eight covariance labels differ in whether $\Lambda_k$, $\Psi_k$, and the diagonal of $\Psi_k$ are shared across clusters. The table below records the prior scaling that enters the $\Psi_k^{-1}$ rate after completing the square. All models use the $-2$ cross term and $+1$ quadratic term on $(C_{\tilde{Y}\tilde{Y}k} + \text{prior adjustment})$; model 7 (UUC) is implemented with this corrected form rather than the misprinted coefficients in Appendix A.2.

Model	Shared $\Lambda$?	Shared $\Psi$?	Isotropic $\Psi_k$?	Prior adjustment in $\Psi_k^{-1}$ rate
CCC	yes	yes	yes	$C_{\tilde{Y}\tilde{Y}k} + A_k/m$, $\beta/(mp)$
CCU	yes	yes	no	$C_{\tilde{Y}\tilde{Y}k} + A_k/m$, $\beta/m$
CUC	yes	no	yes	$C_{\tilde{Y}\tilde{Y}k} + A_k/m$, $\beta/p$
CUU	yes	no	no	$C_{\tilde{Y}\tilde{Y}k} + A_k/m$, $\beta$
UCC	no	yes	yes	$C_{\tilde{Y}\tilde{Y}k} + A_k$, $\beta/(mp)$
UCU	no	yes	no	$C_{\tilde{Y}\tilde{Y}k} + A_k$, $\beta/m$
UUC	no	no	yes	$C_{\tilde{Y}\tilde{Y}k} + A_k$, $\beta/p$
UUU	no	no	no	$C_{\tilde{Y}\tilde{Y}k} + A_k$, $\beta$

For shared $\Lambda$, the loading update pools information across clusters, for example in model 1 (CCC):

\[ \Lambda \mid \cdots \sim N\!\left( \Big(\sum_k C_{X\tilde{Y}k}\Big) \Big(\sum_k C_{\tilde{Y}\tilde{Y}k} + \tfrac{\alpha_2}{m} I\Big)^{-1}, \;\; \Big(\sum_k C_{\tilde{Y}\tilde{Y}k} + \tfrac{\alpha_2}{m} I\Big)^{-1} \otimes \Psi_k \right), \]

with the same $\Lambda$ copied to every cluster. For cluster-specific loadings (models 5–8),

\[ \Lambda_k \mid \cdots \sim N\!\left( C_{X\tilde{Y}k} \big(C_{\tilde{Y}\tilde{Y}k} + \alpha_2 I\big)^{-1}, \; \big(C_{\tilde{Y}\tilde{Y}k} + \alpha_2 I\big)^{-1} \otimes \Psi_k \right). \]

Isotropic and shared-$\Psi$ cases replace matrix Gamma draws by scalar or diagonal Gamma draws with the shape and rate formulas in Appendix A.2; the stay-step sampler in calculate_post_lambda_psi() implements these forms.

RJMCMC moves

RJMCMC is used because the number of parameters changes when the number of clusters or the covariance constraint changes.

The cluster-number moves are:

Move	Purpose
stay	update parameters without changing $m$
birth	add a new empty cluster
death	remove an empty cluster
split	split one occupied cluster into two clusters
combine	merge two occupied clusters

The covariance-structure moves toggle one constraint at a time:

toggle whether $\Lambda_k$ is shared across clusters;
toggle whether $\Psi_k$ is shared across clusters;
toggle whether $\Psi_k$ is isotropic or diagonal.

In pgmm_rjmcmc():

m_step = 1 enables birth/death moves for $m$.
split_combine = 1 adds split/combine moves when m_step = 1.
v_step = 1 enables covariance-constraint moves.

For a proposed move from model $M$ to $M'$, the RJMCMC acceptance probability has the standard form

\[ a = \min\left\{ 1, \frac{ p(X, Z, \Theta', H' \mid M')\,p(M')\,q(M \mid M') }{ p(X, Z, \Theta, H \mid M)\,p(M)\,q(M' \mid M) } \left|J\right| \right\}, \]

where $q(\cdot)$ is the proposal density and $|J|$ is the Jacobian term for dimension-changing moves such as split/combine. Birth/death and split/combine change $m$; covariance moves change the constraint label $v \in \{\mathrm{CCC}, \ldots, \mathrm{UUU}\}$.

Birth and death

Birth adds an empty cluster with weight $\tau_{m+1} \sim \mathrm{Beta}(1, m)$ and rescales existing weights by $1 - \tau_{m+1}$. Death removes one empty cluster and reverses the weight transformation. The acceptance ratio includes the Jacobian exponent $m$ for birth and $m-1$ for death, as in Green (1995).

Split and combine

Split proposes two new clusters from one occupied component. Let $\tau_{j^*}$ and $\mu_{j^*}$ denote the merged weight and mean. With $\tau_{j_1} = w \tau_{j^*}$, $\tau_{j_2} = (1-w)\tau_{j^*}$, $w \sim \mathrm{Beta}(2,2)$, and per-coordinate signs $s_i \in \{-1,+1\}$,

\[ \mu_{j_1,i} = \mu_{j^*,i} + s_i a_{2i} \Sigma_{j^*,i} \sqrt{\tau_{j_2}/\tau_{j_1}}, \qquad \mu_{j_2,i} = \mu_{j^*,i} - s_i a_{2i} \Sigma_{j^*,i} \sqrt{\tau_{j_1}/\tau_{j_2}}, \]

with $a_{2i} \sim \mathrm{Gamma}(1,2)$. The $p$ independent signs contribute a factor $2^{-p}$ to the split proposal density. Combine merges two occupied components, runs intermediate Gibbs sweeps, and uses the reverse proposal in the acceptance ratio.

Covariance-structure moves

The v_step toggles one letter in the covariance label at a time. Extension proposes new loadings and noise matrices under the less constrained model; contraction proposes under the more constrained model. When the proposal distribution is symmetric in the auxiliary variables, the acceptance ratio reduces to a ratio of posterior ordinals, as noted in Section 4.4 of the paper.

Interface mapping

Paper notation	Package argument or output
$X = (x_1,\ldots,x_n)$	`X`, a $p \times n$ numeric matrix
$m$	`m_init`, `m_range`, `active_cluster_samples`
$q_k$	`q_new` for newly proposed clusters
$z_i$	`allocation_samples`, `summary$allocation`
$\tau_k$	`tau_samples`
$\mu_k$	`mean_samples`
$\Lambda_k$	`lambda_samples`
$\Psi_k$	`psi_samples`
covariance model $v$	`constraint`, `constraint_samples`

An applied model-selection call usually has the following structure:

fit <- pgmm_rjmcmc(
  X = your_data_matrix,
  m_init = 5,
  m_range = c(1, 10),
  q_new = 4,
  burn = 5000,
  niter = 15000,
  constraint = model_to_constraint("UUU"),
  m_step = 1,
  v_step = 1,
  split_combine = 1,
  verbose = FALSE
)

Citation

If you use these methods, cite:

Li, Y., Lu, X., & Love, T. (2021). On Bayesian Analysis of Parsimonious Gaussian Mixture Models. Journal of Classification, 38, 576-593. https://doi.org/10.1007/s00357-021-09391-8

In R:

citation("bpgmm")
#> To cite package 'bpgmm' in publications use:
#> 
#>   Li Y, Lu X, Love T (2021). On Bayesian Analysis of Parsimonious
#>   Gaussian Mixture Models. Journal of Classification, 38, 576-593.
#>   doi:10.1007/s00357-021-09391-8
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Article{,
#>     title = {On Bayesian Analysis of Parsimonious Gaussian Mixture Models},
#>     author = {Yaoxiang Li and Xiang Lu and Tanzy Love},
#>     journal = {Journal of Classification},
#>     year = {2021},
#>     volume = {38},
#>     pages = {576--593},
#>     doi = {10.1007/s00357-021-09391-8},
#>   }

Letter	Meaning when `C`	Meaning when `U`
1	all clusters share one loading matrix \(\Lambda\)	each cluster has \(\Lambda_k\)
2	all clusters share one noise covariance \(\Psi\)	each cluster has \(\Psi_k\)
3	noise covariance is isotropic, \(\Psi_k = \psi_k I_p\)	noise covariance is diagonal

Model	Shared \(\Lambda\)?	Shared \(\Psi\)?	Isotropic \(\Psi_k\)?	Prior adjustment in \(\Psi_k^{-1}\) rate
CCC	yes	yes	yes	\(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/(mp)\)
CCU	yes	yes	no	\(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/m\)
CUC	yes	no	yes	\(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/p\)
CUU	yes	no	no	\(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta\)
UCC	no	yes	yes	\(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/(mp)\)
UCU	no	yes	no	\(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/m\)
UUC	no	no	yes	\(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/p\)
UUU	no	no	no	\(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta\)

Move	Purpose
stay	update parameters without changing \(m\)
birth	add a new empty cluster
death	remove an empty cluster
split	split one occupied cluster into two clusters
combine	merge two occupied clusters

Paper notation	Package argument or output
\(X = (x_1,\ldots,x_n)\)	`X`, a \(p \times n\) numeric matrix
\(m\)	`m_init`, `m_range`, `active_cluster_samples`
\(q_k\)	`q_new` for newly proposed clusters
\(z_i\)	`allocation_samples`, `summary$allocation`
\(\tau_k\)	`tau_samples`
\(\mu_k\)	`mean_samples`
\(\Lambda_k\)	`lambda_samples`
\(\Psi_k\)	`psi_samples`
covariance model \(v\)	`constraint`, `constraint_samples`