---
title: "Model and sampler details"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model and sampler details}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This section connects the notation in Lu, Li, and Love (2021) to the `bpgmm`
interface. It describes the fitted model, the covariance labels, and the
RJMCMC switches used by the sampler. Full fitted examples are given in the
getting-started and worked-example vignettes.

## Observation model

Let \(X = (x_1, \ldots, x_n)\) be a matrix of continuous observations with
\(p\) variables and \(n\) observations. `bpgmm` expects this same orientation:
variables in rows and observations in columns.

For cluster \(k\), the mixture of factor analyzers model writes

\[
x_i = \mu_k + \Lambda_k y_{ki} + \epsilon_{ki},
\]

where

\[
y_{ki} \sim N(0, I_{q_k}), \qquad
\epsilon_{ki} \sim N(0, \Psi_k).
\]

After integrating out the latent factor \(y_{ki}\), the cluster-specific
covariance is

\[
\Sigma_k = \Lambda_k \Lambda_k^\top + \Psi_k.
\]

The plot below shows the geometric role of this covariance. The loading matrix
\(\Lambda_k\) controls the dominant low-dimensional direction, while
\(\Psi_k\) adds cluster-specific noise around that subspace.

```{r, fig.width = 5.5, fig.height = 4.5, fig.alt = "Two covariance ellipses illustrating cluster-specific covariance matrices."}
ellipse_points <- function(center, sigma, level = 0.90, n = 120) {
  eig <- eigen(sigma, symmetric = TRUE)
  angles <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(angles), sin(angles))
  radius <- sqrt(qchisq(level, df = 2))
  points <- t(center + radius * eig$vectors %*% diag(sqrt(eig$values), 2) %*% circle)
  colnames(points) <- c("x", "y")
  points
}

lambda_1 <- matrix(c(1.1, 0.4), ncol = 1)
lambda_2 <- matrix(c(0.2, 1.0), ncol = 1)
psi_1 <- diag(c(0.20, 0.08))
psi_2 <- diag(c(0.10, 0.25))
sigma_1 <- lambda_1 %*% t(lambda_1) + psi_1
sigma_2 <- lambda_2 %*% t(lambda_2) + psi_2

ell_1 <- ellipse_points(c(-1.2, -0.5), sigma_1)
ell_2 <- ellipse_points(c(1.1, 0.6), sigma_2)

plot(
  ell_1,
  type = "l",
  lwd = 2,
  col = "#0072B2",
  xlim = c(-3.5, 3.5),
  ylim = c(-2.5, 3),
  xlab = "Variable 1",
  ylab = "Variable 2",
  main = "Mixture-of-factor-analyzers covariance"
)
lines(ell_2, lwd = 2, col = "#D55E00")
points(rbind(c(-1.2, -0.5), c(1.1, 0.6)), pch = 19, col = c("#0072B2", "#D55E00"))
arrows(-1.2, -0.5, -1.2 + lambda_1[1], -0.5 + lambda_1[2], col = "#0072B2", lwd = 2, length = 0.08)
arrows(1.1, 0.6, 1.1 + lambda_2[1], 0.6 + lambda_2[2], col = "#D55E00", lwd = 2, length = 0.08)
legend(
  "topleft",
  legend = c("Cluster 1 covariance", "Cluster 2 covariance", "Loading direction"),
  col = c("#0072B2", "#D55E00", "gray30"),
  lwd = c(2, 2, 2),
  bty = "n"
)
```

The mixture density is

\[
f(x_i) = \sum_{k = 1}^m \tau_k
  N(x_i \mid \mu_k, \Lambda_k \Lambda_k^\top + \Psi_k),
\]

where \(\tau_k\) is the mixture weight and \(m\) is the number of clusters.
Equivalently, with allocation indicator \(z_i \in \{1,\ldots,m\}\),

\[
x_i \mid z_i = k, \Theta
  \sim N_p(\mu_k, \Sigma_k),
\qquad
\Pr(z_i = k \mid \tau) = \tau_k .
\]

In `pgmm_rjmcmc()`:

- `X` is the \(p \times n\) data matrix.
- `m_init` is the starting value of \(m\).
- `m_range` is the allowed range of \(m\).
- `q_new` is the factor dimension assigned to a newly proposed cluster.
- `constraint` is the starting covariance model.

## PGMM covariance constraints

The paper uses three letters to describe the covariance structure. Each letter
is either `C` for constrained or `U` for unconstrained.

| Letter | Meaning when `C` | Meaning when `U` |
|---|---|---|
| 1 | all clusters share one loading matrix \(\Lambda\) | each cluster has \(\Lambda_k\) |
| 2 | all clusters share one noise covariance \(\Psi\) | each cluster has \(\Psi_k\) |
| 3 | noise covariance is isotropic, \(\Psi_k = \psi_k I_p\) | noise covariance is diagonal |

The eight PGMM models are:

```{r}
library(bpgmm)

models <- c("CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU")
data.frame(
  model = models,
  constraint = vapply(
    models,
    function(x) paste(model_to_constraint(x), collapse = ","),
    character(1)
  )
)
```

The package uses the numeric encoding internally: `1` means constrained and
`0` means unconstrained.

```{r}
model_to_constraint("UUU")
constraint_to_model(c(1, 1, 0))
```

## Priors and posterior updates

The supplement gives the natural conjugate priors used by the MCMC updates.
The main priors are:

\[
\tau \sim \mathrm{Dirichlet}(\gamma, \ldots, \gamma),
\]

\[
y_{ki} \sim N(0, I_{q_k}),
\]

\[
\mu_k \sim N(\bar{x}, \alpha_1^{-1} \Psi_k),
\]

\[
\lambda_{kj} \sim N(0, \alpha_2^{-1} \Psi_k),
\]

and

\[
\psi_{kj} \sim \mathrm{IG}(\delta, \beta).
\]

The complete parameter state can be read as

\[
\Theta =
\{\tau_k, \mu_k, \Lambda_k, \Psi_k, y_{ki}, z_i:
  k = 1,\ldots,m;\ i = 1,\ldots,n\},
\]

with hyperparameters

\[
H = (\alpha_1, \alpha_2, \beta, \delta, \gamma).
\]

The package samples the hyperparameters \(\alpha_1\), \(\alpha_2\), and
\(\beta\), with gamma hyperpriors controlled by:

- `d_vec`: shape parameters.
- `s_vec`: rate parameters.
- `delta`: inverse-gamma shape for the noise covariance.
- `ggamma`: Dirichlet concentration for mixture weights.

The allocation update uses the posterior probability

\[
p_{ki} =
\frac{
  \tau_k N(x_i \mid \mu_k, \Psi_k + \Lambda_k\Lambda_k^\top)
}{
  \sum_{\ell = 1}^{m}
  \tau_\ell N(x_i \mid \mu_\ell, \Psi_\ell + \Lambda_\ell\Lambda_\ell^\top)
}.
\]

Then

\[
(z_{i1}, \ldots, z_{im}) \mid X, \Theta, H
  \sim \mathrm{Multinomial}(1, p_{1i}, \ldots, p_{mi}).
\]

The allocation update uses this probability, and the joint posterior includes
the product of allocated mixture weights:

\[
p(Z \mid \tau) = \prod_{i = 1}^{n} \tau_{z_i}.
\]

## Augmented representation and sufficient statistics

The sampler augments each observation with latent factors and uses the stacked
notation from the supplement. For cluster \(k\), define

\[
\tilde{y}_{ki} =
\begin{pmatrix} 1 \\ y_{ki} \end{pmatrix},
\qquad
\tilde{\Lambda}_k = (\mu_k, \Lambda_k),
\qquad
A_k = \mathrm{diag}(\alpha_1, \alpha_2 I_{q_k}).
\]

Then the cluster likelihood can be written as

\[
x_i \mid z_i = k, \tilde{\Lambda}_k, \Psi_k
  \sim N_p(\tilde{\Lambda}_k \tilde{y}_{ki}, \Psi_k).
\]

For allocated observations in cluster \(k\), the package accumulates

\[
C_{XXk} = \sum_{i: z_i = k} x_i x_i',
\quad
C_{X\tilde{Y}k} = \sum_{i: z_i = k} x_i \tilde{y}_{ki}',
\quad
C_{\tilde{Y}\tilde{Y}k} = \sum_{i: z_i = k} \tilde{y}_{ki} \tilde{y}_{ki}',
\]

\[
C_{XYk} = \sum_{i: z_i = k} x_i y_{ki}',
\quad
C_{YYk} = \sum_{i: z_i = k} y_{ki} y_{ki}',
\quad
n_k = \sum_{i=1}^n \mathbf{1}(z_i = k).
\]

Completing the square gives the quadratic form used in every \(\Psi_k\) and
\(\Lambda_k\) update:

\[
\sum_{i: z_i = k}
(x_i - \tilde{\Lambda}_k \tilde{y}_{ki})(x_i - \tilde{\Lambda}_k \tilde{y}_{ki})'
=
C_{XXk} - 2 C_{X\tilde{Y}k} \tilde{\Lambda}_k'
  + \tilde{\Lambda}_k C_{\tilde{Y}\tilde{Y}k} \tilde{\Lambda}_k'.
\]

## Latent scores, means, and mixture weights

Given allocations, the supplement gives the following Gibbs updates.

Latent scores for allocated observations:

\[
y_{ki} \mid x_i, z_i = k, \Theta
  \sim N_{q_k}\!\big(D_k (x_i - \mu_k), \Sigma_k\big),
\qquad
D_k = \Lambda_k' \Psi_k^{-1},
\qquad
\Sigma_k = (\Lambda_k' \Psi_k^{-1} \Lambda_k + I_{q_k})^{-1}.
\]

Unallocated observations draw from the prior:

\[
y_i \mid z_i = 0 \sim N(0, I_{q_{\mathrm{new}}}).
\]

Cluster means:

\[
\mu_k \mid \cdots \sim N_p\!\big(\bar{x}, \alpha_1^{-1} \Psi_k\big),
\]

where \(\bar{x}\) is the prior mean on the working scale. `pgmm_rjmcmc()` centers
\(X\) internally so \(\bar{x} = 0\) on the centered scale, matching the
augmented loading posterior in Supplement A.1; sampled means are returned on the
original data scale.

Mixture weights:

\[
\tau \mid Z \sim \mathrm{Dirichlet}(n_1 + \gamma, \ldots, n_m + \gamma).
\]

## Hyperparameter updates

The package samples \((\alpha_1, \alpha_2, \beta)\) with gamma hyperpriors
controlled by `d_vec` and `s_vec`. With total latent dimension
\(q = \sum_{k=1}^m q_k\),

\[
\alpha_1 \mid \cdots \sim
\mathrm{Gamma}\!\left(
  \tfrac{mp}{2} + d_{\alpha_1},
  \tfrac{1}{2}\sum_k \mathrm{tr}(\Lambda_k' \Psi_k^{-1} \Lambda_k) + s_{\alpha_1}
\right),
\]

\[
\alpha_2 \mid \cdots \sim
\mathrm{Gamma}\!\left(
  \tfrac{qp}{2} + d_{\alpha_2},
  \tfrac{1}{2}\sum_k \mathrm{tr}(\Lambda_k' \Psi_k^{-1} \Lambda_k) + s_{\alpha_2}
\right),
\]

\[
\beta \mid \cdots \sim
\mathrm{Gamma}\!\left(
  mp\delta + d_\beta,
  \sum_k \mathrm{tr}(\Psi_k^{-1}) + s_\beta
\right).
\]

## PGMM conditional posteriors (corrected forms)

The eight covariance labels differ in whether \(\Lambda_k\), \(\Psi_k\), and the
diagonal of \(\Psi_k\) are shared across clusters. The table below records the
prior scaling that enters the \(\Psi_k^{-1}\) rate after completing the square.
All models use the \(-2\) cross term and \(+1\) quadratic term on
\((C_{\tilde{Y}\tilde{Y}k} + \text{prior adjustment})\); model 7 (UUC) is
implemented with this corrected form rather than the misprinted coefficients in
Appendix A.2.

| Model | Shared \(\Lambda\)? | Shared \(\Psi\)? | Isotropic \(\Psi_k\)? | Prior adjustment in \(\Psi_k^{-1}\) rate |
|---|---|---|---|---|
| CCC | yes | yes | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/(mp)\) |
| CCU | yes | yes | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/m\) |
| CUC | yes | no | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta/p\) |
| CUU | yes | no | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k/m\), \(\beta\) |
| UCC | no | yes | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/(mp)\) |
| UCU | no | yes | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/m\) |
| UUC | no | no | yes | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta/p\) |
| UUU | no | no | no | \(C_{\tilde{Y}\tilde{Y}k} + A_k\), \(\beta\) |

For shared \(\Lambda\), the loading update pools information across clusters, for
example in model 1 (CCC):

\[
\Lambda \mid \cdots \sim
N\!\left(
  \Big(\sum_k C_{X\tilde{Y}k}\Big)
  \Big(\sum_k C_{\tilde{Y}\tilde{Y}k} + \tfrac{\alpha_2}{m} I\Big)^{-1},
  \;\;
  \Big(\sum_k C_{\tilde{Y}\tilde{Y}k} + \tfrac{\alpha_2}{m} I\Big)^{-1}
  \otimes \Psi_k
\right),
\]

with the same \(\Lambda\) copied to every cluster. For cluster-specific
loadings (models 5–8),

\[
\Lambda_k \mid \cdots \sim
N\!\left(
  C_{X\tilde{Y}k}
  \big(C_{\tilde{Y}\tilde{Y}k} + \alpha_2 I\big)^{-1},
  \;
  \big(C_{\tilde{Y}\tilde{Y}k} + \alpha_2 I\big)^{-1}
  \otimes \Psi_k
\right).
\]

Isotropic and shared-\(\Psi\) cases replace matrix Gamma draws by scalar or
diagonal Gamma draws with the shape and rate formulas in Appendix A.2; the
stay-step sampler in `calculate_post_lambda_psi()` implements these forms.

## RJMCMC moves

RJMCMC is used because the number of parameters changes when the number of
clusters or the covariance constraint changes.

The cluster-number moves are:

| Move | Purpose |
|---|---|
| stay | update parameters without changing \(m\) |
| birth | add a new empty cluster |
| death | remove an empty cluster |
| split | split one occupied cluster into two clusters |
| combine | merge two occupied clusters |

The covariance-structure moves toggle one constraint at a time:

- toggle whether \(\Lambda_k\) is shared across clusters;
- toggle whether \(\Psi_k\) is shared across clusters;
- toggle whether \(\Psi_k\) is isotropic or diagonal.

In `pgmm_rjmcmc()`:

- `m_step = 1` enables birth/death moves for \(m\).
- `split_combine = 1` adds split/combine moves when `m_step = 1`.
- `v_step = 1` enables covariance-constraint moves.

For a proposed move from model \(M\) to \(M'\), the RJMCMC acceptance
probability has the standard form

\[
a =
\min\left\{
1,
\frac{
  p(X, Z, \Theta', H' \mid M')\,p(M')\,q(M \mid M')
}{
  p(X, Z, \Theta, H \mid M)\,p(M)\,q(M' \mid M)
}
\left|J\right|
\right\},
\]

where \(q(\cdot)\) is the proposal density and \(|J|\) is the Jacobian term
for dimension-changing moves such as split/combine. Birth/death and
split/combine change \(m\); covariance moves change the constraint label
\(v \in \{\mathrm{CCC}, \ldots, \mathrm{UUU}\}\).

### Birth and death

Birth adds an empty cluster with weight \(\tau_{m+1} \sim \mathrm{Beta}(1, m)\)
and rescales existing weights by \(1 - \tau_{m+1}\). Death removes one empty
cluster and reverses the weight transformation. The acceptance ratio includes
the Jacobian exponent \(m\) for birth and \(m-1\) for death, as in Green (1995).

### Split and combine

Split proposes two new clusters from one occupied component. Let \(\tau_{j^*}\)
and \(\mu_{j^*}\) denote the merged weight and mean. With
\(\tau_{j_1} = w \tau_{j^*}\), \(\tau_{j_2} = (1-w)\tau_{j^*}\),
\(w \sim \mathrm{Beta}(2,2)\), and per-coordinate signs
\(s_i \in \{-1,+1\}\),

\[
\mu_{j_1,i} = \mu_{j^*,i}
  + s_i a_{2i} \Sigma_{j^*,i} \sqrt{\tau_{j_2}/\tau_{j_1}},
\qquad
\mu_{j_2,i} = \mu_{j^*,i}
  - s_i a_{2i} \Sigma_{j^*,i} \sqrt{\tau_{j_1}/\tau_{j_2}},
\]

with \(a_{2i} \sim \mathrm{Gamma}(1,2)\). The \(p\) independent signs contribute
a factor \(2^{-p}\) to the split proposal density. Combine merges two occupied
components, runs intermediate Gibbs sweeps, and uses the reverse proposal in
the acceptance ratio.

### Covariance-structure moves

The `v_step` toggles one letter in the covariance label at a time. Extension
proposes new loadings and noise matrices under the less constrained model;
contraction proposes under the more constrained model. When the proposal
distribution is symmetric in the auxiliary variables, the acceptance ratio
reduces to a ratio of posterior ordinals, as noted in Section 4.4 of the paper.

## Interface mapping

| Paper notation | Package argument or output |
|---|---|
| \(X = (x_1,\ldots,x_n)\) | `X`, a \(p \times n\) numeric matrix |
| \(m\) | `m_init`, `m_range`, `active_cluster_samples` |
| \(q_k\) | `q_new` for newly proposed clusters |
| \(z_i\) | `allocation_samples`, `summary$allocation` |
| \(\tau_k\) | `tau_samples` |
| \(\mu_k\) | `mean_samples` |
| \(\Lambda_k\) | `lambda_samples` |
| \(\Psi_k\) | `psi_samples` |
| covariance model \(v\) | `constraint`, `constraint_samples` |

An applied model-selection call usually has the following structure:

```{r, eval = FALSE}
fit <- pgmm_rjmcmc(
  X = your_data_matrix,
  m_init = 5,
  m_range = c(1, 10),
  q_new = 4,
  burn = 5000,
  niter = 15000,
  constraint = model_to_constraint("UUU"),
  m_step = 1,
  v_step = 1,
  split_combine = 1,
  verbose = FALSE
)
```

## Citation

If you use these methods, cite:

Lu, X., Li, Y., & Love, T. (2021). On Bayesian Analysis of Parsimonious
Gaussian Mixture Models. *Journal of Classification*, 38, 576-593.
https://doi.org/10.1007/s00357-021-09391-8

In R:

```{r}
citation("bpgmm")
```