Package 'ggpca' reference manual

Title:	Publication-Ready PCA, t-SNE, and UMAP Plots
Description:	Provides tools for creating publication-ready dimensionality reduction plots, including Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). This package helps visualize high-dimensional data with options for custom labels, density plots, and faceting, using the 'ggplot2' framework Wickham (2016) <doi:10.1007/978-3-319-24277-4>.
Authors:	Yaoxiang Li [cre, aut]
Maintainer:	Yaoxiang Li <[email protected]>
License:	GPL-3
Version:	0.1.2
Built:	2025-01-20 01:25:41 UTC
Source:	https://github.com/yaoxiangli/ggpca

Create publication-ready PCA, t-SNE, or UMAP plots

Description

This function generates dimensionality reduction plots (PCA, t-SNE, UMAP) with options for custom labels, titles, density plots, and faceting. It allows users to visualize high-dimensional data using various dimensionality reduction techniques.

Usage

ggpca(
  data,
  metadata_cols,
  mode = c("pca", "tsne", "umap"),
  scale = TRUE,
  x_pc = "PC1",
  y_pc = "PC2",
  color_var = NULL,
  ellipse = TRUE,
  ellipse_level = 0.9,
  ellipse_type = "norm",
  ellipse_alpha = 0.9,
  point_size = 3,
  point_alpha = 0.6,
  facet_var = NULL,
  tsne_perplexity = 30,
  umap_n_neighbors = 15,
  density_plot = "none",
  color_palette = "Set1",
  xlab = NULL,
  ylab = NULL,
  title = NULL,
  subtitle = NULL,
  caption = NULL
)

Arguments

`data`	A data frame containing the data to be plotted. Must include both feature columns (numeric) and metadata columns (categorical).
`metadata_cols`	A character vector of column names or a numeric vector of column indices for the metadata columns. These columns are used for grouping and faceting.
`mode`	The dimensionality reduction method to use. One of `"pca"` (Principal Component Analysis), `"tsne"` (t-Distributed Stochastic Neighbor Embedding), or `"umap"` (Uniform Manifold Approximation and Projection).
`scale`	Logical indicating whether to scale features (default: `TRUE` for PCA). Not used for `"tsne"` or `"umap"`.
`x_pc`	Name of the principal component or dimension to plot on the x-axis (default: `"PC1"` for PCA).
`y_pc`	Name of the principal component or dimension to plot on the y-axis (default: `"PC2"` for PCA).
`color_var`	(Optional) Name of the column used to color points in the plot. If `NULL`, no color is applied. Supports both discrete and continuous variables. Default: `NULL`.
`ellipse`	Logical indicating whether to add confidence ellipses for groups (only supported for PCA and only if `color_var` is discrete; default: `TRUE`).
`ellipse_level`	Confidence level for ellipses (default: `0.9`).
`ellipse_type`	Type of ellipse to draw, e.g., `"norm"` for a normal distribution (default: `"norm"`).
`ellipse_alpha`	Transparency level for ellipses, where 0 is fully transparent and 1 is fully opaque (default: `0.9`).
`point_size`	Size of the points in the plot (default: `3`).
`point_alpha`	Transparency level for the points, where 0 is fully transparent and 1 is fully opaque (default: `0.6`).
`facet_var`	Formula for faceting the plot (e.g., `Category ~ .`), which splits the plot into panels by group (default: `NULL`, no faceting).
`tsne_perplexity`	Perplexity parameter for t-SNE, which balances local and global aspects of the data (default: `30`).
`umap_n_neighbors`	Number of neighbors for UMAP, which determines the local structure (default: `15`).
`density_plot`	Controls whether to add density plots for the x, y, or both axes. Accepts one of `"none"`, `"x"`, `"y"`, or `"both"` (default: `"none"`).
`color_palette`	Name of the color palette used for discrete variables. Supports `RColorBrewer` palettes such as `"Set1"` and `"Set2"` (default: `"Set1"`).
`xlab`	Custom x-axis label (default: `NULL`, will be auto-generated based on the data).
`ylab`	Custom y-axis label (default: `NULL`, will be auto-generated based on the data).
`title`	Plot title (default: `NULL`).
`subtitle`	Plot subtitle (default: `NULL`).
`caption`	Plot caption (default: `NULL`).
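
To illustrate how `data`, `metadata_cols`, `scale`, `x_pc`, and `y_pc` relate to each other, the following base-R sketch splits a data frame into metadata and feature columns and computes PCA scores. It is an illustration only: the use of `stats::prcomp()` and the built-in `iris` dataset are assumptions made for this example, and the sketch does not claim to reproduce the internals of `ggpca()`.

```r
# Toy input (assumption): iris has one metadata column and four numeric features
toy <- iris
metadata_cols <- "Species"

# Separate feature columns from metadata columns
features <- toy[, setdiff(names(toy), metadata_cols), drop = FALSE]

# Assumption: PCA via stats::prcomp(); `scale.` mirrors the idea of `scale = TRUE`
pca_fit <- prcomp(features, center = TRUE, scale. = TRUE)

# Scores for the two components that would go on the x- and y-axes,
# joined back to the metadata used for coloring or faceting
scores <- data.frame(pca_fit$x[, c("PC1", "PC2")], toy[metadata_cols])

# Percentage of total variance explained by each component
var_explained <- 100 * pca_fit$sdev^2 / sum(pca_fit$sdev^2)

head(scores)
round(var_explained, 1)
```

The column names `PC1` and `PC2` in `scores` correspond to the kind of values passed as `x_pc` and `y_pc`.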

Value

A ggplot2 object representing the dimensionality reduction plot, including scatter plots, optional density plots, and faceting options. The plot can be further customized using ggplot2 functions.

Author(s)

Yaoxiang Li

Examples


# Load the package and the bundled example dataset
library(ggpca)
pca_data <- read.csv(system.file("extdata", "example.csv", package = "ggpca"))

# PCA example
p_pca_y_group <- ggpca(
  pca_data,
  metadata_cols = 1:6,
  mode = "pca",
  color_var = "group",
  ellipse = TRUE,
  density_plot = "y",
  title = "PCA with Y-axis Density Plot",
  subtitle = "Example dataset, colored by group",
  caption = "Data source: Example dataset"
)
print(p_pca_y_group)

# t-SNE example
p_tsne_time <- ggpca(
  pca_data,
  metadata_cols = 1:6,
  mode = "tsne",
  color_var = "time",
  tsne_perplexity = 30,
  title = "t-SNE Plot of Example Dataset",
  subtitle = "Colored by time",
  caption = "Data source: Example dataset"
)
print(p_tsne_time)

Process Missing Values in a Data Frame

Description

This function filters out columns of a data frame according to a user-specified missing-value threshold, then imputes the remaining missing values in each non-metadata column with half of that column's minimum value. Metadata columns are specified by the user and are exempt from both filtering and imputation.

Usage

process_missing_value(data, missing_threshold = 25, metadata_cols = NULL)

Arguments

`data`	A data frame containing the data to be processed.
`missing_threshold`	Numeric percentage of missing values in a column that triggers its removal (default: `25`).
`metadata_cols`	A vector of column names or column indices to treat as metadata; these columns are exempt from missing-value filtering and imputation. If `NULL` (the default), no columns are treated as metadata.

Value

A data frame with filtered and imputed columns as necessary.

Examples

data <- data.frame(
  A = c(1, 2, NA, 4),
  B = c(NA, NA, NA, 4),
  C = c(1, 2, 3, 4)
)
# Process missing values, treating column 'C' as metadata (exempt from filtering and imputation)
processed_data <- process_missing_value(data, missing_threshold = 50, metadata_cols = "C")
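
The filter-then-impute rule described for this function can be sketched in base R. The helper `half_min_impute()` below is hypothetical and is not the package's code; in particular, it assumes a column is dropped when its missing percentage is strictly greater than the threshold, which the documentation does not spell out.

```r
# Hypothetical helper: a base-R sketch of the documented logic, not the package's implementation
half_min_impute <- function(data, missing_threshold = 25, metadata_cols = NULL) {
  # Accept metadata columns as names or as numeric indices
  if (is.numeric(metadata_cols)) metadata_cols <- names(data)[metadata_cols]
  feature_cols <- setdiff(names(data), metadata_cols)

  # Percentage of missing values in each feature column
  missing_pct <- vapply(
    data[feature_cols],
    function(x) mean(is.na(x)) * 100,
    numeric(1)
  )

  # Assumption: drop a column when its missing percentage exceeds the threshold
  drop_cols <- feature_cols[missing_pct > missing_threshold]
  data <- data[, setdiff(names(data), drop_cols), drop = FALSE]

  # Fill remaining gaps with half of the column's minimum observed value
  for (col in setdiff(feature_cols, drop_cols)) {
    x <- data[[col]]
    if (anyNA(x)) {
      x[is.na(x)] <- min(x, na.rm = TRUE) / 2
      data[[col]] <- x
    }
  }
  data
}

toy <- data.frame(
  A = c(1, 2, NA, 4),   # 25% missing: kept, NA imputed with 1 / 2 = 0.5
  B = c(NA, NA, NA, 4), # 75% missing: dropped at a 50% threshold
  C = c(1, 2, 3, 4)     # metadata: left untouched
)
result <- half_min_impute(toy, missing_threshold = 50, metadata_cols = "C")
print(result)
```

With this toy input, column `B` is removed and the single gap in `A` is filled with 0.5, so the sketch shows the two steps separately from any plotting.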

Run the Shiny Application

Description

This function launches the package's Shiny application so that users can interact with it. It is called for its side effect of starting the app and does not return a value.

Usage

run_app(
  onStart = NULL,
  options = list(),
  enableBookmarking = NULL,
  uiPattern = "/",
  ...
)

Arguments

`onStart`	A function that will be called before the app is actually run. This is only needed for `shinyAppObj`, since in the `shinyAppDir` case, a `global.R` file can be used for this purpose.
`options`	Named options that should be passed to the `runApp` call (these can be any of the following: "port", "launch.browser", "host", "quiet", "display.mode" and "test.mode"). You can also specify `width` and `height` parameters which provide a hint to the embedding environment about the ideal height/width for the app.
`enableBookmarking`	Can be one of `"url"`, `"server"`, or `"disable"`. The default value, `NULL`, will respect the setting from any previous calls to `enableBookmarking()`. See `enableBookmarking()` for more information on bookmarking your app.
`uiPattern`	A regular expression that will be applied to each `GET` request to determine whether the `ui` should be used to handle the request. Note that the entire request path must match the regular expression in order for the match to be considered successful.
`...`	Arguments to pass to 'golem_opts'. See '?golem::get_golem_options' for more details.

Value

No return value, called for side effects.
