RJSONIO - Serialize R Objects to JSON
Converts R objects to and from JavaScript Object Notation (JSON). The package provides a stable interface for reading JSON from strings, files, and connections, and for serializing common R objects, including vectors, lists, data frames, arrays, environments, and S4 objects. It also exposes parser handlers, callbacks, and S4 methods for applications that need customized JSON processing while preserving established RJSONIO behavior.
cpp
11.40 score 1 stars 62 dependents 1.9k scripts 20k downloads
textreuse - Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
peer-reviewed cpp
8.82 score 201 stars 235 scripts 385 downloads
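The ideas named in the textreuse description (shingled n-grams, a similarity function, and minhash) can be sketched in a few lines. The following is a minimal Python illustration of the general techniques, not the package's R API; the function names `shingles`, `jaccard`, `minhash_signature`, and `estimate_similarity` are hypothetical and chosen only for this example, and the use of seeded SHA-1 hashes is an assumption made to keep the sketch deterministic.

```python
import hashlib


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) found in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, num_hashes=64):
    """Build a minhash signature: the minimum hash value under each seeded hash.

    Assumption for this sketch: each "hash function" is SHA-1 of the shingle
    prefixed with a seed, truncated to 64 bits.
    """
    if not shingle_set:
        raise ValueError("cannot build a signature from an empty shingle set")
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingle_set
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"

set_a, set_b = shingles(doc_a), shingles(doc_b)
print(jaccard(set_a, set_b))  # 0.4 (4 shared shingles out of 10 distinct)
print(estimate_similarity(minhash_signature(set_a), minhash_signature(set_b)))
```

The exact Jaccard score compares full shingle sets, while the minhash estimate compares only fixed-length signatures, which is what makes pairwise comparison feasible across a large corpus. The estimate varies with the number of hash functions, so no fixed value is shown for it here.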
sofa - Connector to 'CouchDB'
Provides an interface to the 'NoSQL' database 'CouchDB' (<https://couchdb.apache.org/>). Methods are provided for managing databases within 'CouchDB', including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local 'CouchDB' instance, or a remote 'CouchDB' database such as 'IBM Cloudant'. Documents can be inserted directly from vectors, lists, data.frames, and 'JSON'. Targeted at 'CouchDB' v2 or greater.
couchdb database nosql documents cloudant couchdb-client
7.64 score 33 stars 55 scripts 577 downloads
medrxivr - Access and Search MedRxiv and BioRxiv Preprint Data
Preprints, preliminary versions of research articles that have yet to undergo peer review, are an increasingly important source of health-related bibliographic content. The two preprint repositories most relevant to health-related sciences are medRxiv <https://www.medrxiv.org/> and bioRxiv, both of which are operated by the Cold Spring Harbor Laboratory. 'medrxivr' provides programmatic access to the 'Cold Spring Harbor Laboratory (CSHL)' API <https://api.biorxiv.org/>, allowing users to download medRxiv and bioRxiv preprint metadata (e.g. title, abstract, publication date, author list) into R. 'medrxivr' also provides functions to search the downloaded preprint records using regular expressions and Boolean logic, as well as helper functions that export search results to a .BIB file for import into a reference manager and download the full-text PDFs of preprints matching the search criteria.
bibliographic-database biorxiv evidence-synthesis medrxiv-data peer-reviewed preprint-records systematic-reviews
7.36 score 62 stars 62 scripts 66 downloads
gendercoder - Recodes Sex/Gender Descriptions into a Standard Set
Provides dictionary-based tools for recoding free-text gender responses into consistent categories while preserving gender diversity where possible. The package standardises spelling, capitalization, whitespace, and common variants through curated named character-vector dictionaries, supports either detailed or collapsed output categories, and can retain original unmatched responses for manual review. It also includes helpers for creating custom dictionaries from approximate string matches and a local interactive application for recoding uploaded data files.
gender-diversity ozunconf18 unconf
6.77 score 47 stars 57 scripts
bpgmm - Bayesian Model Selection Approach for Parsimonious Gaussian Mixture Models
Model-based clustering using Bayesian parsimonious Gaussian mixture models. MCMC (Markov chain Monte Carlo) is used for parameter estimation, and RJMCMC (reversible-jump Markov chain Monte Carlo) is used for model selection. See Green (1995) <doi:10.1093/biomet/82.4.711>.
armadillo clustering clustering-algorithm cpp machine-learning mcmc rjmcmc openblas cpp
5.02 score 1 stars 10 scripts 207 downloads
ggvolcano - Publication-Ready Volcano Plots
Provides publication-ready volcano plots for visualizing differential expression results, commonly used in RNA-seq and similar analyses. Plots are built on the 'ggplot2' framework; see Wickham (2016) <doi:10.1007/978-3-319-24277-4>.
4.57 score 75 scripts 287 downloads
cmmr - CEU Mass Mediator RESTful API
CEU (CEU San Pablo University) Mass Mediator is an online tool that helps researchers perform metabolite annotation. 'cmmr' (CEU Mass Mediator RESTful API) provides programmatic access from R, including batch search, batch advanced search, and MS/MS (tandem mass spectrometry) search. For more information about the API endpoint, see <https://github.com/YaoxiangLi/cmmr>.
batch-search ceu-mass-mediator metablomics ms-search
4.43 score 16 stars 17 scripts 198 downloads
ggpca - Publication-Ready PCA, t-SNE, and UMAP Plots
Provides tools for creating publication-ready dimensionality reduction plots, including Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). The package visualizes high-dimensional data with options for custom labels, density plots, and faceting, and is built on the 'ggplot2' framework; see Wickham (2016) <doi:10.1007/978-3-319-24277-4>.
2.60 score 4 stars 2 scripts 193 downloads