textreuse - Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
Last updated 23 days ago
9.27 score 198 stars 226 scripts 642 downloads
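The textreuse description mentions shingled n-grams, similarity functions, and minhashing. The Python sketch below illustrates those general techniques. It is not the textreuse package's R API, and all function names are hypothetical. It assumes whitespace-split, lowercased word 3-gram shingles and a family of hash functions built by salting SHA-1 with an integer seed. A minhash signature keeps the minimum hash value per seeded function. The fraction of positions where two signatures agree estimates the Jaccard similarity of the underlying shingle sets.

```python
import hashlib

def shingle(text, n=3):
    """Return the set of word n-gram shingles (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def _hash(s):
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")

def minhash_signature(shingles, seeds):
    """One minimum per seeded hash function (assumed hash family: seed-salted SHA-1)."""
    if not shingles:
        raise ValueError("cannot minhash an empty shingle set")
    return [min(_hash(f"{seed}:{s}") for s in shingles) for seed in seeds]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two documents agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

if __name__ == "__main__":
    a = shingle("the quick brown fox jumps over the lazy dog")
    b = shingle("the quick brown fox leaps over the lazy dog")
    print(jaccard(a, b))  # 0.4 (4 shared shingles out of 10 distinct)
    seeds = range(200)
    print(estimate_jaccard(minhash_signature(a, seeds), minhash_signature(b, seeds)))
```

More hash functions give a lower-variance estimate at the cost of longer signatures. Locality-sensitive hashing would then bucket bands of these signatures, so that only likely matches are compared pairwise.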
sofa - Connector to 'CouchDB'
Provides an interface to the 'NoSQL' database 'CouchDB' (<http://couchdb.apache.org>). Methods are provided for managing databases within 'CouchDB', including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local 'CouchDB' instance, or a remote 'CouchDB' database such as 'Cloudant'. Documents can be inserted directly from vectors, lists, data.frames, and 'JSON'. Targeted at 'CouchDB' v2 or greater.
Last updated 1 month ago
7.51 score 33 stars 54 scripts 674 downloads
medrxivr - Access and Search MedRxiv and BioRxiv Preprint Data
An increasingly important source of health-related bibliographic content is preprints: preliminary versions of research articles that have yet to undergo peer review. The two preprint repositories most relevant to health-related sciences are medRxiv <https://www.medrxiv.org/> and bioRxiv <https://www.biorxiv.org/>, both of which are operated by the Cold Spring Harbor Laboratory. 'medrxivr' provides programmatic access to the 'Cold Spring Harbor Laboratory (CSHL)' API <https://api.biorxiv.org/>, allowing users to easily download medRxiv and bioRxiv preprint metadata (e.g. title, abstract, publication date, author list) into R. 'medrxivr' also provides functions to search the downloaded preprint records using regular expressions and Boolean logic, as well as helper functions that allow users to export their search results to a .BIB file for easy import to a reference manager and to download the full-text PDFs of preprints matching their search criteria.
Last updated 28 days ago
7.16 score 55 stars 44 scripts 301 downloads
gendercoder - Recodes Sex/Gender Descriptions into a Standard Set
Provides functions and dictionaries for recoding free-text gender responses into more consistent categories.
Last updated 28 days ago
6.66 score 46 stars 45 scripts
cmmr - CEU Mass Mediator RESTful API
CEU (CEU San Pablo University) Mass Mediator is an online tool that helps researchers perform metabolite annotation. 'cmmr' (CEU Mass Mediator RESTful API) provides programmatic access from R: batch search, batch advanced search, MS/MS (tandem mass spectrometry) search, etc. For more information about the API endpoint, see <https://github.com/YaoxiangLi/cmmr>.
Last updated 5 months ago
4.73 score 15 stars 12 scripts 168 downloads
ggvolcano - Publication-Ready Volcano Plots
Provides publication-ready volcano plots for visualizing differential expression results, commonly used in RNA-seq and similar analyses. This tool helps create high-quality visual representations of data using the 'ggplot2' framework Wickham (2016) <doi:10.1007/978-3-319-24277-4>.
Last updated 28 days ago
4.32 score 42 scripts 374 downloads
ggpca - Publication-Ready PCA, t-SNE, and UMAP Plots
Provides tools for creating publication-ready dimensionality reduction plots, including Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). This package helps visualize high-dimensional data with options for custom labels, density plots, and faceting, using the 'ggplot2' framework Wickham (2016) <doi:10.1007/978-3-319-24277-4>.
Last updated 2 months ago
2.78 score 2 stars 1 script 546 downloads
ShinyLink - 'Shiny' Based Record Linkage Tool
Bridges existing robust open-source record linkage algorithms and a user-friendly platform, removing financial and technical barriers to data interoperability in public health and bioinformatics. Matching uses the 'fastLink' algorithms of Enamorado et al. (2019) <doi:10.1017/S0003055418000783>.
Last updated 2 years ago
2.70 score 6 scripts 206 downloads
bpgmm - Bayesian Model Selection Approach for Parsimonious Gaussian Mixture Models
Model-based clustering using Bayesian parsimonious Gaussian mixture models. Markov chain Monte Carlo (MCMC) is used for parameter estimation, and reversible-jump MCMC (RJMCMC) is used for model selection. Green (1995) <doi:10.1093/biomet/82.4.711>.
Last updated 3 years ago
1.00 score 240 downloads