textreuse - Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
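The core pipeline described above (shingle each text, compare shingle sets, then approximate that comparison with minhash and bucket candidates with LSH) can be sketched in Python as follows. This is not textreuse's R API. The function names, the 3-word shingle size, 200 hash functions and 50 bands are all illustrative assumptions.

```python
import hashlib
import random
import re

def shingles(text, n=3):
    """Return the set of lower-cased word n-gram shingles of a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def _hash(token, seed):
    # Deterministic 64-bit hash of a shingle, salted with a per-function seed.
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(shingle_set, num_hashes=200, seed=42):
    """For each seeded hash function, keep the minimum hash over the set."""
    if not shingle_set:
        raise ValueError("cannot minhash an empty shingle set")
    rng = random.Random(seed)
    seeds = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(_hash(s, sd) for s in shingle_set) for sd in seeds]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def lsh_bands(signature, bands=50):
    """Split a signature into equal-width bands; each band is a bucket key."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}

def is_candidate_pair(sig_a, sig_b, bands=50):
    """Two documents are LSH candidates if any band lands in the same bucket."""
    return bool(lsh_bands(sig_a, bands) & lsh_bands(sig_b, bands))

a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox leaps over the lazy dog")
print(jaccard(a, b))  # 0.4 (4 shared trigrams out of 10 distinct)
```

Only candidate pairs flagged by LSH would then need an exact comparison or a local alignment, which is what makes the approach scale to large corpora.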
Last updated 1 month ago
peer-reviewed
195 stars 5.42 score 27 dependencies
gendercoder - Recodes Sex/Gender Descriptions into a Standard Set
Provides functions and dictionaries for recoding of freetext gender responses into more consistent categories.
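Dictionary-based recoding of this kind amounts to normalising each free-text answer and looking it up in a table of known spellings. The Python sketch below illustrates the idea. The dictionary entries, the output categories and the `recode` function are hypothetical and are not gendercoder's own dictionaries or API.

```python
import re

# Hypothetical lookup table: normalised response -> standard category.
# gendercoder ships its own curated dictionaries; these entries are illustrative.
GENDER_DICTIONARY = {
    "male": "man",
    "m": "man",
    "man": "man",
    "female": "woman",
    "f": "woman",
    "woman": "woman",
    "non-binary": "non-binary",
    "nonbinary": "non-binary",
    "enby": "non-binary",
}

def normalise(response):
    """Lower-case, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", response.strip().lower())

def recode(responses, dictionary=GENDER_DICTIONARY, unmatched=None):
    """Map each free-text response to a standard category, or `unmatched`."""
    return [dictionary.get(normalise(r), unmatched) for r in responses]

print(recode(["Male", "  FEMALE ", "Non-binary", "prefer not to say"]))
# ['man', 'woman', 'non-binary', None]
```

Returning an explicit placeholder for unmatched answers, rather than guessing, leaves those responses visible for manual review.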
Last updated 7 months ago
gender-diversity ozunconf18 unconf
46 stars 3.14 score 0 dependencies
sofa - Connector to 'CouchDB'
Provides an interface to the 'NoSQL' database 'CouchDB' (<http://couchdb.apache.org>). Methods are provided for managing databases within 'CouchDB', including creating, deleting, updating, and transferring them, and for managing documents within databases. One can connect to a local 'CouchDB' instance or to a remote 'CouchDB' database such as 'Cloudant'. Documents can be inserted directly from vectors, lists, data.frames, and 'JSON'. Targeted at 'CouchDB' v2 or greater.
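CouchDB accepts many documents in a single request through its `_bulk_docs` endpoint, whose JSON body has the form `{"docs": [...]}`. The Python sketch below shows one way tabular rows (the analogue of a data.frame) could be turned into such a body, one document per row. The function name `rows_to_bulk_docs` and its `id_column` parameter are hypothetical and are not part of sofa's API. The sketch only builds the request body and does not contact a server.

```python
import json

def rows_to_bulk_docs(columns, rows, id_column=None):
    """Turn tabular rows into a CouchDB _bulk_docs request body.

    Each row becomes one JSON document. If `id_column` is given, its value
    becomes the document `_id`; otherwise CouchDB assigns an id itself.
    """
    docs = []
    for row in rows:
        doc = dict(zip(columns, row))
        if id_column is not None:
            doc["_id"] = str(doc.pop(id_column))
        docs.append(doc)
    return json.dumps({"docs": docs})

body = rows_to_bulk_docs(["name", "age"], [("ada", 36), ("alan", 41)], id_column="name")
print(body)
# {"docs": [{"age": 36, "_id": "ada"}, {"age": 41, "_id": "alan"}]}
```

The resulting string would be sent as `POST /<database>/_bulk_docs` with a `Content-Type: application/json` header; a local CouchDB listens on port 5984 by default.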
Last updated 4 months ago
couchdb database nosql documents cloudant couchdb-client
33 stars 2.82 score 9 dependencies
cmmr - CEU Mass Mediator RESTful API
CEU (CEU San Pablo University) Mass Mediator is an online tool that helps researchers annotate metabolites. 'cmmr' (CEU Mass Mediator RESTful API) provides programmatic access from R, including batch search, batch advanced search, and MS/MS (tandem mass spectrometry) search. For more information about the API endpoint, see <https://github.com/lzyacht/cmmr>.
Last updated 3 years ago
batch-search ceu-mass-mediator metablomics ms-search
15 stars 2.06 score 19 dependencies
ggvolcano - Publication-Ready Volcano Plots
Provides publication-ready volcano plots for visualizing differential expression results, commonly used in RNA-seq and similar analyses. This tool helps create high-quality visual representations of data using the 'ggplot2' framework Wickham (2016) <doi:10.1007/978-3-319-24277-4>.
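A volcano plot places each feature at x = log2 fold change and y = -log10(p-value), so features that are both strongly changed and highly significant land in the upper left and upper right corners. The Python sketch below computes those coordinates and a simple up/down/not-significant label. The thresholds (|log2FC| ≥ 1, p < 0.05) and the name `volcano_points` are illustrative assumptions, not ggvolcano's defaults or API.

```python
import math

def volcano_points(results, fc_threshold=1.0, p_threshold=0.05):
    """Compute volcano-plot coordinates and a significance label per feature.

    results: iterable of (feature, log2_fold_change, p_value).
    Returns (feature, x, y, label) with x = log2FC and y = -log10(p).
    """
    points = []
    for feature, log2fc, p in results:
        if not 0 < p <= 1:
            raise ValueError(f"p-value for {feature!r} must be in (0, 1]")
        y = -math.log10(p)
        if p < p_threshold and log2fc >= fc_threshold:
            label = "up"
        elif p < p_threshold and log2fc <= -fc_threshold:
            label = "down"
        else:
            label = "ns"  # not significant, or fold change too small
        points.append((feature, log2fc, y, label))
    return points

pts = volcano_points([("geneA", 2.5, 1e-6), ("geneB", -1.8, 0.001), ("geneC", 0.3, 0.2)])
print([p[3] for p in pts])  # ['up', 'down', 'ns']
```

The labels would typically drive point colour, and the two thresholds are usually drawn as dashed reference lines.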
Last updated 11 days ago
0.09 score 58 dependencies
ShinyLink - 'Shiny' Based Record Linkage Tool
Bridges existing robust, open-source record linkage algorithms and an urgently needed user-friendly platform that removes financial and technical barriers, setting a new standard for data interoperability in public health and bioinformatics. Matching uses the 'fastLink' algorithms of Enamorado et al. (2019) <doi:10.1017/S0003055418000783>.
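'fastLink' builds on the Fellegi-Sunter model of probabilistic record linkage, in which each field contributes a log-likelihood ratio: log(m/u) when the two records agree on it and log((1 - m)/(1 - u)) when they disagree, where m = P(agree | match) and u = P(agree | non-match). The Python sketch below computes that summed weight for a pair of records. The m/u values are hypothetical (fastLink estimates them from the data with an EM algorithm), comparison is exact equality rather than fastLink's partial string agreement, and a missing field is simply skipped; `match_weight` is not a ShinyLink or fastLink function.

```python
import math

# Hypothetical (m, u) probabilities per field; in practice these are estimated.
FIELD_PROBS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "birth_year": (0.98, 0.05),
}

def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum of per-field log2 likelihood ratios (Fellegi-Sunter weight).

    Positive totals favour 'same entity', negative totals favour 'different'.
    Fields missing from either record contribute nothing (a simplification).
    """
    total = 0.0
    for field, (m, u) in field_probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

x = {"first_name": "ana", "last_name": "silva", "birth_year": 1980}
y = {"first_name": "ana", "last_name": "silva", "birth_year": 1981}
print(round(match_weight(x, y), 2))
```

Pairs would then be declared matches when the weight (or, in fastLink, the resulting posterior match probability) exceeds a chosen threshold.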
Last updated 2 years ago
0.00 score 120 dependencies
bpgmm - Bayesian Model Selection Approach for Parsimonious Gaussian Mixture Models
Model-based clustering using Bayesian parsimonious Gaussian mixture models. MCMC (Markov chain Monte Carlo) is used for parameter estimation, and RJMCMC (reversible-jump Markov chain Monte Carlo) is used for model selection. Green (1995) <doi:10.1093/biomet/82.4.711>.
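Model-based clustering rests on the mixture likelihood and on each point's posterior probability of belonging to each component, quantities that MCMC samplers evaluate repeatedly. The Python sketch below shows only that building block for a univariate Gaussian mixture with fixed, hypothetical parameters. It performs no MCMC, no reversible-jump model selection and no parsimonious covariance modelling, and none of its functions belong to bpgmm.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def responsibilities(x, weights, means, sds):
    """Posterior probability that x came from each mixture component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def assign(x, weights, means, sds):
    """Hard cluster label: the component with the largest responsibility."""
    r = responsibilities(x, weights, means, sds)
    return max(range(len(r)), key=r.__getitem__)

def log_likelihood(data, weights, means, sds):
    """Log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)))
        for x in data
    )

params = ([0.5, 0.5], [0.0, 5.0], [1.0, 1.0])  # hypothetical weights, means, sds
print([assign(x, *params) for x in (0.2, 4.7)])  # [0, 1]
```

A Bayesian sampler would propose new weights, means and variances (and, with reversible jumps, a different number or structure of components) and accept or reject them using this likelihood together with the priors.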
Last updated 2 years ago
0.00 score 65 dependencies