textreuse - Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
Last updated 4 months ago
peer-reviewed
9.01 score 197 stars 216 scripts 473 downloads
sofa - Connector to 'CouchDB'
Provides an interface to the 'NoSQL' database 'CouchDB' (<http://couchdb.apache.org>). Methods are provided for managing databases within 'CouchDB', including creating/deleting/updating/transferring, and managing documents within databases. One can connect with a local 'CouchDB' instance, or a remote 'CouchDB' database such as 'Cloudant'. Documents can be inserted directly from vectors, lists, data.frames, and 'JSON'. Targeted at 'CouchDB' v2 or greater.
Last updated 6 months ago
couchdb, database, nosql, documents, cloudant, couchdb-client
7.43 score 33 stars 54 scripts 568 downloads
medrxivr - Access and Search MedRxiv and BioRxiv Preprint Data
Preprints, preliminary versions of research articles that have yet to undergo peer review, are an increasingly important source of health-related bibliographic content. The two preprint repositories most relevant to health-related sciences are medRxiv <https://www.medrxiv.org/> and bioRxiv <https://www.biorxiv.org/>, both of which are operated by the Cold Spring Harbor Laboratory. 'medrxivr' provides programmatic access to the 'Cold Spring Harbor Laboratory (CSHL)' API <https://api.biorxiv.org/>, allowing users to easily download medRxiv and bioRxiv preprint metadata (e.g. title, abstract, publication date, author list, etc.) into R. 'medrxivr' also provides functions to search the downloaded preprint records using regular expressions and Boolean logic, as well as helper functions that allow users to export their search results to a .BIB file for easy import to a reference manager and to download the full-text PDFs of preprints matching their search criteria.
Last updated 16 days ago
bibliographic-database, biorxiv, evidence-synthesis, medrxiv-data, peer-reviewed, preprint-records, systematic-reviews
7.01 score 52 stars 44 scripts 341 downloads
gendercoder - Recodes Sex/Gender Descriptions into a Standard Set
Provides functions and dictionaries for recoding free-text gender responses into more consistent categories.
Last updated 9 months ago
gender-diversity, ozunconf18, unconf
6.33 score 46 stars 42 scripts
cmmr - CEU Mass Mediator RESTful API
CEU (CEU San Pablo University) Mass Mediator is an online tool that helps researchers perform metabolite annotation. 'cmmr' (CEU Mass Mediator RESTful API) allows for programmatic access in R: batch search, batch advanced search, MS/MS (tandem mass spectrometry) search, etc. For more information about the API endpoint, see <https://github.com/YaoxiangLi/cmmr>.
Last updated 1 month ago
batch-search, ceu-mass-mediator, metablomics, ms-search
4.76 score 16 stars 12 scripts 184 downloads
ShinyLink - 'Shiny' Based Record Linkage Tool
Creates a bridge between existing robust open-source record linkage algorithms and an urgently needed user-friendly platform that removes financial and technical barriers, setting a new standard for data interoperability in public health and bioinformatics. The 'fastLink' algorithms are used for matching; see Enamorado et al. (2019) <doi:10.1017/S0003055418000783>.
Last updated 2 years ago
2.70 score 6 scripts 165 downloads
ggvolcano - Publication-Ready Volcano Plots
Provides publication-ready volcano plots for visualizing differential expression results, commonly used in RNA-seq and similar analyses. This tool helps create high-quality visual representations of data using the 'ggplot2' framework; see Wickham (2016) <doi:10.1007/978-3-319-24277-4>.
Last updated 3 months ago
2.26 score 36 scripts 363 downloads
bpgmm - Bayesian Model Selection Approach for Parsimonious Gaussian Mixture Models
Model-based clustering using Bayesian parsimonious Gaussian mixture models. MCMC (Markov chain Monte Carlo) is used for parameter estimation, and RJMCMC (reversible-jump Markov chain Monte Carlo) is used for model selection. See Green et al. (1995) <doi:10.1093/biomet/82.4.711>.
Last updated 2 years ago
1.00 score 231 downloads