library(rmdl)
#> Loading required package: vctrs
#> Loading required package: tibble
#>
#> Attaching package: 'tibble'
#> The following object is masked from 'package:vctrs':
#>
#> data_frame
Last updated on May 22, 2024, although this is still a work in progress.
Background
The development of this project was inspired by the problem of organizing many different hypotheses while writing my thesis, amongst other epidemiology/causal-based modeling problems. I had several data sets and many, many models, and had trouble storing and recalling them. I wanted an easier way to pull my thoughts together, creating a dynamic structure that would unfold along with the research project itself. The main requirements were:
- Identifying core data that will be queried with specific hypotheses
- Ability to handle grouping/strata of that data for subsets of analyses
- Forming hypotheses of multiple outcomes and multiple predictors with an epidemiological angle (e.g. exposures, outcomes, covariates)
- Running and updating tests as the data changes
- Extracting or recalling models as the research project progresses
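The requirements above can be sketched in base R alone. This is only an illustration of the general idea, not the {rmdl} API, which this section does not show. The `mtcars` data set, the three formulas, and the names `core`, `hypotheses`, `strata`, `grid`, and `recalled` are all assumptions made for the example.

```r
# Core data: the built-in mtcars data set (chosen only for illustration)
core <- mtcars

# Hypotheses: one outcome, one exposure, and increasingly adjusted covariate sets
hypotheses <- list(
  unadjusted = mpg ~ wt,
  adjusted   = mpg ~ wt + hp,
  full       = mpg ~ wt + hp + qsec
)

# Strata: subsets of the core data, here split by number of cylinders
strata <- split(core, core$cyl)

# One row per hypothesis/stratum combination
grid <- expand.grid(
  hypothesis = names(hypotheses),
  stratum = names(strata),
  stringsAsFactors = FALSE
)

# Fit every hypothesis within every stratum and keep the fits in a list-column
grid$fit <- Map(
  function(h, s) lm(hypotheses[[h]], data = strata[[s]]),
  grid$hypothesis,
  grid$stratum
)

# Recall a specific model later by its hypothesis and stratum labels
recalled <- grid$fit[grid$hypothesis == "adjusted" & grid$stratum == "4"][[1]]
coef(recalled)
```

Keeping the fitted models in a list-column next to their labels is what makes later recall a simple filter. If the data change, re-running the `Map()` step refreshes every fit at once.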
The API was inspired by the packages and programming examples below, none of which this package is meant to supplant.
Source | Description |
---|---|
R4DS | This was the first time I had seen an elegant way of generating multiple models and working with list-columns, particularly the type that could become tidy. |
{modelr} | An example of a package that simplifies modeling in R |
{modelgrid} | A framework for creating and managing multiple models, with a focus on the {caret} package |
{parsnip} | The core of this was based on the single interface for modeling that tidymodels provides and serves as a foundation for flexible model definitions |
{stacks} | An influential concept for an API that binds together multiple model definitions, although it is meant for a specific formula and for pulling together multiple models for blended predictions |
{workflowsets} | This fits multiple models in a workflow to identify a potential “best” model, which is very flexible |
{easystats} | Forms a different “universe” in parallel to the tidyverse for interpreting results of models, with a focus on the presentation and exploration of statistical analysis |
{ggdag} | A tidy approach to creating directed acyclic graphs |
Purpose or raison d’être
Machine learning has expanded at a quickening pace, as seen in the rapid development of the tidymodels universe, while causality-based modeling seems to have moved away from center stage in the programming world. Some of the differences that I have seen, which are by no means definitive or exhaustive, are below:
Machine learning | Causal modeling |
---|---|
Large data sets | Smaller data sets |
High-dimensionality | Low-dimensionality |
Model specifications and tuning | Term-selection |
Optimization focused | Hypothesis focused |