library(rmdl)
#> Loading required package: vctrs
#> Loading required package: tibble
#>
#> Attaching package: 'tibble'
#> The following object is masked from 'package:vctrs':
#>
#> data_frame

Last updated on May 22, 2024, although this is still a work in progress.
Background
The development of this project was inspired by the problem of organizing many different hypotheses while writing my thesis, amongst other epidemiology and causal modeling problems. I had several data sets and many, many models, and had trouble storing and recalling them. I wanted an easier way to pull my thoughts together, creating a dynamic structure that would unfold along with the research project itself. The core needs were:
- Identifying core data that will be queried with specific hypotheses
- Ability to handle grouping/strata of that data for subsets of analyses
- Forming hypotheses of multiple outcomes and multiple predictors with an epidemiological angle (e.g. exposures, outcomes, covariates)
- Running and updating tests as the data changes
- Extracting or recalling models as the research project progresses
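The needs listed above can be sketched in base R alone. This is only an illustration of the idea, not the API of this package: the `mtcars` data, the choice of outcomes, exposures, covariates, and strata, and every object name are assumptions made for the example.

```r
# Base R sketch of the workflow; NOT the rmdl API.
# Core data (assumption: the built-in mtcars data set stands in for study data)
data(mtcars)

# Hypothetical roles, chosen only for illustration
outcomes   <- c("mpg", "qsec")
exposures  <- c("wt", "hp")
covariates <- c("cyl")
strata     <- "am"

# One row per outcome / exposure / stratum level
grid <- expand.grid(
  outcome  = outcomes,
  exposure = exposures,
  level    = sort(unique(mtcars[[strata]])),
  stringsAsFactors = FALSE
)

# Write each hypothesis as a formula string: outcome ~ exposure + covariates
rhs <- paste(grid$exposure, paste(covariates, collapse = " + "), sep = " + ")
grid$formula <- paste(grid$outcome, "~", rhs)

# Fit each model on its own stratum and store the fits in a list-column.
# Re-running this step after the data changes refreshes every model.
grid$fit <- lapply(seq_len(nrow(grid)), function(i) {
  stratum_data <- mtcars[mtcars[[strata]] == grid$level[i], ]
  lm(as.formula(grid$formula[i]), data = stratum_data)
})

# Recall a model by querying the table instead of remembering an object name
wanted <- grid$outcome == "mpg" & grid$exposure == "wt" & grid$level == 1
model  <- grid$fit[wanted][[1]]
coef(model)
```

Keeping the formula, the stratum, and the fitted object in the same row is what makes later recall a simple filter, at the cost of a table that does not print neatly.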
The API was inspired by the packages and programming examples listed below, none of which this package is meant to supplant.
| Source | Description |
|---|---|
| R4DS | This was the first time I had seen an elegant way of generating multiple models and working with list-columns, particularly the type that could become tidy. |
| {modelr} | An example of a package that simplifies modeling in R |
| {modelgrid} | A framework for creating and managing multiple models, with a focus on the {caret} package |
| {parsnip} | The core of this was based on the single interface for modeling that tidymodels provides and serves as a foundation for flexible model definitions |
| {stacks} | An influential API concept for binding together multiple model definitions; however, it is meant for a specific formula and for pulling together multiple models for blended predictions |
| {workflowsets} | This fits multiple models in a workflow to identify a potential “best” model, which is very flexible |
| {easystats} | Forms a different “universe” in parallel to the tidyverse for interpreting results of models, with a focus on the presentation and exploration of statistical analysis |
| {ggdag} | A tidy approach to creating directed acyclic graphs |
Purpose or raison d’être
Machine learning has expanded at a quickening pace, as seen in the rapid development of the tidymodels universe, while causality-based modeling seems to have moved away from center stage in the programming world. Some of the differences that I have seen, which are by no means definitive or exhaustive, are below:
| Machine learning | Causal modeling |
|---|---|
| Large data sets | Smaller data sets |
| High-dimensionality | Low-dimensionality |
| Model specifications and tuning | Term-selection |
| Optimization focused | Hypothesis focused |