Differences between revisions 1 and 2
Revision 1 as of 2026-02-10 18:07:57
Size: 1443
Comment: Initial commit
Revision 2 as of 2026-04-07 20:56:05
Size: 2093
Comment: updates

R rpms

rpms is an R package that implements decision trees and random forests, with support for complex survey designs.


Installation

install.packages('rpms')


Usage

A model is trained like:

tree <- rpms(y ~ va + vb + vc, data=data)

Note that the dependent variable (y in the above example) must be numeric; any other class leads to error messages like "'list' object cannot be coerced to type 'double'".
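
A check along these lines can catch the problem before fitting. This is a minimal sketch in base R: the data frame and the helper name as_numeric_outcome are hypothetical, and it assumes a character or factor column actually holds number-like values.

```r
# Hypothetical data: the outcome was read in as text rather than as numbers.
data <- data.frame(y = c("1.5", "2.0", "3.25"), va = c(1, 2, 3),
                   stringsAsFactors = FALSE)

# Hypothetical helper: return a numeric version of an outcome column.
as_numeric_outcome <- function(x) {
  if (is.numeric(x)) return(x)
  # Go through character so a factor yields its labels, not its integer codes.
  if (is.factor(x) || is.character(x)) return(as.numeric(as.character(x)))
  stop("cannot convert outcome of class '", class(x)[1], "' to numeric")
}

data$y <- as_numeric_outcome(data$y)
stopifnot(is.numeric(data$y))
```

Any other class (a list column, for instance) stops with an explicit message instead of failing inside the model fit.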

Partitions are determined through hypothesis tests based on random permutations.
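
The general idea of a permutation test for a candidate split can be sketched in base R. This illustrates the technique only and is not the package's internal procedure; the simulated data, the choice of test statistic, and the function names split_stat and perm_pvalue are all assumptions.

```r
set.seed(1)

# Hypothetical data: the outcome shifts when va exceeds 0.5.
n  <- 200
va <- runif(n)
y  <- ifelse(va > 0.5, 2, 0) + rnorm(n)

# Test statistic for a candidate split: absolute difference in means.
split_stat <- function(y, left) abs(mean(y[left]) - mean(y[!left]))

# Permutation p-value: shuffle y to break any link with the split variable,
# then see how often the shuffled statistic is at least as large as observed.
perm_pvalue <- function(y, left, reps = 999) {
  observed <- split_stat(y, left)
  permuted <- replicate(reps, split_stat(sample(y), left))
  (1 + sum(permuted >= observed)) / (reps + 1)
}

p <- perm_pvalue(y, left = va <= 0.5)
```

A small p-value suggests the split separates the outcome better than chance would, which is the kind of evidence a tree needs before partitioning further.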

Using a trained model, the predicted clusters can be attached to a (new) dataset like:

data$node <- end_nodes(tree, newdata=data)

Similarly, the predicted outcomes (which are uniform within a predicted cluster) can be attached to a (new) dataset like:

data$prediction <- predict(tree, newdata=data)
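
The uniformity of predictions within a cluster can be checked once both columns are attached. The sketch below uses a hand-made node column in place of a fitted tree and fills prediction with unweighted node means as a stand-in for predict; both are assumptions for illustration.

```r
# Hypothetical data with node labels already attached.
data <- data.frame(
  y    = c(1, 3, 10, 12, 14, 5),
  node = c(2, 2, 3, 3, 3, 4)
)

# Stand-in for predict(): a model that fits one constant per end node
# gives every row in a node the same value (here, the node mean).
data$prediction <- ave(data$y, data$node)

# Count distinct predictions per node; every count should be 1.
distinct_per_node <- tapply(data$prediction, data$node,
                            function(p) length(unique(p)))
stopifnot(all(distinct_per_node == 1))
```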

Options for Complex Survey Design

The hypothesis tests used in this package support complex survey designs. Weights, strata, and clusters are each given as a one-sided formula:

tree <- rpms(y ~ va + vb + vc, data=data, weights=~wtvar, strata=~stratavar, cluster=~clustervar)

When clusters are given, the permutation is done in two steps: first across clusters, then within clusters. This algorithm does not perform well when the clusters vary substantially in (effective) size.
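
Before relying on the clustered permutation, it may help to look at how much the clusters differ in size. The sketch below tabulates the raw and effective size of each cluster, using Kish's approximation (squared sum of weights over sum of squared weights) as one common definition of effective size; whether rpms uses the same definition is an assumption. The column names follow the example above and the data are made up.

```r
# Hypothetical survey data: three clusters with unequal sizes and weights.
data <- data.frame(
  clustervar = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  wtvar      = c(1, 1, 1, 1, 2, 2, 1, 1, 1, 5)
)

# Kish's effective sample size for a vector of weights.
kish_n <- function(w) sum(w)^2 / sum(w^2)

# One row per cluster: raw count and effective size.
sizes <- data.frame(
  n     = as.vector(table(data$clustervar)),
  n_eff = as.vector(tapply(data$wtvar, data$clustervar, kish_n)),
  row.names = sort(unique(data$clustervar))
)

# Ratio of largest to smallest effective size; values far above 1
# indicate clusters that vary a lot in effective size.
size_ratio <- max(sizes$n_eff) / min(sizes$n_eff)
```

Cluster c has four rows but an effective size close to 2 because one weight dominates, so raw counts alone can hide the imbalance.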

Visualization

To plot a specific partition, try:

node_plot(tree, node=1, data=data)

To render the entire tree, try the qtree function. This generates LaTeX figure markup which can be rendered separately. Note that rendering the figure requires the lscape and tikz-qtree LaTeX packages to be loaded.
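
One way to render the figure is to wrap the generated markup in a small standalone document that loads the required packages. The sketch below only writes such a wrapper from R. It assumes the qtree output has already been saved to a file, here hypothetically named tree.tex, placed next to the wrapper; the wrapper file name is also made up.

```r
# Hypothetical file names; tree.tex is assumed to hold the saved qtree output
# and to sit in the same directory as the wrapper when LaTeX is run.
figure_file  <- "tree.tex"
wrapper_file <- file.path(tempdir(), "tree_wrapper.tex")

wrapper <- c(
  "\\documentclass{article}",
  "\\usepackage{lscape}",      # provides the landscape environment
  "\\usepackage{tikz}",        # loaded explicitly for safety
  "\\usepackage{tikz-qtree}",  # tree drawing on top of TikZ
  "\\begin{document}",
  paste0("\\input{", figure_file, "}"),
  "\\end{document}"
)

writeLines(wrapper, wrapper_file)
```

Compiling the wrapper with a LaTeX engine should then produce the figure, provided both packages are installed.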

Random Forests

forest <- rpms_forest(y ~ va + vb + vc, data=data)

Uniformly random trees are generated and then aggregated as a weighted average, with each tree weighted by its inverse variance.
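
The aggregation arithmetic can be sketched in base R. The per-tree predictions and variances below are made-up numbers; how the package estimates each tree's variance is not shown here and is not assumed.

```r
# Hypothetical predictions for three observations from four trees
# (rows = observations, columns = trees).
preds <- cbind(
  t1 = c(10, 20, 30),
  t2 = c(12, 18, 33),
  t3 = c( 9, 22, 27),
  t4 = c(11, 20, 30)
)

# Hypothetical variance attached to each tree.
tree_var <- c(t1 = 1, t2 = 4, t3 = 4, t4 = 2)

# Inverse-variance weights, normalized to sum to 1.
w <- (1 / tree_var) / sum(1 / tree_var)

# Weighted average across trees for each observation.
forest_pred <- as.vector(preds %*% w)
```

Trees with smaller variance pull the aggregate toward their own predictions: here t1 carries half of the total weight.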


CategoryRicottone

R/Rpms (last edited 2026-04-07 20:56:05 by DominicRicottone)