R rpms
rpms is an implementation of decision trees and random forests for R.
Installation
install.packages('rpms')
Usage
A model is trained like:
tree <- rpms(y ~ va + vb + vc, data=data)
Note that the dependent variable (y in the example above) must be numeric; any other class produces errors such as "'list' object cannot be coerced to type 'double'".
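A quick check before fitting can catch this early. The sketch below uses a made-up data frame (the column names and values are purely illustrative) in which y was read in as a factor, and converts it to numeric using base R only.

```r
# Toy data, invented for illustration: y arrived as a factor
data <- data.frame(
  y  = factor(c("1.5", "2.0", "3.5", "4.0")),
  va = c(10, 20, 30, 40)
)

# Not numeric yet, so the requirement above is not met
y_was_numeric <- is.numeric(data$y)

# Go through as.character() first: as.numeric() applied directly to a
# factor returns the internal level codes, not the printed values
data$y <- as.numeric(as.character(data$y))

stopifnot(is.numeric(data$y))
```

If the column holds values that cannot be parsed as numbers, as.numeric() returns NA for them with a warning, so it is worth inspecting the result before training.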
Partitions are determined through randomized permutation and hypothesis tests.
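To give a feel for the idea, here is a generic permutation test for one candidate split, written in base R. It is a minimal sketch of the general technique, not the procedure rpms runs internally; the function name perm_test_split, the choice of test statistic, and the simulated data are all assumptions made for illustration.

```r
# Hypothetical helper: permutation p-value for the candidate split x <= cut
perm_test_split <- function(y, x, cut, reps = 1000) {
  left <- x <= cut
  # Test statistic: absolute difference in mean outcome between the two sides
  stat <- function(yy) abs(mean(yy[left]) - mean(yy[!left]))
  observed <- stat(y)
  # Shuffling y breaks any association between y and x
  permuted <- replicate(reps, stat(sample(y)))
  # Share of arrangements at least as extreme as the observed one
  # (the +1 counts the observed arrangement itself)
  (sum(permuted >= observed) + 1) / (reps + 1)
}

# Simulated data with a real jump in y at x = 0.5
set.seed(1)
x <- runif(200)
y <- ifelse(x <= 0.5, 0, 1) + rnorm(200, sd = 0.5)

p <- perm_test_split(y, x, cut = 0.5)
```

A small p-value suggests the split separates the outcome better than chance would, which is the kind of evidence a hypothesis-test-based partitioning rule looks for.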
Using a trained model, the predicted end node of each observation can be attached to a (new) dataset like:
data$node <- end_nodes(tree, newdata=data)
Similarly, the predicted outcomes (which are uniform within an end node) can be attached to a (new) dataset like:
data$prediction <- predict(tree, newdata=data)
Options for Complex Survey Design
The hypothesis tests used in this package support complex survey designs.
tree <- rpms(y ~ va + vb + vc, data=data, weights=~wtvar, strata=~stratavar, cluster=~clustervar)
Given clusters, the permutations are performed by a two-step algorithm: first across clusters, then within clusters. This algorithm does not perform well when the clusters vary significantly in (effective) size.
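One possible reading of "across clusters, then within clusters" is sketched below in base R. This is an illustration under stated assumptions, not the package's code: the helper name two_step_permute is hypothetical, the data are invented, and the sketch only handles equally sized clusters, which sidesteps the varying-size difficulty entirely.

```r
# Hypothetical helper: permute y in two steps, given cluster labels
two_step_permute <- function(y, cluster) {
  cluster <- as.character(cluster)
  blocks <- split(y, cluster)  # one vector of outcomes per cluster
  # Simplifying assumption of this sketch: all clusters have the same size
  stopifnot(length(unique(lengths(blocks))) == 1)
  # Step 1: across clusters, move whole blocks to other cluster positions
  shuffled <- blocks[sample(length(blocks))]
  # Step 2: within clusters, shuffle the outcomes inside each block
  shuffled <- lapply(shuffled, function(b) b[sample(length(b))])
  # Write the blocks back into the original cluster positions
  out <- y
  ids <- names(blocks)
  for (i in seq_along(ids)) {
    out[cluster == ids[i]] <- shuffled[[i]]
  }
  out
}

# Invented example: three clusters of four observations each
set.seed(2)
y <- 1:12
cluster <- rep(c("a", "b", "c"), each = 4)
y_perm <- two_step_permute(y, cluster)
```

After permutation, the values sitting in any one cluster all come from a single original cluster, so the within-cluster grouping of outcomes is preserved while their link to the predictors is broken.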
Visualization
To plot a specific partition, try:
node_plot(tree, node=1, data=data)
To render the entire tree, try the qtree function. It generates LaTeX figure markup, which can be rendered separately. Note that rendering the figure requires the lscape and tikz-qtree LaTeX packages to be loaded.
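For reference, a minimal LaTeX wrapper that loads both packages might look like the file written by the snippet below. This is only a sketch: the file names wrapper.tex and tree_figure.tex are placeholders, and it assumes the markup produced by qtree has already been saved to tree_figure.tex next to the wrapper.

```r
# Placeholder location; adjust to wherever the qtree output was saved
out_dir <- tempdir()
wrapper <- file.path(out_dir, "wrapper.tex")

# Write a small LaTeX document that loads the two required packages
writeLines(c(
  "\\documentclass{article}",
  "\\usepackage{lscape}      % landscape pages",
  "\\usepackage{tikz-qtree}  % tree drawing",
  "\\begin{document}",
  "\\input{tree_figure}      % assumed to hold the qtree output",
  "\\end{document}"
), wrapper)
```

Compiling wrapper.tex with a LaTeX engine should then render the figure, provided tree_figure.tex exists in the same directory.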
Random Forests
forest <- rpms_forest(y ~ va + vb + vc, data=data)
Uniformly random trees are generated, and then aggregated as a weighted average. The trees are weighted by inverse variance.
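The inverse-variance weighting can be illustrated with a few lines of base R. The numbers below are invented, and how a variance is obtained for each tree is left open here; the sketch only shows how the weighted average is formed once predictions and variances are available.

```r
# Invented numbers: predictions for one observation from three trees,
# with an assumed variance estimate for each tree
tree_pred <- c(10.2, 9.6, 11.0)
tree_var  <- c(0.5, 1.0, 2.0)

# Inverse-variance weights, normalized to sum to 1
w <- (1 / tree_var) / sum(1 / tree_var)

# Weighted average: lower-variance trees count for more
forest_pred <- sum(w * tree_pred)
```

The same result can be obtained with weighted.mean(tree_pred, 1 / tree_var), which normalizes the weights internally.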
