Revision 2 as of 2026-04-07 20:56:05
R rpms
rpms is an implementation of decision trees and random forests for R.
Installation
install.packages('rpms')
Usage
A model is trained like:
tree <- rpms(y ~ va + vb + vc, data=data)
Note that the dependent variable (y in the above example) must be numeric; any other class leads to error messages like "'list' object cannot be coerced to type 'double'".
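One way to guard against that error is to check the class of the outcome column before training. The following is a minimal sketch in base R; the data frame and its values are hypothetical, and the conversion shown only makes sense when the column holds numeric-looking labels.

```r
# Hypothetical data: the outcome was read in as a factor of numeric-looking labels
data <- data.frame(y = factor(c("1.5", "2", "10")), va = c(1, 2, 3))

# Convert via character so the labels, not the internal factor codes, become numbers
if (!is.numeric(data$y)) {
  data$y <- as.numeric(as.character(data$y))
}

is.numeric(data$y)
```

Going through as.character first matters for factors: calling as.numeric directly on a factor returns its integer level codes rather than the labelled values.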
Partitions are determined through randomized permutation and hypothesis tests.
Using a trained model, the predicted clusters can be attached to a (new) dataset like:
data$node <- end_nodes(tree, newdata=data)
Similarly, the predicted outcomes (which are uniform within a predicted cluster) can be attached to a (new) dataset like:
data$prediction <- predict(tree, newdata=data)
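The claim that predictions are uniform within a cluster can be checked once both columns are attached. This sketch uses a small mock data frame in place of real output from end_nodes and predict, so the node numbers and values are assumptions for illustration only.

```r
# Mock data standing in for a dataset with node and prediction already attached
data <- data.frame(
  node       = c(2, 2, 3, 3, 3),
  prediction = c(1.5, 1.5, 4.0, 4.0, 4.0)
)

# Count the distinct predicted values inside each end node
distinct_per_node <- tapply(data$prediction, data$node,
                            function(p) length(unique(p)))

# TRUE when every node carries exactly one predicted value
uniform <- all(distinct_per_node == 1)
uniform
```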
Options for Complex Survey Design
The hypothesis tests used in this package support complex survey designs.
tree <- rpms(y ~ va + vb + vc, data=data, weights=~wtvar, strata=~stratavar, cluster=~clustervar)
Given clusters, the permutation uses a two-step algorithm: first across clusters and then within clusters. This algorithm does not perform well when the clusters vary significantly in (effective) size.
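To illustrate the idea of permuting across and then within clusters, here is a rough sketch in base R. It is not the package's implementation: two_step_permute is a hypothetical helper, and it assumes equally sized clusters, which also hints at why widely varying cluster sizes are a problem.

```r
# Hypothetical helper, not part of rpms: permute x in two steps
two_step_permute <- function(x, cluster) {
  groups <- split(seq_along(x), cluster)   # row indices for each cluster
  sizes <- lengths(groups)
  stopifnot(length(unique(sizes)) == 1)    # this sketch assumes equal cluster sizes

  # Step 1: permute across clusters (each cluster receives another cluster's block)
  donors <- sample(names(groups))

  out <- x
  for (i in seq_along(groups)) {
    target_rows <- groups[[i]]
    donor_rows <- groups[[donors[i]]]
    # Step 2: permute the donated values within the cluster
    out[target_rows] <- x[donor_rows[sample.int(length(donor_rows))]]
  }
  out
}

set.seed(42)
x <- c(10, 11, 20, 21, 30, 31)
cluster <- c("a", "a", "b", "b", "c", "c")
p <- two_step_permute(x, cluster)
```

Each cluster in the result holds values drawn from exactly one original cluster, so the within-cluster dependence is preserved while the link to other variables is broken.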
Visualization
To plot a specific partition, try:
node_plot(tree, node=1, data=data)
To render the entire tree, try the qtree function. This generates LaTeX figure markup, which can be rendered separately. Note that rendering the figure requires the lscape and tikz-qtree LaTeX packages to be loaded in the document preamble.
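For a quick standalone render, the figure markup can be wrapped in a minimal LaTeX document that loads both packages. The helper below is a hypothetical sketch in base R, and the figure lines passed to it are a placeholder rather than real qtree output.

```r
# Hypothetical helper: wrap figure markup in a minimal LaTeX document
wrap_latex_figure <- function(figure_lines, path) {
  doc <- c(
    "\\documentclass{article}",
    "\\usepackage{lscape}",       # named above as a rendering requirement
    "\\usepackage{tikz-qtree}",   # named above as a rendering requirement
    "\\begin{document}",
    figure_lines,
    "\\end{document}"
  )
  writeLines(doc, path)
  invisible(doc)
}

# Placeholder markup; with a real tree, something like
# capture.output(qtree(tree)) might supply these lines, assuming qtree prints them
path <- tempfile(fileext = ".tex")
doc <- wrap_latex_figure(c("\\begin{figure}", "\\end{figure}"), path)
```

The resulting file can then be compiled with any LaTeX engine that has both packages installed.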
Random Forests
forest <- rpms_forest(y ~ va + vb + vc, data=data)
Uniformly random trees are generated, and then aggregated as a weighted average. The trees are weighted by inverse variance.
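The inverse-variance weighting can be sketched in a few lines of base R. This is an illustration of the general technique rather than the package's code: the prediction matrix, the variance estimates, and the helper name are all hypothetical.

```r
# Hypothetical inputs: one column of predictions per tree, one variance per tree
tree_predictions <- cbind(tree1 = c(1, 2), tree2 = c(3, 4))
tree_variances <- c(tree1 = 1, tree2 = 3)

inverse_variance_average <- function(predictions, variances) {
  weights <- (1 / variances) / sum(1 / variances)  # normalised to sum to 1
  drop(predictions %*% weights)                    # weighted average per row
}

forest_prediction <- inverse_variance_average(tree_predictions, tree_variances)
```

Here the first tree has a third of the variance of the second, so it receives a weight of 0.75 against 0.25, pulling each aggregated prediction toward its value.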
