Random Rotation Ensembles in Python

The paper “Random Rotation Ensembles” by Blaser et al. introduces a simple but powerful machine learning model: train an ensemble of trees on rotations of the data set, then average their predictions. In this article we present a simple Python implementation (bottom of this article), and compare the RandomRotationEnsemble with a few other tree-based regression models:

Model performance

The datasets used to assess model performance were retrieved using sklearn.datasets.fetch_openml. In the figure below \(R^2\) out-of-sample test scores are shown. The models were evaluated using \(10\)-fold cross validation, and the out-of-sample scores are plotted for each data set. A few critical hyperparameters were tuned for each model (e.g. min_samples_leaf).

The RandomRotationEnsemble is certainly competitive, but it’s no magic bullet: on some data sets it outperforms the other models, while on others it’s the worst model.

Model boundaries

To gain some understanding of how the model works, we use \(2\) variables from the us_crime data set and plot the results. An out-of-sample \(R^2\) score is also reported. The RandomRotationEnsemble (code below) yields a smoother estimate of the target variable that is not axis-aligned (unlike the other models). Hyperparameters were tuned for each model on parts on a training set, and the entire data set is plotted using black dots.

Python code for RandomRotationEnsemble

Here is a minimal working Python snippet implementing a Random Rotation Ensemble. I assume that the reader is familiar with the scikit-learn API, and chose to keep the documentation to minimum. If not, take a look at the paper “API design for machine learning software: experiences from the scikit-learn project” by Buitinck et al.

Python implementation: CPython
Python version       : 3.9.4
IPython version      : 7.22.0

sklearn: 1.1.2
numpy  : 1.23.5
scipy  : 1.9.3
import numpy as np
from sklearn.base import RegressorMixin, BaseEstimator
from sklearn.utils import check_X_y, check_array
from scipy.stats import special_ortho_group
from sklearn.tree import DecisionTreeRegressor

class RandomRotationEnsemble(BaseEstimator, RegressorMixin):
    def __init__(
        """An ensemble of DecisionTreeRegressors trained on random rotations.

        n_estimators : int, optional
            The number of estimators in the ensemble.
            The default is 10.
        seed : int, optional
            An integer used to seed random number generation.
            The default is 1.
            Arguments passed to DecisionTreeRegressor.
        self.n_estimators = n_estimators
        self.seed = seed
        self.kwargs = kwargs

        # Create a list of models
        self.models = [
            DecisionTreeRegressor(**self.kwargs, random_state=seed + estimator_num)
            for estimator_num in range(self.n_estimators)

    def get_params(self, deep=True):
        params = self.kwargs
        params["n_estimators"] = self.n_estimators
        params["seed"] = self.seed
        return params

    def set_params(self, **params):
        return type(self)(**params)

    def fit(self, X, y):
        # Check for NaN, infinite values, and so forth
        X, y = check_X_y(X, y)
        num_samples, num_features = X.shape

        # Sample rotation matrices. The first one is always the identity
        self.rotation_matrices = [np.eye(num_features)] + [
                random_state=self.seed + estimator_num
            for estimator_num in range(self.n_estimators - 1)

        # Fit each model
        for (rotmat, model) in zip(self.rotation_matrices, self.models):
            model.fit(X @ rotmat, y)

    def predict(self, X):
        X = check_array(X)

        # Predict using each model, then take mean value over every model
        return np.array(
                model.predict(X @ rotmat)
                for (rotmat, model) in zip(self.rotation_matrices, self.models)

# =====================================================
# ================= EXAMPLE USAGE =====================
# =====================================================
from sklearn.datasets import fetch_california_housing
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

# Fetch and split data
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create model
model = Pipeline(
        ("scaler", StandardScaler()),
        ("estimator", RandomRotationEnsemble(n_estimators=100)),

# Search for best parameters
param_grid = {"esimator__min_samples_leaf": [2**i for i in range(1, 7)]}
search = GridSearchCV(model, param_grid=param_grid, cv=5)
search.fit(X_train, y_train)

# Set parameters and compute score
cross_val_score(model, X_test, y_test, cv=5).mean()  # 0.6675