Monte Carlo modeling in Python with probabilit
- 17. October 2025 (modified 27. October 2025)
- #statistics
probabilit is a Python package for Monte Carlo modeling that I helped create. It’s great for prototyping uncertainty calculations in some types of problems, such as those encountered in undergraduate engineering courses. In addition to a high-level modeling language, it contains algorithms for inducing correlations between variables, as well as computing the nearest correlation matrix.
Here is what probabilit can do:
- It has a high-level modeling language that lets us model equations and expressions using probability distributions
- It supports quasi-Monte Carlo sampling: Latin hypercube, Sobol, and Halton sequences
- It can correlate variables, e.g. induce a correlation between a uniform and a normal distribution
Here is what probabilit does not do:
- It’s not suited for modeling complex environments, like multi-agent systems, queues, differential equations, etc.
- It’s neither as memory efficient nor as fast as writing pure numpy simulations
This article is a tutorial demonstrating the most important features of probabilit. We’ll be using version 0.1.0. Beware that in future versions some things might change slightly. However, the overall ideas will remain the same, and all the Python code in this tutorial is executable (it was created from a Jupyter notebook).
import numpy as np
import matplotlib.pyplot as plt
Problem. Suppose we’re driving down the highway at approximately 100 km/h when suddenly a family of ducks appears and we have to slam on the brakes. What is the braking distance?
Deterministic answer.
It turns out that the relevant equation, obtained by setting the kinetic energy (1/2) * m * v^2 equal to the friction work mu * m * g * d and solving for d, is d = v^2 / (2 * mu * g), where
- d = braking distance (m)
- v = velocity (m/s)
- mu = coefficient of friction (dimensionless)
- g = gravitational acceleration (m/s²)
A deterministic answer is straightforward to compute in Python:
def braking_distance(velocity_kmh, mu, g=9.81):
velocity_ms = velocity_kmh / 3.6 # Convert to m/s
return velocity_ms**2 / (2 * mu * g)
velocity_kmh = 100
mu_dry = 0.7 # Coefficient of friction on dry road
distance_dry = braking_distance(velocity_kmh, mu_dry)
print(f"{distance_dry:.1f} meters")
56.2 meters
Probabilistic answer.
With probabilit we can compute braking distance as a distribution. More specifically we’ll be able to draw samples from the braking distance distribution. All we have to do is allow some of the input parameters to be distributions:
- We do not trust the speedometer to be exact, so we set velocity to be a triangular distribution.
- We do not trust that the coefficient of kinetic friction is exactly 0.7, so we use a normal distribution.
These distributions are somewhat arbitrary.
Any distribution in scipy can be used from within probabilit, with the syntax
Distribution("scipy_distr_name", scipy_distr_arg1, scipy_distr_arg2, ...).
from probabilit import Distribution
# Replace velocity and coefficient of friction with distributions
velocity_kmh = Distribution("triang", loc=95, scale=10, c=0.5)
mu_dry = Distribution("norm", loc=0.7, scale=0.03)
# We can use the same function as earlier
distance_dry = braking_distance(velocity_kmh, mu_dry)
# Sample from the answer. This samples the distributions above
# and propagates the samples through the expression defined by the
# 'braking_distance' function. "lhs" = latin hypercube sampling
samples = distance_dry.sample(2500, random_state=42, method="lhs")
plt.figure(figsize=(6, 2.5))
plt.hist(samples, bins="auto", density=True, edgecolor="black")
plt.xlabel("Braking distance")
plt.tight_layout()
plt.show()
import pandas as pd
# We can also get summary statistics
(
pd.Series(samples, name="braking_distance")
.describe(percentiles=[0.01, 0.05, 0.95, 0.99])
.to_frame()
.T.round(1)
)
| | count | mean | std | min | 1% | 5% | 50% | 95% | 99% | max |
|---|---|---|---|---|---|---|---|---|---|---|
| braking_distance | 2500.0 | 56.3 | 3.3 | 47.1 | 49.3 | 50.9 | 56.2 | 62.0 | 64.3 | 72.3 |
With uncertainty added to some of the input parameters, the answer is not nearly as certain as it seemed in the deterministic analysis. We have learned that braking distance could reasonably be anywhere between 51 meters and 62 meters or so. This is the power of (simple) Monte Carlo analysis.
When distance_dry is sampled, the ancestor nodes velocity_kmh and mu_dry are also sampled.
In the process, the attribute samples_ is set on all nodes.
This is useful if we want to inspect the samples drawn on ancestor variables.
For instance, here is a simple sensitivity analysis where we study the effect of the input parameters on the output:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 3), sharey=True)
ax1.scatter(velocity_kmh.samples_, distance_dry.samples_, s=3, zorder=5)
ax2.scatter(mu_dry.samples_, distance_dry.samples_, s=3, zorder=5)
ax1.set_ylabel("Distance")
ax1.set_xlabel("Velocity")
ax2.set_xlabel("Coefficient of friction")
ax1.grid(True, ls="--", zorder=0, alpha=0.5)
ax2.grid(True, ls="--", zorder=0, alpha=0.5)
fig.tight_layout()
plt.show()
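If we want numbers to go with the scatter plots, we can compute rank correlations between each input and the output with plain scipy (this is an addition of mine, not a probabilit feature):
from scipy import stats
# Rank correlation between each input and the output, as a simple
# numeric complement to the scatter plots above
for name, node in [("velocity", velocity_kmh), ("friction", mu_dry)]:
    rho, _ = stats.spearmanr(node.samples_, distance_dry.samples_)
    print(f"Spearman correlation, {name} vs. distance: {rho:.2f}")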
Problem. Suppose we save 100 units of money per month for 10 years and put the money in a mutual fund. The yearly interest rate is around 7 % after accounting for inflation. How much money will we have after 10 years?
Probabilistic solution. The interest rate will not be exactly 7 % every year, so we can add some uncertainty. Notice how probabilit expressions can be used in for-loops without any issues.
from probabilit import Lognormal, plot
YEARS = 10
SAVED_PER_MONTH = 100
money_saved = 0
for year in range(YEARS):
interest_rate = Lognormal(mean=1.07, std=0.05)
money_saved = money_saved * interest_rate + SAVED_PER_MONTH * 12
# Sample and plot the result
samples = money_saved.sample(999, random_state=42, method="lhs")
pairgrid = plot(money_saved) # returns a seaborn pairgrid
pairgrid.axes[0, 0].set_xlabel("Money saved")
pairgrid.axes[0, 0].set_ylabel("")
plt.show()
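For reference, here is the corresponding deterministic calculation with a fixed 7 % yearly return:
# Same recurrence as above, but with a fixed interest rate
money_det = 0
for year in range(YEARS):
    money_det = money_det * 1.07 + SAVED_PER_MONTH * 12
print(f"{money_det:.0f} units of money after {YEARS} years")
The probabilistic answer spreads out around roughly this value.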
Exercise. Change the example above so that money saved per month is a distribution. Choose values that are realistic for your own financial situation and study how much money you can save over the next decade.
Bird survival
This final example illustrates compound distributions, where one distribution is an argument to another.
Problem. Suppose we have a distribution of eggs per nest for a certain species of bird. We also have a survival percentage for each egg. What is the distribution of the total number of birds per nest that live till adulthood?
Probabilistic solution. We set up separate distributions and pass them as arguments to each other:
# Number of eggs per nest
eggs_per_nest = Distribution("poisson", mu=2)
# Assume we are uncertain of the survival probability too
# According to one source online, around 10-30% of eggs survive
prob_survive = Distribution("beta", a=2, b=2, loc=0.1, scale=0.2)
# Here is the compound distribution, which depends on two distributions
adult_birds_per_nest = Distribution("binom", n=eggs_per_nest, p=prob_survive)
# Calling plot() without sampling will sample a copy of the nodes
plot(
eggs_per_nest,
prob_survive,
adult_birds_per_nest,
sample_kwargs={"random_state": 42, "method": "lhs"},
height=1.5,
aspect=1.2,
);
The discrete nature of the distributions makes them hard to visualize without jitter in the default plot call, but hopefully you get the idea.
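As a quick numerical sanity check (my own addition): since the two inputs are modeled as independent, the expected number of adult birds per nest should be E[eggs] * E[survival] = 2 * 0.2 = 0.4.
# Sample the compound distribution directly and compare the mean to 0.4
bird_samples = adult_birds_per_nest.sample(10_000, random_state=42)
print(bird_samples.mean())  # should land near 0.4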
Exercise. Go to Wikipedia’s examples of compound distributions. Verify computationally that this statement is true: “Compounding a Gaussian distribution with variance distributed according to an exponential distribution yields a Laplace distribution.”
How does probabilit work?
Given an expression, probabilit creates a computational graph that is evaluated lazily.
For instance, the expression Constant(5) ** 2 is converted to Power(Constant(5), Constant(2)), where Power is a Transform node.
Other transform nodes are Exp, Log, Sin, Abs, Add, Divide, etc.
There are also Constant nodes and Distribution nodes.
from probabilit import Constant
expression = Constant(5) ** 2 + Distribution("uniform") / Constant(3)
print(expression)
Add(Power(Constant(5), Constant(2)), Divide(Distribution("uniform"), Constant(3)))
The computational graph can be visualized as a tree:
from probabilit import treeprint
treeprint(expression)
Add
├──Power
│ ├──Constant(5)
│ └──Constant(2)
└──Divide
├──Distribution("uniform")
└──Constant(3)
Not every probabilit expression is a tree.
For instance, if a is a node then b = abs(a) + a creates a graph where the node a is shared by two operations (Abs and Add), so it is not strictly speaking a tree.
Every probabilit graph is, however, a DAG (directed acyclic graph): nodes may be shared, but cycles cannot occur.
A probabilit expression can be converted to a networkx MultiDiGraph.
import networkx as nx
# Create a MultiDiGraph
G = expression.to_graph()
# Draw information
pos = nx.spring_layout(G, seed=0)
labels = {node: type(node).__name__ for node in G.nodes()}
# Draw the graph
plt.figure(figsize=(6, 3))
nx.draw(
G,
pos,
with_labels=True,
node_color="lightblue",
node_size=500,
font_size=6,
font_weight="bold",
arrows=True,
arrowsize=12,
edge_color="gray",
labels=labels,
)
plt.show()
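Returning to the b = abs(a) + a example above, here is a quick check of the shared-node claim. It is a sketch that assumes to_graph keeps one graph node per Python node object:
a = Distribution("norm")
b = abs(a) + a  # the node `a` feeds into the expression twice
G_shared = b.to_graph()
print(G_shared.number_of_nodes())  # expect fewer nodes than a strict tree would need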
It’s important to understand that no computation is performed until .sample() is called on a node.
The expression above is just a graph with instructions about how to compute a result.
Once .sample() is called, the actual work is done:
expression.sample(10, random_state=42)
array([25.12484671, 25.31690477, 25.24399798, 25.19955283, 25.05200621,
25.05199817, 25.0193612 , 25.28872538, 25.20037167, 25.23602419])
When the call asks for the sink node expression to be sampled, all ancestors are sampled in turn and results are propagated through the graph.
This introduces memory and performance overhead, a cost probabilit accepts in favor of a clean API and a pleasant modeling experience.
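To make that tradeoff concrete, here is a sketch of what the braking-distance simulation from earlier might look like as hand-written numpy. It uses plain pseudo-random sampling rather than LHS, so the numbers will differ slightly from the probabilit run:
# Hand-written numpy version of the braking distance simulation
rng = np.random.default_rng(42)
velocity_ms = rng.triangular(95, 100, 105, size=2500) / 3.6  # km/h -> m/s
mu = rng.normal(0.7, 0.03, size=2500)
distances = velocity_ms**2 / (2 * mu * 9.81)
print(f"{distances.mean():.1f} meters on average")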
Latin hypercube sampling (LHS)
Probabilit supports all major sampling methods in the scipy quasi-Monte Carlo module (scipy.stats.qmc).
Below we show the difference between pseudo-random sampling (method=None) and Latin hypercube sampling (method="lhs").
uniform = Distribution("uniform")
exponential = Distribution("expon")
# Sample a sum to set `.samples_` on distributions for plotting
(uniform + exponential).sample(500, random_state=42, method=None)
plot(uniform, exponential, height=1.5, aspect=1.2, plot_kws={"s": 5});
(uniform + exponential).sample(500, random_state=42, method="lhs")
plot(uniform, exponential, height=1.5, aspect=1.2, plot_kws={"s": 5});
Notice how the low-discrepancy sequence produced by LHS leads to empirical distributions that look very much like the theoretical ones. In many Monte Carlo simulations LHS is an excellent choice, because the empirical distributions match the theoretical ones with fewer samples than pseudo-random sampling would require.
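One way to quantify the visual difference (plain scipy, not a probabilit feature) is the Kolmogorov-Smirnov distance between the samples and the theoretical distributions, where smaller means a closer match:
from scipy import stats

# Compare both sampling methods against the theoretical distributions
for method in (None, "lhs"):
    (uniform + exponential).sample(500, random_state=42, method=method)
    ks_uniform = stats.kstest(uniform.samples_, "uniform").statistic
    ks_expon = stats.kstest(exponential.samples_, "expon").statistic
    print(f"method={method}: KS(uniform)={ks_uniform:.3f}, KS(expon)={ks_expon:.3f}")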
Inducing correlations
Probabilit comes with Correlator objects that can be used to correlate variables.
They work within the framework of the modeling API, but they also accept plain numpy arrays.
This means that these highly performant algorithms can be used outside of the modeling language too.
When all variables are Normal the Cholesky correlator can be a good choice. While it obtains the desired correlation, it does not preserve the marginal distributions. This is a major problem in many real-world applications, so be careful when using the Cholesky correlator! Notice in the example below that the exponential distribution contains negative values after inducing correlations.
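To see why a linear mix changes the marginals, here is a standalone numpy sketch of the basic idea behind Cholesky-based correlation induction (my own illustration, not probabilit's implementation): standardize the columns, mix them with the Cholesky factor of the target correlation matrix, then undo the scaling.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(size=1000), rng.exponential(size=1000)])
mean, std = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / std                          # standardize each column
L = np.linalg.cholesky(np.array([[1.0, 0.8], [0.8, 1.0]]))
Y = (Z @ L.T) * std + mean                    # mix linearly, then rescale
print(np.corrcoef(Y, rowvar=False).round(2))  # correlation close to the target
print("negative 'exponential' values:", (Y[:, 1] < 0).sum())
The mixed column is a weighted sum of a uniform and an exponential, which is why its marginal is no longer exponential and dips below zero.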
from probabilit.correlation import Cholesky
# Correlation matrix with desired Pearson correlation of 0.8
corr_mat = np.array([[1.0, 0.8], [0.8, 1.0]])
expression = (uniform + exponential).correlate(uniform, exponential, corr_mat=corr_mat)
expression.sample(500, random_state=42, method="lhs", correlator=Cholesky())
plot(uniform, exponential, height=1.5, aspect=1.2, plot_kws={"s": 5})
# Observed Pearson correlation
np.corrcoef(uniform.samples_, exponential.samples_)
array([[1. , 0.8],
[0.8, 1. ]])
Above, the marginal of the uniform distribution was preserved. This is a happy accident since we defined that distribution before we defined the exponential distribution in the code. If we switch them, then we will preserve the marginal of the exponential but destroy the marginal of the uniform. In general the Cholesky correlator comes with no guarantees of preserving marginals.
Exercise. Switch the order and define the exponential distribution first, then the uniform distribution. Use the Cholesky correlator and notice how the marginal of the uniform is destroyed.
Iman-Conover
The Iman-Conover transformation is almost always a better choice than Cholesky. It leaves the marginal distributions unchanged, while attempting to induce the desired correlation. Iman-Conover is essentially a heuristic where data is transformed to rank-space, Cholesky is applied there, and the data is transformed back. The end result is a permutation of the samples in each distribution. The idea is that a permutation which induces correlation in rank space will also get close to the desired correlation in the original space. This works well in most cases, but the algorithm comes without any optimality guarantees.
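To illustrate the recipe, here is a rough plain numpy/scipy sketch of the general Iman-Conover idea (not probabilit's implementation): build reference scores with the desired correlation, then reorder each column of the data to follow the reference ranks.
from scipy import stats

def iman_conover_sketch(X, target_corr, random_state=None):
    rng = np.random.default_rng(random_state)
    n, d = X.shape
    # Normal scores, randomly permuted per column (columns are roughly uncorrelated)
    scores = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
    M = np.column_stack([rng.permutation(scores) for _ in range(d)])
    # Mix with the Cholesky factor so the reference gets the target correlation
    M = M @ np.linalg.cholesky(target_corr).T
    # Rank-match: column j of the output follows the rank order of M[:, j],
    # so each column is merely permuted and its marginal is untouched
    out = np.empty_like(X)
    for j in range(d):
        order = stats.rankdata(M[:, j], method="ordinal").astype(int) - 1
        out[:, j] = np.sort(X[:, j])[order]
    return out

X = np.random.default_rng(0).lognormal(size=(1000, 2))
X_ic = iman_conover_sketch(X, np.array([[1.0, 0.7], [0.7, 1.0]]), random_state=0)
print(np.corrcoef(X_ic, rowvar=False).round(2))  # roughly the target; marginals only permuted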
from probabilit.correlation import ImanConover
expression = (uniform + exponential)
expression.correlate(uniform, exponential, corr_mat=corr_mat)
expression.sample(500, random_state=42, method="lhs", correlator=ImanConover())
plot(uniform, exponential, height=1.5, aspect=1.2, plot_kws={"s": 5})
np.corrcoef(uniform.samples_, exponential.samples_)
array([[1. , 0.70443783],
[0.70443783, 1. ]])
Above we do not achieve the exact correlation that we wanted, but we get pretty close. The marginals are unaffected by the transformation.
Inducing correlations by permutation
Since Iman-Conover is just a clever and fast heuristic to pick a good permutation of the samples (rows) within a variable (column) in a matrix, perhaps we can do even better if we search the space of all permutations?
This space is huge, so we can only afford to search a small part of it.
The Permutation correlator does this, and by spending some time on computation we often end up with even better correlations than Iman-Conover achieves.
Notice below that we get extremely close to the desired correlation.
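Here is a naive hill-climbing sketch of the idea in plain numpy (my own illustration; probabilit's Permutation correlator is smarter and faster): propose a swap of two samples within one column and keep it only if the observed correlation matrix moves closer to the target.
def permutation_search_sketch(X, target_corr, iterations=2_000, random_state=None):
    rng = np.random.default_rng(random_state)
    X = X.copy()
    n, d = X.shape

    def error(Y):
        return np.linalg.norm(np.corrcoef(Y, rowvar=False) - target_corr)

    best = error(X)
    for _ in range(iterations):
        j = rng.integers(d)
        i1, i2 = rng.integers(n, size=2)
        X[[i1, i2], j] = X[[i2, i1], j]       # propose swapping two samples
        new_error = error(X)
        if new_error < best:
            best = new_error                  # keep the improvement
        else:
            X[[i1, i2], j] = X[[i2, i1], j]   # revert the swap
    return X

X = np.random.default_rng(0).exponential(size=(500, 2))
X_perm = permutation_search_sketch(X, np.array([[1.0, 0.8], [0.8, 1.0]]), random_state=0)
print(np.corrcoef(X_perm, rowvar=False).round(2))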
from probabilit.correlation import Permutation
expression = (uniform + exponential).correlate(uniform, exponential, corr_mat=corr_mat)
correlator = Permutation(iterations=5_000, random_state=42)
expression.sample(500, random_state=42, method="lhs", correlator=correlator)
plot(uniform, exponential, height=1.5, aspect=1.2, plot_kws={"s": 5})
np.corrcoef(uniform.samples_, exponential.samples_)
array([[1. , 0.78002787],
[0.78002787, 1. ]])
Finally, let us compare ImanConover with the Composite algorithm.
The composite approach is what probabilit does by default.
It first uses Iman-Conover to find a good starting point, then proceeds by running Permutation on that starting point for a number of iterations.
variables = [Distribution("lognorm", s=1) for _ in range(10)]
# Set up correlation matrix
corr_mat = np.ones((10, 10)) * 0.8
np.fill_diagonal(corr_mat, 1.0)
expression = sum(variables).correlate(*variables, corr_mat=corr_mat)
# Iman-Conover
correlator = ImanConover()
expression.sample(999, random_state=42, method="lhs", correlator=correlator)
observed_corr = np.corrcoef(np.array([v.samples_ for v in variables]))
RMSE = np.sqrt(np.mean((corr_mat - observed_corr) ** 2))
print(f"RMSE (observed-desired) for Iman-Conover: {RMSE:.4f}")
RMSE (observed-desired) for Iman-Conover: 0.1011
from probabilit.correlation import Composite
# Composite() => ImanConover(), then Permutation()
correlator = Composite(random_state=0, iterations=5_000, tol=0.01)
expression.sample(999, random_state=42, method="lhs", correlator=correlator)
observed_corr = np.corrcoef(np.array([v.samples_ for v in variables]))
RMSE = np.sqrt(np.mean((corr_mat - observed_corr) ** 2))
print(f"RMSE norm (observed-desired) for Composite: {RMSE:.4f}")
RMSE norm (observed-desired) for Composite: 0.0367
If we are willing to spend some additional time on computation (here 2.5 seconds instead of 15 milliseconds), then we can obtain correlations that are much closer to what we desired. Often this is worth it, as speed is rarely a bottleneck in these types of analyses.
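Timings will of course vary by machine and problem size, but a rough way to measure the tradeoff yourself is:
import time

# Time both correlators on the same expression
for correlator in (ImanConover(), Composite(random_state=0, iterations=5_000, tol=0.01)):
    start = time.perf_counter()
    expression.sample(999, random_state=42, method="lhs", correlator=correlator)
    print(f"{type(correlator).__name__}: {time.perf_counter() - start:.2f} s")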
Nearest correlation matrix
Just like the ImanConover and Permutation correlator classes, the nearest_correlation_matrix function can be used without the modeling API.
This algorithm is useful when a user (e.g. a domain expert) proposes pairwise correlations, but the resulting correlation matrix turns out not to be valid.
This happens quite frequently, since specifying a high-dimensional, non-trivial correlation matrix that is positive definite is hard.
from probabilit.correlation import nearest_correlation_matrix
rng = np.random.default_rng(0)
n = 5
proposed_corr = rng.uniform(low=-1, high=1, size=(n, n))
# Make it symmetric
proposed_corr = np.triu(proposed_corr) + np.triu(proposed_corr, 1).T
np.fill_diagonal(proposed_corr, 1.0)
(pd.DataFrame(proposed_corr) * 100).round(1)
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 100.0 | -46.0 | -91.8 | -96.7 | 62.7 |
| 1 | -46.0 | 100.0 | 45.9 | 8.7 | 87.0 |
| 2 | -91.8 | 45.9 | 100.0 | -93.3 | 45.9 |
| 3 | -96.7 | 8.7 | -93.3 | 100.0 | -15.5 |
| 4 | 62.7 | 87.0 | 45.9 | -15.5 | 100.0 |
np.linalg.eigvals(proposed_corr) # Negative eigenvalues => not positive definite
array([-1.05418258, -0.00675076, 1.41936444, 2.20321725, 2.43835165])
nearest_corr = nearest_correlation_matrix(proposed_corr)
(pd.DataFrame(nearest_corr) * 100).round(1)
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 100.0 | -35.0 | -42.5 | -51.7 | 35.2 |
| 1 | -35.0 | 100.0 | 49.9 | 11.4 | 75.0 |
| 2 | -42.5 | 49.9 | 100.0 | -51.0 | 26.8 |
| 3 | -51.7 | 11.4 | -51.0 | 100.0 | -32.1 |
| 4 | 35.2 | 75.0 | 26.8 | -32.1 | 100.0 |
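As a quick check, the adjusted matrix should have no negative eigenvalues:
np.linalg.eigvals(nearest_corr).round(3)  # all non-negative => valid correlation matrix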
Summary and references
probabilit has two layers:
- A high-level modeling language, built on top of Python
- Low-level functionality for correlating variables, finding nearest correlation matrices, etc
Depending on what kind of Monte Carlo analysis you want to do, one or both (or none!) of these might be useful.
Probabilit is a bit like TensorFlow in that it uses a lazily evaluated computational graph. The idea of building a Monte Carlo analysis tool in this fashion is due to Feda. Integrating LHS and ImanConover into the tool was also his idea. Knut provided feedback on practical methods and distributions used in some types of Monte Carlo analysis, e.g. the importance of Triangular and PERT distributions. The idea behind the Permutation correlator, and how to speed it up by not re-calculating the correlation matrix in every iteration (an idea we did not have time to go into in this tutorial), is my own, but these are not novel ideas.
To read more on these subjects, I recommend:
- The probabilit repository. Have a look at the implementations. Create an issue or PR if you find a bug :)
- To find the nearest correlation matrix, we use Equation (3) in the paper An augmented Lagrangian dual approach for the H-weighted nearest correlation matrix problem
- An excellent explanation of the Iman-Conover algorithm is found in The geometry of the Iman-Conover transformation
- The book Global Sensitivity Analysis by Saltelli et al. is not directly related to Monte Carlo modeling, but Monte Carlo is often used for sensitivity analysis of models, and therefore this book might be of interest