Input Sample Generation

Introduction

Why Use Sampled Monte Carlo Input?

Monte Carlo simulations are valuable when model inputs are uncertain or random, or when future developments are unknown.
The core idea: run the model many times (called realizations), each with different input values, and analyze the outputs statistically.

This approach allows estimation of metrics such as:

Mean
Standard deviation
Confidence intervals
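
As a minimal sketch of this workflow, the snippet below runs a hypothetical `toy_model` (a stand-in for a real simulation, not part of the SA-Toolbox) over many realizations and computes the three metrics. It assumes NumPy is available and uses a normal approximation for the 95% confidence interval of the mean.

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for a real simulation: a simple nonlinear response."""
    return x[:, 0] ** 2 + 3.0 * x[:, 1]

rng = np.random.default_rng(seed=42)
n_realizations = 10_000

# One row per realization, one column per input parameter (both uniform on [0, 1)).
inputs = rng.random((n_realizations, 2))
outputs = toy_model(inputs)

mean = outputs.mean()
std = outputs.std(ddof=1)  # sample standard deviation

# Normal-approximation 95% confidence interval for the mean.
half_width = 1.96 * std / np.sqrt(n_realizations)
ci_95 = (mean - half_width, mean + half_width)

print(f"mean={mean:.3f}, std={std:.3f}, 95% CI=({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```

The interval narrows with the square root of the number of realizations, which is why plain Monte Carlo often needs many runs.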

However, when input variables are continuous, it becomes impossible to test every combination.
Even a discrete mesh grid quickly becomes infeasible — a challenge known as the curse of dimensionality.

Suppose the model has five input parameters (dimensions):

Levels per Parameter	Total Simulations Required
2	32
3	243
4	1,024
10	100,000

Computational cost grows exponentially with the number of parameters and levels.
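
The numbers in the table follow from raising the number of levels to the power of the number of parameters. A short sketch (the function name `full_grid_size` is illustrative only):

```python
def full_grid_size(levels: int, n_params: int) -> int:
    """Number of simulations needed to test every combination on a full grid."""
    return levels ** n_params

# Reproduce the table for a model with five input parameters.
for levels in (2, 3, 4, 10):
    print(f"{levels:>2} levels -> {full_grid_size(levels, 5):>7,} simulations")
```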
Goal: select a representative set of input samples that sufficiently explores the parameter space while keeping computation manageable.

How to Sample Monte Carlo Input

Before sampling, each input parameter must be defined, including uncertainty.
Uncertainty is expressed via a distribution function across the parameter range.
If every value is equally likely, or nothing is known about the likelihood of the values, a uniform distribution is often used.
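
One common way to turn such definitions into samples is the inverse-CDF transform: draw values in the unit interval and map them onto each distribution. The sketch below assumes NumPy and SciPy; the parameter names, ranges, and distribution choices are hypothetical, and it does not show how the SA-Toolbox itself stores parameter definitions.

```python
import numpy as np
from scipy import stats

# Hypothetical parameter definitions (illustrative names and values only).
parameters = {
    "porosity": stats.uniform(loc=0.1, scale=0.3),   # uniform on [0.1, 0.4]
    "temperature": stats.norm(loc=20.0, scale=2.0),  # normal, mean 20, std 2
}

rng = np.random.default_rng(seed=0)
unit_samples = rng.random((1000, len(parameters)))  # values in [0, 1)

# The percent point function (inverse CDF) maps unit-interval values
# onto the distribution of each parameter, column by column.
samples = np.column_stack(
    [dist.ppf(unit_samples[:, j]) for j, dist in enumerate(parameters.values())]
)

print(samples[:3])
```

The same mapping can be applied to unit-interval points from any sampling method, which is how a method defined on the unit hypercube can respect the parameter distributions.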

The SA-Toolbox supports seven distribution types:

[Figure: distributions]

The SA-Toolbox also includes seven sampling methods. The choice affects:

Clustering or gaps in the sample
Number of samples and computational cost
Use of probability distributions
Maximum dimensions supported
Likelihood of artifacts during dataset resampling

Dataset Resampling

“Dataset resampling” refers to reducing an existing dataset by selecting entries based on their original order (e.g., row number) without considering parameter values.
See Step 4 in “Subset / Filter”.
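
A minimal sketch of order-based selection, assuming the dataset is held as a NumPy array with one row per sample (the array here is synthetic and the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
dataset = rng.random((1000, 3))  # existing dataset: 1000 rows, 3 parameters

# Selection uses only the row position, never the parameter values.
first_100 = dataset[:100]    # keep the first 100 rows
every_10th = dataset[::10]   # keep every 10th row

print(first_100.shape, every_10th.shape)
```

For a purely random dataset, any such subset is again a random sample. For datasets whose rows follow a structured order, the choice of rows can matter, which is why the sampling method affects the likelihood of artifacts.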

Sampling Methods Overview

The SA-Toolbox provides several parameter sampling methods, divided into three categories:

(Pseudo)Random Methods
Structured Methods
Quasi-Monte Carlo Methods

Random methods: robust and simple, but may cluster and require many samples.

Structured methods: ensure each interval is sampled at least once, reducing gaps.

Quasi-Monte Carlo (QMC): use deterministic low-discrepancy sequences that cover the parameter space evenly and efficiently.

Simple Random Sampling

Classic Monte Carlo sampling selects values from parameter distributions without structure.

[Figure: Simple Random Sampling]

Advantages:

Simple to understand
Respects parameter distributions
Fast sample generation

Disadvantages:

Slow convergence, requires many samples
Uneven coverage
Assumes independent input parameters
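
A minimal sketch of simple random sampling for two independent, uniformly distributed parameters, assuming NumPy. The bounds are hypothetical. The histogram at the end illustrates the uneven coverage: with a finite sample, equally wide bins rarely receive the same number of points.

```python
import numpy as np

rng = np.random.default_rng(seed=123)
n_samples, n_params = 500, 2

# Hypothetical lower and upper bounds for the two parameters.
lower = np.array([0.0, 10.0])
upper = np.array([1.0, 50.0])

# Each parameter is drawn independently from its own uniform range.
samples = rng.uniform(lower, upper, size=(n_samples, n_params))

# Count how many points fall into 10 equally wide bins of the first parameter.
counts, _ = np.histogram(samples[:, 0], bins=10, range=(0.0, 1.0))
print(counts)
```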

References

Knuth, Donald Ervin (1981): The Art of Computer Programming. Volume 2: Seminumerical Algorithms. 2nd ed., Reading, Mass.: Addison-Wesley.

Grid Sampling

Divide each parameter range into equal intervals.
If \( s \) is the sample size and \( d \) the number of parameters, the number of intervals per parameter is:

\[ n = s^{1/d} \]

Each parameter takes \( n \) discrete values. The interval boundaries form a grid with

\[ (n+1)^d \]

vertices, which enclose \( n^d \) cells.

Important: the midpoints of grid cells are sampled, not vertices. If \( s = n^d \), all midpoints are selected (full factorial design).
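
The midpoint construction can be sketched as follows for the unit hypercube, assuming NumPy. The function name `grid_midpoints` is illustrative, and the example covers only the full factorial case \( s = n^d \), here with \( s = 16 \) and \( d = 2 \), so \( n = 4 \).

```python
import itertools
import numpy as np

def grid_midpoints(n: int, d: int) -> np.ndarray:
    """Return all n**d cell midpoints of the unit hypercube with n intervals per axis."""
    edges = np.linspace(0.0, 1.0, n + 1)   # n + 1 vertices per axis
    mids = 0.5 * (edges[:-1] + edges[1:])  # n midpoints per axis
    # Full factorial design: every combination of per-axis midpoints.
    return np.array(list(itertools.product(mids, repeat=d)))

points = grid_midpoints(n=4, d=2)
print(points.shape)  # 16 points in 2 dimensions
print(points[:4])
```

Sampling midpoints keeps every point strictly inside the parameter range, whereas vertices would place points on the boundaries.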

Advantages:

Systematic coverage
Easy to understand

Disadvantages:

Expensive for high dimensions
Assumes independent inputs

Latin Hypercube Sampling (LHS)

Divide the range of each parameter into intervals of equal probability.
Each interval is sampled exactly once, and the values of the different parameters are combined randomly.

[Figure: LHS Sampling]

Advantages:

Better coverage of marginal distributions
Fewer samples than grid
Scales to higher dimensions

Disadvantages:

Assumes independent parameters
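
A minimal sketch using SciPy's `scipy.stats.qmc.LatinHypercube`, which the references below point to. Whether the SA-Toolbox calls it exactly this way is an assumption; the bounds are hypothetical. The check in the middle verifies the defining property: every one of the equal-probability intervals of each parameter contains exactly one sample.

```python
import numpy as np
from scipy.stats import qmc

n_samples, n_params = 10, 2

sampler = qmc.LatinHypercube(d=n_params, seed=7)
unit_samples = sampler.random(n=n_samples)  # points in the unit hypercube

# Index of the interval (stratum) each value falls into, per parameter.
strata = np.floor(unit_samples * n_samples).astype(int)
one_per_interval = all(
    np.array_equal(np.sort(strata[:, j]), np.arange(n_samples))
    for j in range(n_params)
)
print("one sample per interval:", one_per_interval)

# Rescale to hypothetical parameter bounds.
scaled = qmc.scale(unit_samples, l_bounds=[0.0, 10.0], u_bounds=[1.0, 50.0])
```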

References

Helton & Davis (2002, 2003)
SciPy LatinHypercube

Quasi-Monte Carlo Methods

These methods use Sobol and Halton sequences to minimize gaps. The resulting samples are deterministic and evenly distributed.

Sobol Sampling

Uses binary computations and direction numbers for each of the \(d\) parameters
Sample size ideally a power of 2
Deterministic

Advantages: fast, fewer samples, stable in high dimensions
Disadvantages: assumes independence
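
A minimal sketch of an unscrambled Sobol sample using SciPy's `scipy.stats.qmc.Sobol`; it is not confirmed here that the SA-Toolbox uses these exact calls. `random_base2(m)` draws \(2^m\) points, which matches the power-of-2 recommendation above.

```python
import numpy as np
from scipy.stats import qmc

sampler = qmc.Sobol(d=2, scramble=False)
points = sampler.random_base2(m=4)  # 2**4 = 16 points in the unit square

# Deterministic: a fresh sampler reproduces the same points.
repeat = qmc.Sobol(d=2, scramble=False).random_base2(m=4)
print("identical on repeat:", np.array_equal(points, repeat))

# With 2**m points, each parameter has exactly one value in every
# interval of width 1 / 2**m.
strata = np.sort(np.floor(points[:, 0] * 16).astype(int))
print("evenly stratified:", np.array_equal(strata, np.arange(16)))
```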

[Figure: Sobol' sequence sampling]

References

SciPy Sobol Sequence Library

Sobol-Scramble

Adds a random component to Sobol sequences
Reduces artifacts; fast; reproducible when the random seed is fixed

Advantages: stable, computationally cheap
Disadvantages: independence assumption, performance may degrade at high dimensions
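
Because each scrambling is random, several independent replicates can be used to estimate the error of a result. The sketch below assumes SciPy's `scipy.stats.qmc.Sobol` with `scramble=True`; the `integrand` is a hypothetical smooth test function whose exact mean over the unit square is 0.25.

```python
import numpy as np
from scipy.stats import qmc

def integrand(x):
    """Hypothetical smooth test function with exact mean 0.25 on the unit square."""
    return x[:, 0] * x[:, 1]

# One estimate per independently scrambled replicate.
estimates = []
for seed in range(8):
    sampler = qmc.Sobol(d=2, scramble=True, seed=seed)
    points = sampler.random_base2(m=8)  # 256 points per replicate
    estimates.append(integrand(points).mean())

estimates = np.array(estimates)
mean_estimate = estimates.mean()
# The spread across replicates gives a standard error for the combined estimate.
std_error = estimates.std(ddof=1) / np.sqrt(len(estimates))

print(f"estimate = {mean_estimate:.5f} +/- {std_error:.5f}")

# A fixed seed reproduces the same scrambled points.
a = qmc.Sobol(d=2, scramble=True, seed=0).random_base2(m=4)
b = qmc.Sobol(d=2, scramble=True, seed=0).random_base2(m=4)
same_points = np.array_equal(a, b)
```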

References

SciPy Sobol-Scramble Library

Halton

Uses prime bases and radical inverse function
Good uniformity in low dimensions (<10)

Advantages: fast, analytical
Disadvantages: correlations appear in higher dimensions
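
The radical inverse function writes the sample index in a given base and mirrors its digits around the decimal point. A dependency-free sketch, with one prime base per parameter; the function names are illustrative, and starting at index 1 (skipping the origin) is a choice made here, not necessarily what any particular library does.

```python
def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` around the decimal point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton(n_points: int, bases=(2, 3)):
    """First n_points of a Halton sequence, one prime base per parameter."""
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n_points + 1)]

for point in halton(4):
    print(point)
```

In base 2 the first values are 1/2, 1/4, 3/4, 1/8; in base 3 they are 1/3, 2/3, 1/9, 4/9. Larger prime bases cycle more slowly, which is one source of the correlations seen in higher dimensions.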

References

SciPy Halton Library

Stick-Breaking Method

The Stick-Breaking method generates parameters that sum to 1 (e.g., mass fractions).
Values are sampled recursively using a beta distribution.

Advantages: handles dependent parameters
Disadvantages: may not respect original distributions
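
A minimal sketch of the textbook stick-breaking construction for Dirichlet-distributed fractions, assuming NumPy. The function name and the concentration parameters `alphas` are illustrative; the SA-Toolbox's own variant may differ in detail.

```python
import numpy as np

def stick_breaking(n_samples, alphas, rng):
    """Draw rows of non-negative fractions that sum to 1 (Dirichlet with `alphas`)."""
    alphas = np.asarray(alphas, dtype=float)
    d = alphas.size
    fractions = np.empty((n_samples, d))
    remaining = np.ones(n_samples)  # length of stick still unassigned
    for i in range(d - 1):
        # Break off a beta-distributed share of whatever is left.
        share = rng.beta(alphas[i], alphas[i + 1:].sum(), size=n_samples)
        fractions[:, i] = share * remaining
        remaining = remaining - fractions[:, i]
    fractions[:, -1] = remaining  # the last parameter takes the rest
    return fractions

rng = np.random.default_rng(seed=5)
fractions = stick_breaking(2000, alphas=[1.0, 1.0, 1.0], rng=rng)
print(fractions[:3], fractions[:3].sum(axis=1))
```

Each parameter depends on the ones drawn before it, which is how the method handles the sum-to-one constraint.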

References

Ng, Kai Wang; Tian, Guo-Liang; Tang, Man-Lai (2011): Dirichlet and related distributions. Theory, methods and applications. 1st ed., Chichester: Wiley.

Sampling Summary

Table: Overview of the main characteristics of six sampling methods.
The stick-breaking method has a very specific use case and is therefore not included in this comparison.

Method	Type	Randomness	Space-Filling	Convergence Rate	Advantages	Disadvantages	Use Cases
Simple Random Sampling	Pseudorandom Sampling	Random	Poor	\(N^{-1/2}\)	Very easy to implement; works with any data or model	Poor coverage of the input space; may miss important features	Quick exploratory runs when model is simple or time is not a constraint
Grid Sampling	Deterministic (Structured)	None	Excellent	Varies with grid resolution	Complete coverage; easy to analyze	Infeasible in high dimensions; no randomness or replication possible	Low-dimensional problems where exhaustive exploration is feasible (e.g., up to 3 or 4 inputs)
Latin Hypercube Sampling	Structured Random Sampling	Stratified Random	Good	Better than random	Good variance reduction; works well with existing datasets	May struggle with highly nonlinear or correlated inputs	Most common choice for SA when sample budget is limited and inputs are moderately correlated
Sobol	Quasi-Monte Carlo	Deterministic	Excellent	\((\log N)^d / N\) (\(\to 1/N\) for small \(d\))	Best in class for space filling; great for high-dimensional SA	No built-in variance estimates; sensitive to input order	High-accuracy SA with smooth models, especially for variance-based GSA
Sobol (scrambled)	Quasi-Monte Carlo (randomized)	Randomized Low-Discrepancy	Excellent	\((\log N)^d / N\) (\(\to 1/N\) for small \(d\))	Allows error estimation; combines low-discrepancy and randomness; enables replication	Slightly more complex to implement	Reliable, replicable GSA with uncertainty estimation (Sobol indices with confidence intervals)
Halton	Quasi-Monte Carlo	Deterministic	Good	\((\log N)^d / N\) (\(\to 1/N\) for small \(d\))	Simple generator; good for small problems	Not robust in high dimensions	Quick SA in low-dimensional inputs when low-discrepancy sequence is needed
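
The space-filling column can be checked numerically for a given sample size. The sketch below assumes SciPy's `scipy.stats.qmc.discrepancy`, where lower values indicate more even coverage, and compares four of the methods in two dimensions. It is an illustration for one sample size and one seed, not a benchmark.

```python
import numpy as np
from scipy.stats import qmc

n_samples, n_params = 256, 2
rng = np.random.default_rng(seed=0)

samples = {
    "Simple random": rng.random((n_samples, n_params)),
    "Latin hypercube": qmc.LatinHypercube(d=n_params, seed=0).random(n_samples),
    "Sobol (scrambled)": qmc.Sobol(d=n_params, scramble=True, seed=0).random(n_samples),
    "Halton": qmc.Halton(d=n_params, scramble=False).random(n_samples),
}

# Lower discrepancy means the points fill the unit square more evenly.
discrepancies = {name: qmc.discrepancy(points) for name, points in samples.items()}
for name, value in discrepancies.items():
    print(f"{name:>18}: {value:.2e}")
```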