
Input Sample Generation

Introduction

Why Use Sampled Monte Carlo Input?

Monte Carlo simulations are valuable when model inputs are uncertain or random, or when future developments are unknown.
The core idea: run the model many times (called realizations), each with different input values, and analyze the outputs statistically.

This approach allows estimation of metrics such as:

  • Mean
  • Standard deviation
  • Confidence intervals

However, when input variables are continuous, it becomes impossible to test every combination.
Even a discrete mesh grid quickly becomes infeasible — a challenge known as the curse of dimensionality.

Suppose the model has five input parameters (dimensions):

Levels per Parameter | Total Simulations Required
2 | 32
3 | 243
4 | 1,024
10 | 100,000

Computational cost grows exponentially with the number of parameters and polynomially with the number of levels per parameter.
Goal: select a representative set of input samples that sufficiently explores the parameter space while keeping computation manageable.
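The simulation counts for the five-parameter example follow directly from levels raised to the number of parameters. A short sketch (the function name `full_factorial_size` is chosen here for illustration):

```python
def full_factorial_size(levels, n_parameters):
    """Number of runs needed to test every combination of levels."""
    return levels ** n_parameters


for levels in (2, 3, 4, 10):
    print(levels, full_factorial_size(levels, 5))
```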


How to Sample Monte Carlo Input

Before sampling, each input parameter must be defined, including uncertainty.
Uncertainty is expressed via a distribution function across the parameter range.
If every value is equally likely (or unknown), a uniform distribution is often used.
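One common way to apply a distribution to a sample is to draw a value \( u \) in \( (0, 1) \) and map it through the inverse cumulative distribution function of the parameter. The sketch below assumes this approach for a uniform and a normal parameter; whether the SA-Toolbox does it this way internally is not stated here, and the helper names are hypothetical.

```python
from statistics import NormalDist


def uniform_ppf(u, low, high):
    """Map u in [0, 1] to a uniform distribution on [low, high]."""
    return low + u * (high - low)


def normal_ppf(u, mean, std):
    """Map u in (0, 1) to a normal distribution via its inverse CDF."""
    return NormalDist(mean, std).inv_cdf(u)


print(uniform_ppf(0.5, 2.0, 4.0))   # midpoint of the range
print(normal_ppf(0.5, 10.0, 2.0))   # median of the normal distribution
```

The same mapping works for any sampling method that produces points in the unit interval, so the choice of distribution and the choice of sampling method stay separate.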

The SA-Toolbox supports seven distribution types:

[Figure: distributions supported by the SA-Toolbox]

The SA-Toolbox also includes seven sampling methods. The choice affects:

  • Clustering or gaps in the sample
  • Number of samples and computational cost
  • Use of probability distributions
  • Maximum dimensions supported
  • Likelihood of artifacts during dataset resampling

Dataset Resampling

“Dataset resampling” refers to reducing an existing dataset by selecting entries based on their original order (e.g., row number) without considering parameter values.
See Step 4 in “Subset / Filter”.
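To illustrate why row-order resampling can introduce artifacts, the sketch below reduces a hypothetical 4 x 4 grid sample by keeping every fourth row. The row order and the helper `resample_by_row` are assumptions made for this example, not the SA-Toolbox's behavior.

```python
import itertools


def resample_by_row(rows, keep_every):
    """Keep every `keep_every`-th row, looking only at row order."""
    return rows[::keep_every]


# Hypothetical 4 x 4 grid sample, stored in the order itertools.product yields it.
grid_rows = list(itertools.product(range(4), range(4)))
reduced = resample_by_row(grid_rows, keep_every=4)

# The second parameter collapses to a single level in the reduced dataset.
print(reduced)
```

Because the step size matches the grid period, the reduced dataset no longer varies the second parameter at all. Samples without such a regular row order are less prone to this effect.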


Sampling Methods Overview

The SA-Toolbox provides several parameter sampling methods, divided into three categories:

  1. (Pseudo)Random Methods
  2. Structured Methods
  3. Quasi-Monte Carlo Methods

Random methods: robust and simple, but may cluster and require many samples.

Structured methods: ensure each interval is sampled at least once, reducing gaps.

Quasi-Monte Carlo (QMC): use low-discrepancy sequences to cover the parameter space evenly; deterministic and efficient.


Simple Random Sampling

Classic Monte Carlo sampling selects values from parameter distributions without structure.

[Figure: Simple Random Sampling]

Advantages:

  • Simple to understand
  • Respects parameter distributions
  • Fast sample generation

Disadvantages:

  • Slow convergence, requires many samples
  • Uneven coverage
  • Assumes independent input parameters
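A minimal sketch of simple random sampling, assuming uniform distributions and independent parameters. The function name and the `bounds` format are illustrative choices.

```python
import random


def simple_random_sample(n_samples, bounds, seed=None):
    """Draw each parameter independently and uniformly within its bounds."""
    rng = random.Random(seed)
    return [
        [rng.uniform(low, high) for low, high in bounds]
        for _ in range(n_samples)
    ]


sample = simple_random_sample(100, [(0.0, 1.0), (10.0, 20.0)], seed=42)
print(sample[:3])
```

Nothing in this procedure prevents two points from landing close together, which is the source of the clustering and uneven coverage listed above.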

References

  • Knuth, Donald Ervin (1981): The Art of Computer Programming. Volume 2. Seminumerical algorithms. 2. ed. Reading, Mass.: Addison-Wesley.

Grid Sampling

Divide each parameter range into equal intervals.
If \( s \) is the sample size and \( d \) the number of parameters, the number of intervals per parameter is:

\[ n = s^{1/d} \]

These \( n \) intervals are bounded by \( n + 1 \) grid vertices per parameter, so the total number of grid vertices is:

\[ (n+1)^d \]

Important: the midpoints of the \( n^d \) grid cells are sampled, not the vertices, so each parameter takes \( n \) discrete values. If \( s = n^d \), all midpoints are selected (full factorial design).
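A minimal sketch of midpoint grid sampling, assuming each parameter range is split into the same number of equal intervals. The function name and the `bounds` format are illustrative.

```python
import itertools


def grid_midpoints(n_intervals, bounds):
    """Return the midpoints of all grid cells (full factorial design)."""
    axes = []
    for low, high in bounds:
        width = (high - low) / n_intervals
        # Midpoint of interval i is half a width past its lower edge.
        axes.append([low + (i + 0.5) * width for i in range(n_intervals)])
    return list(itertools.product(*axes))


points = grid_midpoints(4, [(0.0, 1.0), (0.0, 10.0)])
print(len(points), points[0])
```

With 4 intervals and 2 parameters this yields 4^2 = 16 points, none of which lies on the boundary of the parameter space.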

Advantages:

  • Systematic coverage
  • Easy to understand

Disadvantages:

  • Expensive for high dimensions
  • Assumes independent inputs


Latin Hypercube Sampling (LHS)

Divide parameter space into intervals of equal probability.
Each interval is sampled once, combining parameter values randomly.
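A minimal sketch of Latin Hypercube Sampling on the unit hypercube. It assumes one random point per stratum and an independent shuffle per parameter; mapping the result to non-uniform distributions is left out.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=None):
    """Return n_samples points in [0, 1)^n_dims, one per stratum and dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-probability strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are combined randomly across parameters.
        rng.shuffle(column)
        columns.append(column)
    return [list(row) for row in zip(*columns)]


lhs = latin_hypercube(8, 3, seed=1)
print(lhs[0])
```

Every parameter covers all 8 strata exactly once, regardless of how many parameters there are. That is why the sample size does not have to grow with the dimension as it does for a grid.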

[Figure: LHS Sampling]

Advantages:

  • Better coverage of marginal distributions
  • Fewer samples than grid
  • Scales to higher dimensions

Disadvantages:

  • Assumes independent parameters


Quasi-Monte Carlo Methods

Use Sobol and Halton sequences to minimize gaps. Deterministic, evenly distributed samples.

Sobol Sampling

  • Uses binary computations and a separate set of direction numbers for each of the \(d\) parameters
  • Sample size ideally a power of 2
  • Deterministic

Advantages: fast, fewer samples, stable in high dimensions
Disadvantages: assumes independence

[Figure: Sobol' sequence sampling]


Sobol-Scramble

  • Adds a random component to Sobol sequences
  • Reduces artifacts and remains fast; keeps the deterministic low-discrepancy structure while adding randomization

Advantages: stable, computationally cheap
Disadvantages: independence assumption, performance may degrade at high dimensions


Halton

  • Uses prime bases and radical inverse function
  • Good uniformity in low dimensions (<10)

Advantages: fast, analytical
Disadvantages: correlations appear in higher dimensions
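A minimal sketch of the Halton construction: each parameter uses its own prime base, and the radical inverse mirrors the digits of the sample index around the decimal point. The prime list is truncated to six entries for brevity, and the sequence starts at index 1 to skip the origin; both are choices made for this example.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of `index` around the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


PRIMES = (2, 3, 5, 7, 11, 13)  # one prime base per parameter


def halton(n_samples, n_dims):
    """Return the first n_samples Halton points in [0, 1)^n_dims."""
    if n_dims > len(PRIMES):
        raise ValueError("extend PRIMES to support more dimensions")
    return [
        [radical_inverse(i, PRIMES[d]) for d in range(n_dims)]
        for i in range(1, n_samples + 1)
    ]


print(halton(3, 2))
```

For two parameters the first three points are (1/2, 1/3), (1/4, 2/3), and (3/4, 1/9).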


Stick-Breaking Method

The Stick-Breaking method generates parameters that sum to 1 (e.g., mass fractions).
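Values are sampled recursively using a beta distribution: each draw takes a fraction of the remaining "stick", and the last parameter receives whatever is left.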

Advantages: handles dependent parameters
Disadvantages: may not respect original distributions
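A minimal sketch of stick-breaking for fractions that sum to 1. The beta parameters used here, Beta(1, k) with k the number of parts still to be drawn, are an assumption that yields a uniform distribution over all valid compositions; the SA-Toolbox may parameterize the draws differently.

```python
import random


def stick_breaking(n_parts, seed=None):
    """Return n_parts non-negative fractions that sum to 1."""
    rng = random.Random(seed)
    remaining = 1.0
    fractions = []
    for i in range(n_parts - 1):
        # Assumed parameterization: Beta(1, number of parts left after this one).
        share = rng.betavariate(1.0, n_parts - 1 - i)
        piece = remaining * share
        fractions.append(piece)
        remaining -= piece
    # The last part takes whatever is left of the stick.
    fractions.append(remaining)
    return fractions


fractions = stick_breaking(4, seed=7)
print(fractions, sum(fractions))
```

Because every fraction depends on the ones drawn before it, the parameters are dependent by construction, which is what the sum-to-one constraint requires.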

References

  • Ng, Kai Wang; Tian, Guo-Liang; Tang, Man-Lai (2011): Dirichlet and related distributions. Theory, methods and applications. 1st ed., Chichester: Wiley.

Sampling Summary

Table: Overview of the main characteristics of six sampling methods
The stick-breaking method has a very specific use case and is therefore not included in this comparison.

Method | Type | Randomness | Space-Filling | Convergence Rate | Advantages | Disadvantages | Use Cases
Simple Random Sampling | Pseudorandom Sampling | Random | Poor | \(N^{-1/2}\) | Very easy to implement; works with any data or model | Poor coverage of the input space; may miss important features | Quick exploratory runs when the model is simple or time is not a constraint
Grid Sampling | Deterministic (Structured) | None | Excellent | Varies with grid resolution | Complete coverage; easy to analyze | Infeasible in high dimensions; no randomness or replication possible | Low-dimensional problems where exhaustive exploration is feasible (e.g., up to 3–4 inputs)
Latin Hypercube Sampling | Structured Random Sampling | Stratified Random | Good | Better than random | Good variance reduction; works well with existing datasets | May struggle with highly nonlinear or correlated inputs | Most common choice for SA when the sample budget is limited and inputs are moderately correlated
Sobol | Quasi-Monte Carlo | Deterministic | Excellent | \((\log N)^d / N\) (\(\to 1/N\) for small \(d\)) | Best in class for space filling; great for high-dimensional SA | No built-in variance estimates; sensitive to input order | High-accuracy SA with smooth models, especially for variance-based GSA
Sobol (scrambled) | Quasi-Monte Carlo (randomized) | Randomized Low-Discrepancy | Excellent | \((\log N)^d / N\) (\(\to 1/N\) for small \(d\)) | Allows error estimation; combines low discrepancy and randomness; enables replication | Slightly more complex to implement | Reliable, replicable GSA with uncertainty estimation (Sobol indices with confidence intervals)
Halton | Quasi-Monte Carlo | Deterministic | Good | \((\log N)^d / N\) (\(\to 1/N\) for small \(d\)) | Simple generator; good for small problems | Not robust in high dimensions | Quick SA with low-dimensional inputs when a low-discrepancy sequence is needed