Input Sample Generation
Introduction
Why Use Sampled Monte Carlo Input?
Monte Carlo simulations are valuable when model inputs are uncertain or random, or when future developments are unknown.
The core idea: run the model many times (called realizations), each with different input values, and analyze the outputs statistically.
This approach allows estimation of metrics such as:
- Mean
- Standard deviation
- Confidence intervals
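As a minimal sketch of this workflow, the following runs a hypothetical `toy_model` many times and summarizes the outputs. The model, its two inputs, and their distributions are assumptions chosen for illustration only; the confidence interval uses a simple normal approximation.

```python
import random
import statistics


def toy_model(x, y):
    """Hypothetical stand-in for an expensive simulation model."""
    return x ** 2 + 3.0 * y


def run_monte_carlo(n_realizations, seed=0):
    """Run the model once per realization and summarize the outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_realizations):
        x = rng.uniform(0.0, 1.0)  # uncertain input 1 (assumed uniform)
        y = rng.gauss(2.0, 0.5)    # uncertain input 2 (assumed normal)
        outputs.append(toy_model(x, y))
    mean = statistics.fmean(outputs)
    std = statistics.stdev(outputs)
    # Normal-approximation 95% confidence interval for the mean
    half_width = 1.96 * std / n_realizations ** 0.5
    return mean, std, (mean - half_width, mean + half_width)


mean, std, ci = run_monte_carlo(10_000)
print(f"mean={mean:.3f}  std={std:.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The width of the interval shrinks with the square root of the number of realizations, which is why the choice of sampling method matters for expensive models.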
However, when input variables are continuous, it becomes impossible to test every combination.
Even a discrete mesh grid quickly becomes infeasible, a challenge known as the curse of dimensionality.
Suppose the model has five input parameters (dimensions):
| Levels per Parameter | Total Simulations Required |
|---|---|
| 2 | 32 |
| 3 | 243 |
| 4 | 1,024 |
| 10 | 100,000 |
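Computational cost grows exponentially with the number of parameters and polynomially with the number of levels per parameter: \( n \) levels across \( d \) parameters require \( n^d \) simulations.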
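The numbers in the table can be reproduced with a few lines of arithmetic. The helper name `full_factorial_runs` is hypothetical and used only for this illustration.

```python
def full_factorial_runs(levels, n_parameters):
    """Number of simulations needed to test every combination of levels."""
    return levels ** n_parameters


for levels in (2, 3, 4, 10):
    print(levels, full_factorial_runs(levels, n_parameters=5))
# 2 32
# 3 243
# 4 1024
# 10 100000
```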
Goal: select a representative set of input samples that sufficiently explores the parameter space while keeping computation manageable.
How to Sample Monte Carlo Input
Before sampling, each input parameter must be defined, including uncertainty.
Uncertainty is expressed via a distribution function across the parameter range.
If every value is equally likely (or the distribution is unknown), a uniform distribution is often used.
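One common way to apply a distribution to a parameter is to draw a number \( u \) in the unit interval and map it through the inverse cumulative distribution function. The sketch below shows this for a uniform and a normal distribution using only the Python standard library. It is an illustration of the general technique, not the SA-Toolbox implementation, and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def to_uniform(u, low, high):
    """Map u in [0, 1) onto a uniform distribution over [low, high)."""
    return low + u * (high - low)


def to_normal(u, mean, std):
    """Map u in (0, 1) onto a normal distribution via the inverse CDF."""
    # inv_cdf is undefined at exactly 0 or 1, so clamp u slightly inward
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist(mean, std).inv_cdf(u)


rng = random.Random(42)
u = rng.random()
print(to_uniform(u, 10.0, 20.0), to_normal(u, 3.0, 2.0))
```

Because both functions take the same kind of input, any sampling method that produces points in the unit interval can be combined with any distribution that has an inverse CDF.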
The SA-Toolbox supports seven distribution types.
The SA-Toolbox also includes seven sampling methods. The choice affects:
- Clustering or gaps in the sample
- Number of samples and computational cost
- Use of probability distributions
- Maximum dimensions supported
- Likelihood of artifacts during dataset resampling
Dataset Resampling
“Dataset resampling” refers to reducing an existing dataset by selecting entries based on their original order (e.g., row number) without considering parameter values.
See Step 4 in “Subset / Filter”.
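To make the idea concrete, here is a minimal sketch of order-based resampling. It assumes evenly spaced row positions; the selection rule actually used by the SA-Toolbox is not described here, and the function name is hypothetical.

```python
def resample_by_order(rows, n_keep):
    """Keep n_keep rows at evenly spaced positions, ignoring their values."""
    if not 0 < n_keep <= len(rows):
        raise ValueError("n_keep must be between 1 and len(rows)")
    step = len(rows) / n_keep
    return [rows[int(i * step)] for i in range(n_keep)]


rows = list(range(10))  # stand-in for dataset rows in their original order
print(resample_by_order(rows, 5))
# [0, 2, 4, 6, 8]
```

Because the selection depends only on row position, any regular pattern in the order in which the samples were generated can carry over into the reduced dataset, which is one way artifacts can arise.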
Sampling Methods Overview
The SA-Toolbox provides several parameter sampling methods, divided into three categories:
- (Pseudo)Random Methods
- Structured Methods
- Quasi-Monte Carlo Methods

Random methods: robust and simple, but may cluster and require many samples.
Structured methods: ensure each interval is sampled at least once, reducing gaps.
Quasi-Monte Carlo (QMC): use deterministic low-discrepancy sequences that cover the space evenly with comparatively few samples.
Simple Random Sampling
Classic Monte Carlo sampling selects values from parameter distributions without structure.
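A minimal sketch of simple random sampling, assuming uniform distributions and independent parameters. The function name and the `bounds` format are hypothetical.

```python
import random


def simple_random_sample(n_samples, bounds, seed=None):
    """Draw each parameter independently and uniformly within its bounds.

    bounds is a list of (low, high) pairs, one per parameter.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(low, high) for low, high in bounds]
        for _ in range(n_samples)
    ]


samples = simple_random_sample(5, bounds=[(0.0, 1.0), (10.0, 20.0)], seed=1)
for point in samples:
    print(point)
```

Nothing prevents two points from landing close together or a region from being left empty, which is the clustering behavior noted for this method.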

Advantages:
- Simple to understand
- Respects parameter distributions
- Fast sample generation
Disadvantages:
- Slow convergence, requires many samples
- Uneven coverage
- Assumes independent input parameters
References
- Knuth, Donald Ervin (1981): The Art of Computer Programming. Volume 2. Seminumerical algorithms. 2. ed. Reading, Mass.: Addison-Wesley.
Grid Sampling
Divide each parameter range into equal intervals.
Let \( s \) be the sample size and \( d \) the number of parameters.
Each parameter takes \( n \) discrete values, so the full grid contains \( n^d \) points.
Important: the midpoints of grid cells are sampled, not vertices. If \( s = n^d \), all midpoints are selected (full factorial design).
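The midpoint rule can be sketched as follows for the full factorial case. The function name and the `bounds` format are hypothetical.

```python
import itertools


def grid_midpoints(n, bounds):
    """Return the midpoints of all n**d grid cells.

    bounds is a list of (low, high) pairs, one per parameter.
    """
    axes = []
    for low, high in bounds:
        width = (high - low) / n
        # Midpoint of cell k lies half a cell width past its lower edge
        axes.append([low + (k + 0.5) * width for k in range(n)])
    return list(itertools.product(*axes))


print(grid_midpoints(2, [(0.0, 1.0), (0.0, 1.0)]))
# [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
```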
Advantages:
- Systematic coverage
- Easy to understand
Disadvantages:
- Expensive for high dimensions
- Assumes independent inputs
Latin Hypercube Sampling (LHS)
Divide parameter space into intervals of equal probability.
Each interval of each parameter is sampled exactly once, and the resulting values are paired randomly across parameters.
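A minimal pure-Python sketch of this procedure on the unit hypercube. It is an illustration, not the SA-Toolbox or SciPy implementation; mapping the result onto non-uniform distributions (for example via an inverse CDF) is left out.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=None):
    """Latin hypercube sample in [0, 1)^n_dims."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n equal-probability intervals
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffling pairs the intervals randomly across parameters
        rng.shuffle(column)
        columns.append(column)
    return [list(point) for point in zip(*columns)]


for point in latin_hypercube(4, 2, seed=0):
    print(point)
```

For every parameter, each of the `n_samples` intervals contains exactly one value, whatever the pairing across parameters turns out to be.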

Advantages:
- Better coverage of marginal distributions
- Fewer samples than grid
- Scales to higher dimensions
Disadvantages:
- Assumes independent parameters
References
- Helton & Davis (2002, 2003)
- SciPy LatinHypercube
Quasi-Monte Carlo Methods
Quasi-Monte Carlo methods use low-discrepancy sequences such as Sobol and Halton to minimize gaps. The resulting samples are deterministic and evenly distributed.
Sobol Sampling
- Uses binary computations and a separate set of direction numbers for each of the \(d\) parameters
- Sample size ideally a power of 2
- Deterministic
Advantages: fast, fewer samples, stable in high dimensions
Disadvantages: assumes independence

Sobol-Scramble
- Adds a random component to Sobol sequences
- Reduces artifacts; fast; keeps the deterministic low-discrepancy structure while randomizing the individual point positions
Advantages: stable, computationally cheap
Disadvantages: independence assumption, performance may degrade at high dimensions
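The difference between plain and scrambled Sobol sequences can be sketched with SciPy's `scipy.stats.qmc` module. This assumes SciPy is installed and is not necessarily how the SA-Toolbox generates its samples; the bounds and seeds are arbitrary example values.

```python
from scipy.stats import qmc

m = 4  # sample size is 2**m = 16, a power of 2

# Plain Sobol: fully deterministic, identical on every run
plain = qmc.Sobol(d=2, scramble=False).random_base2(m=m)

# Scrambled Sobol: different seeds give different point sets
scrambled_a = qmc.Sobol(d=2, scramble=True, seed=1).random_base2(m=m)
scrambled_b = qmc.Sobol(d=2, scramble=True, seed=2).random_base2(m=m)

# Map the unit-square points onto example parameter ranges
scaled = qmc.scale(plain, l_bounds=[0.0, 10.0], u_bounds=[1.0, 20.0])

print(plain[:4])
print(scrambled_a[:4])
```

Repeating the scrambled run with several seeds gives independent replicates, which is what makes error estimates possible for this variant.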
Halton
- Uses prime bases and radical inverse function
- Good uniformity in low dimensions (<10)
Advantages: fast, analytical
Disadvantages: correlations appear in higher dimensions
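The radical inverse construction can be sketched in a few lines. Each dimension uses its own prime base; the function names, the short prime table, and the choice to start at index 1 (skipping the all-zero point) are assumptions made for this illustration.

```python
PRIMES = (2, 3, 5, 7, 11, 13)  # one base per dimension, enough for 6 dimensions


def radical_inverse(index, base):
    """Mirror the base-b digits of index around the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_samples, n_dims, start=1):
    """First n_samples Halton points in [0, 1)^n_dims, beginning at index start."""
    if n_dims > len(PRIMES):
        raise ValueError("extend PRIMES to support more dimensions")
    return [
        [radical_inverse(i, PRIMES[d]) for d in range(n_dims)]
        for i in range(start, start + n_samples)
    ]


print([radical_inverse(i, 2) for i in range(1, 5)])
# [0.5, 0.25, 0.75, 0.125]
```

With larger prime bases, successive points in different dimensions move in step for long stretches, which is the source of the correlations seen in higher dimensions.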
Stick-Breaking Method
The Stick-Breaking method generates parameters that sum to 1 (e.g., mass fractions).
Values are sampled recursively using a beta distribution.
Advantages: handles dependent parameters
Disadvantages: may not respect original distributions
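A minimal sketch of the recursion, using only the Python standard library. The beta parameters below are an assumption: they are chosen so that the result is uniformly distributed over all valid fraction combinations (a flat Dirichlet distribution). The parameters used by the SA-Toolbox are not specified here.

```python
import random


def stick_breaking(n_parts, seed=None):
    """Sample n_parts non-negative fractions that sum to 1."""
    rng = random.Random(seed)
    remaining = 1.0  # length of the stick that is still unassigned
    fractions = []
    for i in range(1, n_parts):
        # Break off a Beta(1, n_parts - i) share of what is left
        share = rng.betavariate(1.0, n_parts - i)
        piece = share * remaining
        fractions.append(piece)
        remaining -= piece
    fractions.append(remaining)  # the last part takes the rest
    return fractions


parts = stick_breaking(4, seed=0)
print(parts, sum(parts))
```

Each fraction depends on how much of the stick the earlier fractions have already consumed, which is how the method enforces the sum constraint.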
References
- Ng, Kai Wang; Tian, Guo-Liang; Tang, Man-Lai (2011): Dirichlet and related distributions. Theory, methods and applications. 1st ed., Chichester: Wiley.
Sampling Summary
Table: Overview of the main characteristics of six sampling methods
The stick-breaking method has a very specific use case and is therefore not included in this comparison.
| Method | Type | Randomness | Space-Filling | Convergence Rate | Advantages | Disadvantages | Use Cases |
|---|---|---|---|---|---|---|---|
| Simple Random Sampling | Pseudorandom Sampling | Random | Poor | \(N^{-1/2}\) | Very easy to implement; works with any data or model | Poor coverage of the input space; may miss important features | Quick exploratory runs when model is simple or time is not a constraint |
| Grid Sampling | Deterministic (Structured) | None | Excellent | Varies with grid resolution | Complete coverage; easy to analyze | Infeasible in high dimensions; no randomness or replication possible | Low-dimensional problems where exhaustive exploration is feasible (e.g., up to 3 or 4 inputs) |
| Latin Hypercube Sampling | Structured Random Sampling | Stratified Random | Good | Better than random | Good variance reduction; works well with existing datasets | May struggle with highly nonlinear or correlated inputs | Most common choice for SA when sample budget is limited and inputs are moderately correlated |
| Sobol | Quasi-Monte Carlo | Deterministic | Excellent | \((\log N)^d / N\) (\(\to 1/N\) for small \(d\)) | Best in class for space filling; great for high-dimensional SA | No built-in variance estimates; sensitive to input order | High-accuracy SA with smooth models, especially for variance-based GSA |
| Sobol (scrambled) | Quasi-Monte Carlo (randomized) | Randomized Low-Discrepancy | Excellent | \((\log N)^d / N\) (\(\to 1/N\) for small \(d\)) | Allows error estimation; combines low-discrepancy and randomness; enables replication | Slightly more complex to implement | Reliable, replicable GSA with uncertainty estimation (Sobol indices with confidence intervals) |
| Halton | Quasi-Monte Carlo | Deterministic | Good | \((\log N)^d / N\) (\(\to 1/N\) for small \(d\)) | Simple generator; good for small problems | Not robust in high dimensions | Quick SA in low-dimensional inputs when low-discrepancy sequence is needed |