Pickling Scenarios and Bundles
mpisppy can pickle scenarios and proper bundles to disk so that
later PH / EF / MMW runs unpickle them instead of rebuilding from
scratch every time. This page covers the full pickling workflow:
the basic write / read flags, the pre-pickle preprocessing pipeline
(presolve, user callback, iter0 solve), and how to use pickling as
part of an algorithm-tuning workflow.
For background on proper bundles themselves see Proper Bundles. The cylinder driver that exposes most of these flags is described in generic_cylinders.py.
Why Pickle?
There are two distinct reasons to pickle:
Reuse across runs. Building scenarios (and especially proper bundles) can be expensive. Once they are pickled, every downstream tuning / experimentation run reads them from disk in a fraction of the time it would take to rebuild them.
Front-load deterministic work. Presolve, model-specific cleanup, and the iteration 0 solve are all deterministic given the model. They can be paid once at pickle time and then skipped (or warm-started) on every later run.
Basic Pickling and Unpickling
The generic_cylinders driver writes and reads pickles via two
pairs of flags:
--pickle-scenarios-dir <DIR> / --unpickle-scenarios-dir <DIR> for individual scenarios.
--pickle-bundles-dir <DIR> / --unpickle-bundles-dir <DIR> for proper bundles. --scenarios-per-bundle must also be given on both the writing and reading runs.
When the driver is asked to write pickles, all ranks are used for pickling and most other command line options are ignored on that run. Pickling is a separate phase from solving.
Warning
The directory passed to --pickle-bundles-dir /
--pickle-scenarios-dir is overwritten. Do not point it at a
directory you care about.
Note
When unpickling, options such as --num-scens are still required
because cfg needs them. Consistency between the command line and
the files in the pickle directory is not always checked.
Note
Unpickled scenarios inside proper bundles are not supported by
generic_cylinders directly — the wrappers would need to be more
sophisticated. Pickle the bundles, not the scenarios, when bundling.
Note
The scenario_denouement function might not be called when
pickling bundles.
Warning
Helper functions are not pickled, so the pickle is only loosely linked to the helper functions in the module. The module that built the pickle and the module that consumes it must be source-compatible.
Pre-Pickle Preprocessing
By default a pickle captures exactly what scenario_creator returns:
a freshly built Pyomo model with no preprocessing applied and no solver
state. Several deterministic operations can optionally run between
scenario_creator and dill_pickle, so the cost is paid once and
shared by every downstream run.
The pipeline, when all stages are enabled, is:
scenario_creator(sname)
→ SPPresolve (FBBT / optional OBBT) # --presolve-before-pickle
→ <pre_pickle_function>(model, cfg) # --pre-pickle-function NAME
→ solve iter0 (store values + duals) # --iter0-before-pickle
→ dill_pickle(model)
Each stage is independently controlled by a command line flag and can be enabled or skipped. When more than one is enabled, the order above is preserved: presolve runs before the user callback, the callback runs before the iter0 solve, and pickling is the last step.
For proper bundles, every stage operates on the bundled extensive
form model returned by the bundle scenario_creator; this is the
model that is ultimately pickled. Intra-bundle cross-scenario
propagation in presolve happens automatically on that bundled EF model.
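The fixed ordering of the stages can be sketched in a few lines of Python. This is an illustration of the control flow only, not mpisppy's implementation: the function name and its parameters are hypothetical, and each stage is represented by a plain callable taking (model, cfg).

```python
from typing import Any, Callable, Optional

Stage = Callable[[Any, Any], None]


def pre_pickle_pipeline(
    model: Any,
    cfg: Any,
    write_pickle: Callable[[Any], None],
    presolve: Optional[Stage] = None,
    user_callback: Optional[Stage] = None,
    iter0_solve: Optional[Stage] = None,
) -> None:
    """Run the enabled stages in the documented order, then pickle."""
    # The order is fixed: presolve, then user callback, then iter0 solve.
    for stage in (presolve, user_callback, iter0_solve):
        if stage is not None:  # a stage left as None is simply skipped
            stage(model, cfg)
    # Pickling is always the last step.
    write_pickle(model)


# Stand-in stages that only record the order in which they were called.
log = []
pre_pickle_pipeline(
    model={},
    cfg=None,
    write_pickle=lambda m: log.append("pickle"),
    presolve=lambda m, c: log.append("presolve"),
    iter0_solve=lambda m, c: log.append("iter0"),
)
print(log)  # ['presolve', 'iter0', 'pickle']
```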
Stage 1: --presolve-before-pickle
Runs SPPresolve over the rank-local scenarios (or bundles) before
pickling. This is exactly the same machinery used by --presolve
when solving directly (see the presolve discussion in
generic_cylinders.py), so the tightened bounds and any
cross-rank Allreduce synchronization match what you would get at
solve time. The difference is that the cost is paid once and baked
into the pickle.
OBBT can be turned on in addition to FBBT via the existing
--obbt flag. Be aware that turning on OBBT at pickle time
introduces a solver dependency on the pickling job — if you pickle on
a machine without a solver installed (for example, an agnostic
AMPL / GAMS workflow or a CI environment), do not enable
--obbt for pickling.
python -m mpisppy.generic_cylinders --module-name farmer --num-scens 12 \
--pickle-bundles-dir farmer_pickles --scenarios-per-bundle 3 \
--presolve-before-pickle
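For readers unfamiliar with FBBT, the kind of bound tightening that gets baked into the pickle can be illustrated on a single linear constraint. The following is a toy, pure-Python sketch and not how SPPresolve is implemented; it assumes a constraint of the form sum(a_v * v) <= rhs with positive coefficients, and all names in it are hypothetical.

```python
def tighten_upper_bounds(bounds, coeffs, rhs):
    """One bound-tightening pass over sum(coeffs[v] * v) <= rhs.

    Assumes every coefficient is positive. `bounds` maps a variable
    name to (lower, upper); a new dict of bounds is returned.
    """
    new_bounds = dict(bounds)
    for v, a in coeffs.items():
        # Smallest possible contribution of all the other variables.
        others_min = sum(coeffs[u] * bounds[u][0] for u in coeffs if u != v)
        implied_ub = (rhs - others_min) / a
        lb, ub = bounds[v]
        new_bounds[v] = (lb, min(ub, implied_ub))
    return new_bounds


# x + y <= 10 with x in [0, 20] and y in [3, 20]
bounds = {"x": (0.0, 20.0), "y": (3.0, 20.0)}
tightened = tighten_upper_bounds(bounds, {"x": 1.0, "y": 1.0}, rhs=10.0)
print(tightened)  # {'x': (0.0, 7.0), 'y': (3.0, 10.0)}
```

Tighter bounds of this kind are stored on the model, which is why they survive in the pickle and do not need to be recomputed by later runs.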
Stage 2: --pre-pickle-function
Names a user callback that is invoked between presolve and the iter0 solve. The argument is a dotted Python path to a function with the signature:
def my_pre_pickle_fn(model, cfg):
    """Called once per scenario or bundle just before it is pickled.

    Free to mutate `model` in place. Return value is ignored.
    """
The function does not have to live in your model module — any importable function works, so you can keep generic cleanup utilities in a shared module.
Typical uses:
Apply selected pyomo.contrib.preprocessing transformations (coefficient tightening, redundant constraint removal, variable aggregation, zero-term elimination) that you trust for your model.
Fix variables to known-tight values you can compute outside the solver.
Delete dominated constraints identified by domain knowledge.
Rename or reorganize components for faster downstream access.
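A callback for the "fix variables" use might look like the following minimal sketch. The KNOWN_VALUES table and the component name in it are hypothetical placeholders; find_component and fix are standard Pyomo methods, but the right component names depend on your model and, for bundles, on how scenarios are nested inside the bundled EF.

```python
# farmer_cleanup.py (hypothetical module; any importable module works)

# Hypothetical table: component name -> value known to be tight.
KNOWN_VALUES = {
    "SomeVar[SOME_INDEX]": 0.0,
}


def fix_known_vars(model, cfg):
    """Fix variables whose values are known before any solve.

    Mutates `model` in place; the return value is ignored by the caller.
    `cfg` is accepted to match the required signature but is not used.
    """
    for name, value in KNOWN_VALUES.items():
        var = model.find_component(name)  # Pyomo returns None if absent
        if var is None:
            continue  # tolerate components missing from this model
        var.fix(value)
```

Skipping missing components silently is a design choice for this sketch; raising an error instead may be safer if every scenario is expected to contain every listed variable.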
python -m mpisppy.generic_cylinders --module-name farmer --num-scens 12 \
--pickle-bundles-dir farmer_pickles --scenarios-per-bundle 3 \
--presolve-before-pickle \
--pre-pickle-function farmer_cleanup.fix_known_vars
Note
The callback is opt-in. If --pre-pickle-function is not given,
no user code runs in this stage. The flag takes a function name
rather than relying on a magic module-level attribute precisely so
that nothing happens silently.
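Resolving a dotted path to a function is a standard Python idiom, sketched here with importlib. This shows the general technique under the assumption that the last dot separates module from function; the helper name is hypothetical and mpisppy's own resolution code may differ.

```python
import importlib


def resolve_dotted_function(dotted_path):
    """Import 'pkg.module.func' and return the function object."""
    module_name, sep, func_name = dotted_path.rpartition(".")
    if not sep:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} is not callable")
    return func


# Demonstrated on a standard-library function.
join = resolve_dotted_function("os.path.join")
print(join("farmer_pickles", "some_file"))
```

Because resolution goes through the normal import system, the module holding the callback has to be importable from wherever the pickling job runs.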
Stage 3: --iter0-before-pickle
Solves each scenario (or bundle EF) once with its original objective —
no PH W, no PH proximal term, i.e. a PH iteration 0 solve — and
stores the result inside the pickle. Variable .value attributes
and any suffixes attached to the model survive pickling, so the
solution and its dual information are available to downstream
consumers.
By default the same solver as the rest of the run is used. To override just for the pickling phase:
--pickle-solver-name <NAME> selects a different solver for the pickle-time iter0 solve (e.g. a fast LP solver even though downstream uses a MIP solver).
--pickle-solver-options <STRING> overrides solver options for the pickle-time solve.
python -m mpisppy.generic_cylinders --module-name farmer --num-scens 12 \
--pickle-bundles-dir farmer_pickles --scenarios-per-bundle 3 \
--presolve-before-pickle --iter0-before-pickle \
--solver-name gurobi_persistent --pickle-solver-name gurobi
Warning
Pickling with an LP-only solver and then running with a MIP solver downstream is allowed. The LP-relaxed variable values still serve as a starting point, but for MIPs this is not always a good warm start — some integer-feasibility-sensitive solvers can be slower when started from a non-integer point. If in doubt, use the same solver at pickle time and run time.
Warning
If the iter0 solve fails (infeasible, time limit, interrupt, or solver error), the pickling job shuts down. There is no “pickle anyway with a warning” fallback. Producing pickles with silently bad state would be worse than the job stopping. Fix the underlying problem and rerun.
Duals and Reduced Costs in the Pickle
When --iter0-before-pickle is set, mpisppy attaches a Pyomo
IMPORT suffix for duals (and reduced costs) to each model before
the iter0 solve. After the solve, those suffix values become part of
the pickled model.
Important
Pickles produced with --iter0-before-pickle carry dual and
reduced cost values on a Pyomo IMPORT suffix. Downstream
consumers (Lagrangian / Lagranger spokes, fixer extensions, custom
user code) can read those duals from the unpickled model without
having to re-solve. If you do not want this behavior, do not enable
--iter0-before-pickle.
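A downstream consumer might read the stored duals roughly as follows. This is a sketch under stated assumptions: the suffix attribute name "dual" is the conventional Pyomo name and is assumed here rather than taken from mpisppy, the helper name is hypothetical, and a stand-in object replaces the unpickled model so the example runs without Pyomo.

```python
from types import SimpleNamespace


def read_pickled_duals(model, constraints, suffix_name="dual"):
    """Return (constraint, dual) pairs for constraints that carry a dual.

    suffix_name is an assumption: "dual" is the conventional Pyomo name,
    so check which attribute your mpisppy version actually attaches.
    """
    suffix = getattr(model, suffix_name, None)
    if suffix is None:
        raise AttributeError(f"model has no suffix named {suffix_name!r}")
    # A Pyomo Suffix behaves like a mapping from components to values.
    return [(con, suffix[con]) for con in constraints if con in suffix]


# Stand-in for an unpickled model; a plain dict plays the suffix.
fake_model = SimpleNamespace(dual={"con_a": 2.5})
pairs = read_pickled_duals(fake_model, ["con_a", "con_b"])
print(pairs)  # [('con_a', 2.5)]
```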
How Downstream Runs Use the Iter0 Solution
There are two ways to consume the pickled iter0 solution:
Warm start (default). PH still runs iter0 on the unpickled models, but each subproblem solve now starts from pre-populated variable values. For MIPs this becomes a MIP start and is usually a significant speedup; for LPs it is less useful without a basis.
Skip iter0 entirely with --iter0-from-pickle. PH reads the variable values from the unpickled scenarios, treats them as the iter0 result, and goes straight to the first W update and iter1. PHBase.Iter0 skips its solver loop. The downstream run validates that every local scenario actually carries _mpisppy_data.pickle_metadata['iter0_before_pickle'] == True; if any does not, the run hard-fails rather than fabricating solver state. So the contract is: a run that uses --iter0-from-pickle must read pickles that were written with --iter0-before-pickle, or it stops with an error.
# 1. Pay iter0 once at pickle time
python -m mpisppy.generic_cylinders --module-name farmer --num-scens 12 \
--pickle-bundles-dir farmer_pickles --scenarios-per-bundle 3 \
--solver-name gurobi --iter0-before-pickle
# 2. Every later run skips iter0 entirely
mpiexec -np 3 python -m mpi4py mpisppy/generic_cylinders.py \
--module-name farmer --num-scens 12 \
--unpickle-bundles-dir farmer_pickles --scenarios-per-bundle 3 \
--solver-name gurobi --default-rho 1 --max-iterations 50 \
--lagrangian --xhatshuffle --iter0-from-pickle
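The validation that guards --iter0-from-pickle can be pictured as the check below. Only the metadata location and the 'iter0_before_pickle' key come from the description above; the function name, the error text, and the scenario names are hypothetical, and stand-in objects replace real unpickled models.

```python
from types import SimpleNamespace


def require_iter0_in_pickles(local_scenarios):
    """Raise if any local scenario lacks a pickled iter0 solution.

    local_scenarios maps a scenario (or bundle) name to its model.
    """
    missing = []
    for name, model in local_scenarios.items():
        data = getattr(model, "_mpisppy_data", None)
        metadata = getattr(data, "pickle_metadata", None) or {}
        if metadata.get("iter0_before_pickle") is not True:
            missing.append(name)
    if missing:
        raise RuntimeError(
            "--iter0-from-pickle requires pickles written with "
            f"--iter0-before-pickle; missing for: {sorted(missing)}"
        )


# Stand-ins: one model pickled with the iter0 stage, one without.
with_iter0 = SimpleNamespace(
    _mpisppy_data=SimpleNamespace(
        pickle_metadata={"iter0_before_pickle": True}
    )
)
without_iter0 = SimpleNamespace()

require_iter0_in_pickles({"bundle_a": with_iter0})  # passes silently
try:
    require_iter0_in_pickles({"bundle_a": with_iter0, "bundle_b": without_iter0})
except RuntimeError as err:
    print(err)
```

Failing on the first inconsistent set of pickles, rather than solving the missing scenarios on the fly, keeps the run from mixing pickled and freshly computed iter0 state.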
Pickle Metadata
Each pre-processed pickle records which stages ran, which presolve
options were used, and which solver was invoked, on
model._mpisppy_data.pickle_metadata. When
--iter0-before-pickle runs, the metadata also stores the solver’s
reported outer_bound / inner_bound for that scenario so that
--iter0-from-pickle can restore the same outer / inner split that PH’s own
solve_loop would have set. This matters for MIPs solved with a
nonzero gap, where the two bounds differ. The metadata travels inside the
pickle automatically, so it survives file moves and is easy to
inspect after the fact.
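As an example of inspecting the metadata after the fact, the sketch below computes the relative gap implied by the stored bounds. The dictionary keys 'outer_bound' and 'inner_bound' are assumed to match the names used above, the numbers are made up for illustration, and the function name is hypothetical.

```python
def iter0_relative_gap(metadata):
    """Relative gap between the pickled inner and outer bounds."""
    inner = metadata["inner_bound"]
    outer = metadata["outer_bound"]
    denom = max(abs(inner), 1e-10)  # guard against division by zero
    return abs(inner - outer) / denom


# Made-up metadata for a MIP that stopped with a nonzero gap.
meta = {
    "iter0_before_pickle": True,
    "outer_bound": 99.0,
    "inner_bound": 100.0,
}
print(f"{iter0_relative_gap(meta):.2%}")  # 1.00%
```

For an LP solved to optimality the two bounds coincide and the gap is zero, so the distinction only carries information for MIPs.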
Tuning Workflow with Pickling
A common reason to pickle is algorithm tuning: you want to try many combinations of rho settings, spoke combinations, fixer thresholds, extension parameters, and so on, against the same scenarios. The scenario build is identical for every tuning run, so doing it again each time is pure waste.
A typical tuning workflow:
Pick a representative problem size (and bundle structure, if you are bundling).
Pickle once. Run generic_cylinders with --pickle-bundles-dir (or --pickle-scenarios-dir) and any of the pre-pickle stages that are useful for your model. Common combinations:
    --presolve-before-pickle alone: pay FBBT once.
    --presolve-before-pickle --iter0-before-pickle: also bake in the iteration 0 solution (and duals) so every tuning run starts warm.
    Add --pre-pickle-function ... if you have model-specific cleanup you trust.
Tune from the pickle. Every tuning run uses --unpickle-bundles-dir <DIR> (or --unpickle-scenarios-dir) instead of rebuilding scenarios. The build / presolve / iter0 cost is gone, so the run-to-run wallclock difference becomes a clean measurement of the tuning change.
When you find good settings, lock them in and rebuild pickles only when the underlying model changes.
This workflow especially benefits from --iter0-before-pickle: many
tuning experiments differ only in iter1+ behavior, so eliminating
iter0 from every iteration of the experiment loop is a real time win.
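A tuning sweep over the same pickles can be scripted by generating one command per setting. The sketch below only builds and prints the command lines, reusing the flags from the farmer example earlier on this page; the helper name, its parameters, and the rho values are hypothetical.

```python
import shlex


def tuning_run_cmd(pickle_dir, default_rho, num_ranks=3):
    """Build argv for one tuning run that reads existing bundle pickles."""
    return [
        "mpiexec", "-np", str(num_ranks),
        "python", "-m", "mpi4py", "mpisppy/generic_cylinders.py",
        "--module-name", "farmer", "--num-scens", "12",
        "--unpickle-bundles-dir", pickle_dir, "--scenarios-per-bundle", "3",
        "--solver-name", "gurobi",
        "--default-rho", str(default_rho), "--max-iterations", "50",
        "--lagrangian", "--xhatshuffle", "--iter0-from-pickle",
    ]


# Hypothetical sweep: only rho changes, the pickles are reused each time.
commands = [tuning_run_cmd("farmer_pickles", rho) for rho in (0.1, 1, 10)]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually launch the runs, each list could be passed to subprocess.run(cmd, check=True); timing each call then measures only the effect of the setting that changed.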
Single-Run Case: Pickling Iter0 to Use All CPUs
There is a less obvious case for pickling: even if you only intend to solve once, pickling first can be faster than not pickling. The reason is parallelism allocation.
When you run generic_cylinders with cylinders, the available MPI
ranks are split across the hub and the spokes. Iteration 0 of PH (the
deterministic warm-start solve) only runs on the hub side; the spoke
ranks are doing other work or are idle for the iter0 phase.
When you run a pickling job with --iter0-before-pickle, all the
available MPI ranks are used for forming bundles and running iter0
solves — there are no cylinders during pickling. So if your machine
has many cores and you have many bundles, the iter0 solves can be
spread across every available rank rather than only the ranks
allocated to the hub during a normal cylinder run.
The result: in some settings, the wall-clock cost of (pickle with
--iter0-before-pickle) + (cylinder run that consumes the warm
start) can be lower than the cost of a single direct cylinder run
that has to do iter0 on a smaller hub allocation.
This is not universal — for small models or small node counts the overhead of the extra serialization round trip dominates. But on larger HPC allocations with many bundles, it is a real option worth measuring.
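The allocation argument can be made concrete with a little arithmetic. The sketch assumes that ranks are divided evenly among the cylinders, that each rank solves its bundles one after another, and that all iter0 solves take about the same time; the rank and bundle counts are hypothetical.

```python
import math


def iter0_rounds(num_bundles, ranks_doing_iter0):
    """Sequential iter0 solves on the busiest rank."""
    return math.ceil(num_bundles / ranks_doing_iter0)


total_ranks = 12
num_cylinders = 3  # one hub plus two spokes
num_bundles = 24

hub_ranks = total_ranks // num_cylinders  # 4 ranks on the hub side

print(iter0_rounds(num_bundles, hub_ranks))    # 6 solves per hub rank
print(iter0_rounds(num_bundles, total_ranks))  # 2 solves per rank when pickling
```

In this hypothetical case the pickling job finishes the iter0 phase in a third of the sequential solves, which has to be weighed against the time spent writing and reading the pickles.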