Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Simulation Guide

This guide covers how to use Aura's simulation infrastructure for testing distributed protocols under controlled conditions.

When to Use Simulation

Simulation suits scenarios that unit tests cannot address. Use simulation for fault injection testing. Use it for multi-participant protocol testing. Use it for time-dependent behavior validation.

Do not use simulation for simple unit tests. Direct effect handler testing is faster and simpler for single-component validation. Do not treat simulation as the default end-to-end correctness oracle for user-facing flows. Aura's primary feedback loop remains the real-runtime harness running against the real software stack.

See Simulation Infrastructure Reference for the complete architecture documentation.

Simulation vs Harness

The simulation architecture is specified in Simulation Infrastructure Reference. The harness architecture is specified in User Flow Harness.

Use the real-runtime harness by default when validating product behavior through the TUI or webapp. Use simulation when you need deterministic virtual time, controlled network faults, scheduler control, or MBT and trace replay under constrained distributed conditions. Promote high-value simulation findings back into real-runtime harness coverage when the flow is user-visible or integration-sensitive.

Two Simulation Systems

The two simulation systems (TOML scenarios and Quint actions) are specified in Simulation Infrastructure Reference.

Use CaseSystem
End-to-end integrationTOML scenarios
Named fault injectionTOML scenarios
Conformance testingQuint actions
State space explorationQuint actions

When you need user-facing coverage, promote the scenario into the real-runtime harness lane after it is stable in simulation. Treat simulation as a substrate for controlled runtime conditions, not as the final UI executor.

TOML Scenario Authoring

Creating Scenarios

Scenario files live in the scenarios/ directory.

[metadata]
name = "recovery_basic"
description = "Basic guardian recovery flow"

[[phases]]
name = "setup"
actions = [
    { type = "create_participant", id = "owner" },
    { type = "create_participant", id = "guardian1" },
    { type = "create_participant", id = "guardian2" },
]

[[phases]]
name = "recovery"
actions = [
    { type = "run_choreography", choreography = "guardian_recovery", participants = ["owner", "guardian1", "guardian2"] },
]

[[properties]]
name = "owner_recovered"
property_type = "safety"

Each scenario has metadata, ordered phases, and property definitions.

Defining Phases

Phases execute in order. Each phase contains a list of actions.

[[phases]]
name = "fault_injection"
actions = [
    { type = "apply_network_condition", condition = "partition", duration = "5s" },
    { type = "advance_time", duration = "10s" },
]

Actions within a phase execute sequentially. Use multiple phases to organize complex scenarios.

Adding Fault Injection

Fault injection actions simulate adverse conditions.

[[phases]]
name = "chaos"
actions = [
    { type = "simulate_data_loss", participant = "guardian1", percentage = 50 },
    { type = "apply_network_condition", condition = "high_latency", duration = "30s" },
]

Available conditions include partition, high_latency, packet_loss, and byzantine.

Running Scenarios

cargo run --package aura-terminal -- scenario run scenarios/recovery_basic.toml

The scenario handler parses and executes the TOML file. Results report property pass/fail status.

Working with Handlers

Basic Handler Composition

#![allow(unused)]
fn main() {
use aura_simulator::handlers::SimulationEffectComposer;
use aura_core::DeviceId;

let device_id = DeviceId::new_from_entropy([1u8; 32]);
let composer = SimulationEffectComposer::for_testing(device_id).await?;
let env = composer
    .with_time_control()
    .with_fault_injection()
    .build()?;
}

The composer builds a complete effect environment from handler components.

Time Control

#![allow(unused)]
fn main() {
use aura_simulator::handlers::SimulationTimeHandler;

let mut time = SimulationTimeHandler::new();
let start = time.physical_time().await?;
time.jump_to_time(Duration::from_secs(60));
let later = time.physical_time().await?;
}

Simulated time advances only through explicit calls (jump_to_time) or sleep operations. This enables testing timeout behavior without delays.

Fault Injection

#![allow(unused)]
fn main() {
use aura_simulator::handlers::SimulationFaultHandler;
use aura_core::{AuraFault, AuraFaultKind, FaultEdge};

let faults = SimulationFaultHandler::new(42);

faults.inject_fault(
    AuraFault::new(AuraFaultKind::MessageDelay {
        edge: FaultEdge::new("alice", "bob"),
        min: Duration::from_millis(100),
        max: Duration::from_millis(500),
    }),
    None,
)?;

faults.inject_fault(
    AuraFault::new(AuraFaultKind::MessageDrop {
        edge: FaultEdge::new("alice", "bob"),
        probability: 0.1,
    }),
    None,
)?;
}

AuraFault is the canonical simulator fault model. Legacy scenario fault forms should be converted to AuraFault before injection or replay.

Triggered Scenarios

#![allow(unused)]
fn main() {
use aura_simulator::handlers::{
    SimulationScenarioHandler,
    ScenarioDefinition,
    TriggerCondition,
    InjectionAction,
};
use aura_core::{AuraFault, AuraFaultKind};

let handler = SimulationScenarioHandler::new(42);
handler.register_scenario(ScenarioDefinition {
    id: "late_partition".to_string(),
    name: "Late Partition".to_string(),
    trigger: TriggerCondition::AfterTime(Duration::from_secs(30)),
    actions: vec![InjectionAction::TriggerFault {
        fault: AuraFault::new(AuraFaultKind::NetworkPartition {
            partition: vec![vec!["device1".into(), "device2".into()], vec!["device3".into()]],
            duration: Some(Duration::from_secs(15)),
        }),
    },
    duration: Some(Duration::from_secs(45)),
    priority: 10,
});
}

Triggered scenarios inject faults at specific times or protocol states.

Integrating Feature Crates

Layer 5 feature crates (sync, recovery, chat, etc.) integrate with simulation through the effect system. This section covers patterns for wiring feature crates into simulation environments.

Required Effects

Feature crates are generic over effect traits. Common requirements include:

Effect TraitPurpose
NetworkEffectsTransport and peer communication
JournalEffectsFact retrieval and commits
CryptoEffectsHashing and signature verification
PhysicalTimeEffectsTimeouts and scheduling
RandomEffectsNonce generation

Pass the effect system from aura-simulator or aura-testkit for deterministic testing. In production, use aura-agent's runtime effects.

Configuration for Simulation

Feature crates typically provide testing configurations that minimize timeouts and remove jitter:

#![allow(unused)]
fn main() {
use aura_sync::SyncConfig;

// Production: conservative timeouts, adaptive scheduling
let prod_config = SyncConfig::for_production();

// Testing: fast timeouts, no jitter, predictable behavior
let test_config = SyncConfig::for_testing();

// Validate before use
test_config.validate()?;
}

Environment variables (prefixed per-crate, e.g., AURA_SYNC_*) allow per-process tuning without code changes.

Guard Chain Integration

The guard chain sequence is specified in Authorization.

For simulation, capability checks rely on Biscuit tokens evaluated by AuthorizationEffects. Guard evaluators must be provided by the runtime before sync operations. Validation occurs before sending or applying any protocol data.

Observability in Simulation

Connect feature crates to MetricsCollector for simulation diagnostics:

#![allow(unused)]
fn main() {
use aura_core::metrics::MetricsCollector;

let metrics = MetricsCollector::new();
// Protocol timings, retries, and failure reasons flow to metrics
// Log transport and authorization failures for debugging
}

Safety Requirements

Feature crates must follow effect system rules. All I/O and timing must flow through effects with no direct runtime calls. Validate Biscuit tokens before accepting peer data. Enforce flow budgets and leakage constraints at transport boundaries.

Verify compliance before simulation:

just ci-effects

Debugging Simulations

Deterministic Configuration

Always use deterministic settings for reproducible debugging.

#![allow(unused)]
fn main() {
let config = SimulatorConfig {
    device_id: DeviceId::new_from_entropy([1u8; 32]),
    network: NetworkConfig::default(),
    enable_fault_injection: false,
    deterministic_time: true,
};
}

Deterministic identifiers and time enable exact failure reproduction.

Effect System Compliance

Verify protocol code follows effect guidelines before simulation.

just check-arch

The architecture checker flags direct time, randomness, or I/O usage. Non-compliant code breaks simulation determinism.

Monitoring State

#![allow(unused)]
fn main() {
let metrics = middleware.get_metrics();
println!("Messages: {}", metrics.messages_sent);
println!("Faults: {}", metrics.faults_injected);
println!("Duration: {:?}", metrics.simulation_duration);
}

Middleware metrics help identify unexpected behavior.

Common Issues

Flaky simulation results indicate non-determinism. Check for direct system calls. Check for uncontrolled concurrency. Check for ordering assumptions.

Slow simulations indicate inefficient fault configuration. Reduce fault rates for initial debugging. Increase rates for stress testing.

Online Property Monitoring

Aura simulator supports per-tick property monitoring through aura_simulator::AuraProperty, aura_simulator::AuraPropertyMonitor, and aura_simulator::default_property_suite(...).

The monitor checks properties on each simulation tick using PropertyStateSnapshot input. This includes events, buffer sizes, local-type depths, flow budgets, and optional session, coroutine, and journal snapshots.

#![allow(unused)]
fn main() {
use aura_simulator::{
    AuraPropertyMonitor, ProtocolPropertyClass, ProtocolPropertySuiteIds,
    PropertyMonitoringConfig, SimulationScenarioConfig,
};

let monitoring = PropertyMonitoringConfig::new(
    ProtocolPropertyClass::Consensus,
    ProtocolPropertySuiteIds { session, context },
)
.with_check_interval(1)
.with_snapshot_provider(|tick| build_snapshot_for_tick(tick));

let config = SimulationScenarioConfig {
    property_monitoring: Some(monitoring),
    ..SimulationScenarioConfig::default()
};

let results = env.run_scenario("consensus".into(), "with property checks".into(), config).await?;
assert!(results.property_violations.is_empty());
}

Default suites are available for consensus, sync, chat, and recovery protocol classes. Scenario results include properties_checked and property_violations for CI reporting.

Adaptive Privacy Simulation

Adaptive privacy validation uses simulation for cross-layer evidence. The simulator exercises anonymous path establishment, established-path reuse and expiry, move batching, cover traffic, selector-based retrieval, hold retention, provider saturation, partitions, churn, and long-offline bootstrap re-entry.

The adaptive privacy matrix should include small, medium, and large reachable sets. It should also include clustered social topologies, partition and heal cycles, sparse sync opportunities, low organic traffic, provider saturation, and stale-node return with dead remembered descriptors.

The evidence path tunes policy constants for production. Production ships one fixed adaptive policy. Development and simulation may vary constants such as cover floor, path-diversity floor, delay gain, hold retention, and retrieval rotation, but those tuning surfaces must not become user-facing production configuration.

The current fixed-policy evidence is anchored by crates/aura-simulator/tests/adaptive_privacy_phase_six.rs. That lane checks the tuned policy, archived control-plane reports, bootstrap-observer reports, Telltale-backed anonymous path establishment, and reply-block accountability.

Quint Integration

Quint actions enable model-based testing. See Verification and MBT Guide for complete workflows.

When to Use Quint

Use Quint actions for conformance testing against formal specifications. Use them for generative state exploration. Do not use them for simple integration tests.

Basic Trace Replay

#![allow(unused)]
fn main() {
use aura_simulator::quint::itf_loader::ITFLoader;
use aura_simulator::quint::generative_simulator::GenerativeSimulator;

let trace = ITFLoader::load("trace.itf.json")?;
let simulator = GenerativeSimulator::new(config)?;
let result = simulator.replay_trace(&trace).await?;
assert!(result.all_properties_passed());
}

Trace replay validates implementation against Quint model behavior.

AMP epoch-transition simulation has a focused regression lane in crates/aura-simulator/tests/amp_transition_scenarios.rs. It covers delayed witnesses, partitions that create conflicting A2 certificates, subtractive old-epoch receive cutover, emergency quarantine, emergency cryptoshred, replay, abort, supersession, and alarm-spam cases. Run it with cargo test -p aura-simulator --test amp_transition_scenarios; broader local development can continue to use just test-crate aura-simulator without requiring an exhaustive Quint/Apalache matrix.

Conformance Workflow

Simulation feeds native/WASM conformance testing. See Testing Guide for conformance lanes and corpus policy.

Generating Corpus

quint run --out-itf=trace.itf.json verification/quint/consensus/core.qnt

ITF traces from Quint become conformance test inputs.

Running Conformance

just ci-conformance

Conformance lanes compare execution across platforms using simulation-controlled environments.

Checkpoints, Contracts, and Shared Replay

Phase 0 hardening uses three simulator workflows as CI gates.

Checkpoint Snapshot Workflow

SimulationScenarioHandler supports portable checkpoint snapshots. The export_checkpoint_snapshot(label) function exports a serializable ScenarioCheckpointSnapshot. The import_checkpoint_snapshot(snapshot) function restores a checkpoint into a fresh simulator instance.

This enables baseline checkpoint persistence for representative choreography suites. It also supports restore-and-continue regression tests. Upgrade smoke tests can resume from pre-upgrade snapshots.

Use checkpoints when validating runtime upgrades or migration safety, not only end-to-end success.

Scenario Contract Workflow

Conformance CI includes scenario contracts for consensus, sync, recovery, and reconfiguration. Each bundle is validated over a seed corpus. Validation checks terminal status (AllDone), required labels observed in trace, and minimum observable event count.

Contract results are written as JSON artifacts. CI fails on any violation with structured output.

Shared Replay Workflow

Replay-heavy parity lanes should use shared replay APIs. Use run_replay_shared(...) and run_concurrent_replay_shared(...) for this purpose.

These APIs reduce duplicate replay state across lanes and keep replay artifacts compatible with canonical trace fragments. Conformance lanes also emit deterministic replay metrics artifacts so regressions in replay footprint are visible during CI review.

For fault-aware replays, persist entries + faults bundles and re-inject faults before replay. Use aura_testkit::ReplayTrace::load_file(...) to load traces. Use ReplayTrace::replay_faults(...) and aura_simulator::AsyncSimulatorHostBridge::replay_expected_with_faults(...) to replay with faults.

Differential Replay Workflow

Use aura_simulator::DifferentialTester to compare baseline and candidate conformance artifacts. Two profiles are available. The strict profile requires byte-identical surfaces. The envelope_bounded profile uses Aura law-aware comparison with commutative and algebraic envelopes. In supported Telltale 11-backed report lanes, this is the low-level surface comparator, not the theorem-facing semantic authority: use the parity report's upstream semantic summary and run context.

For Telltale 11-backed parity lanes, prefer aura_simulator::run_telltale_parity_with_runner(...) or aura_simulator::run_telltale_control_plane_with_runner(...) over manually assembling candidate upstream sidecar paths. These helpers invoke the configured Telltale runner command, attach the generated candidate run-output sidecar, and then emit the normal Aura parity report. Supported file lanes still require a baseline upstream run sidecar, and they attach optional decision/sweep sidecars when available. Override the command with AURA_TELLTALE_SIMULATOR_RUNNER when needed.

For environment-oriented simulation state, Aura now exposes a bridge layer that uses Telltale 11-aligned names for migrated slices. Prefer the scenario-run artifact lane over ad hoc handler inspection: after run_scenario(...), inspect SimulationResults::environment_artifacts and the persisted environment_snapshot.json / environment_trace.json files under the configured simulator artifact root. When richer Aura-only environment detail is needed, read the optional environment_overlay.json supplement instead of expanding the core bridge schema.

For comparative experiment work, use aura_simulator::run_adaptive_privacy_policy_sweep(...) plus aura_simulator::compare_policy_sweeps(...) instead of ad hoc local sweep scripts. Those APIs keep Aura-specific bindings on top of the shared Telltale execution machinery while emitting Aura-owned AuraSweepArchiveV1 and AuraPolicyDiffReportV1 artifacts.

For reusable regression bundles, use aura_simulator::run_suite_catalog(...) and aura_simulator::compare_suite_catalogs(...). The suite catalogs are Aura-owned, but execution still goes through the shared Telltale harness and the archived comparison surface is AuraSuiteTournamentReportV1.

For theorem-aware failure evidence, convert parity reports with aura_simulator::counterexample_from_parity_report(...) or aura_simulator::counterexample_from_control_plane_report(...). These helpers preserve the shared Telltale counterexample witness and record whether the observed mismatch is schedule noise only.

For parity debugging, run:

just ci-choreo-parity
aura replay --trace-file artifacts/choreo-parity/native_replay/<scenario>__seed_<seed>.json

The replay command validates required conformance surfaces and verifies stored step/run digests against recomputed values.

Best Practices

Start with simple scenarios and add faults incrementally. Use deterministic seeds. Capture metrics for analysis.

Prefer TOML scenarios for human-readable tests. Prefer Quint actions for specification conformance. Combine both for comprehensive coverage.