
Extensions

The core protocol supports several extensions that address operational requirements beyond basic signing, enabling share management and enhanced privacy. All extension state structures form semilattices that merge via join.

Error Handling

The Protocol/Core/Error.lean module provides unified error handling patterns across the protocol.

BlameableError Typeclass

Errors that can identify a misbehaving party implement the BlameableError typeclass:

class BlameableError (E : Type*) (PartyId : Type*) where
  blamedParty : E → Option PartyId

instance : BlameableError (DkgError PartyId) PartyId where
  blamedParty
    | .lengthMismatch => none
    | .duplicatePids => none
    | .commitMismatch p => some p

instance : BlameableError (Complaint PartyId) PartyId where
  blamedParty c := some c.accused

instance : BlameableError (VSS.VSSError PartyId) PartyId where
  blamedParty
    | .shareVerificationFailed accused _ => some accused
    | .missingCommitment p => some p
  | .missingShare sender _ => some sender
    | .thresholdMismatch _ _ => none
    | .duplicateDealer p => some p

Cheater Detection Modes

Following FROST, we provide configurable cheater detection:

inductive CheaterDetectionMode
  | disabled      -- No blame tracking (best performance)
  | firstCheater  -- Stop after finding one cheater (good balance)
  | allCheaters   -- Exhaustively identify all cheaters (complete but slower)

structure ProtocolConfig where
  cheaterDetection : CheaterDetectionMode := .firstCheater
  maxErrors : Nat := 0  -- 0 = unlimited

def ProtocolConfig.default : ProtocolConfig
def ProtocolConfig.highSecurity : ProtocolConfig   -- allCheaters mode
def ProtocolConfig.performance : ProtocolConfig    -- disabled mode
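The presets above are declared without bodies; a sketch consistent with the comments might be:

```lean
-- Sketch: preset bodies matching the comments above
def ProtocolConfig.default : ProtocolConfig :=
  { cheaterDetection := .firstCheater, maxErrors := 0 }

def ProtocolConfig.highSecurity : ProtocolConfig :=
  { cheaterDetection := .allCheaters, maxErrors := 0 }

def ProtocolConfig.performance : ProtocolConfig :=
  { cheaterDetection := .disabled, maxErrors := 0 }
```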

Error Aggregation

Utilities for collecting and analyzing errors across protocol execution:

/-- Count how many errors blame a specific party -/
def countBlame (errors : List E) (party : PartyId) : Nat

/-- Get all unique blamed parties from a list of errors -/
def allBlamedParties (errors : List E) : List PartyId

/-- Collect blame respecting detection mode -/
def collectBlame (config : ProtocolConfig) (errors : List E) : BlameResult PartyId
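Possible bodies for the first two utilities, sketched under the assumption of `[BlameableError E PartyId]` together with `[BEq PartyId]` and `[DecidableEq PartyId]`:

```lean
-- Sketch: count errors whose blame points at a given party
def countBlame [BlameableError E PartyId] [BEq PartyId]
    (errors : List E) (party : PartyId) : Nat :=
  errors.countP (fun e => BlameableError.blamedParty e == some party)

-- Sketch: collect the distinct blamed parties across all errors
def allBlamedParties [BlameableError E PartyId] [DecidableEq PartyId]
    (errors : List E) : List PartyId :=
  (errors.filterMap BlameableError.blamedParty).dedup
```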

Error Categories

| Module | Type | Category | Description |
|--------|------|----------|-------------|
| DKG Core | DkgError | Fatal | Protocol abort required |
| DKG Threshold | Complaint | Recoverable | Party exclusion possible |
| DKG Threshold | ExclusionResult | Result | Quorum check outcome |
| VSS | VSSError | Fatal/Recoverable | Depends on complaint count |
| Refresh Coord | CommitValidationError | Validation | Commit validation outcome |
| Refresh Coord | RevealValidationError | Validation | Reveal validation outcome |
| Repair Coord | ContribCommitValidationError | Validation | Commit validation outcome |
| Repair Coord | ContribRevealValidationError | Validation | Reveal validation outcome |

Session Abort Coordination

The Protocol/Core/Abort.lean module provides abort coordination for session-level failures that cannot be handled locally. Unlike local rejection sampling which handles norm bounds within a single signer, aborts require distributed consensus to abandon a session.

When Aborts Are Needed

Session aborts handle failures that affect the entire signing session.

Liveness failures occur when parties are offline or unresponsive. These require f+1 votes to confirm the abort. Examples include session timeout with insufficient responses and specific parties detected as unresponsive.

Security violations trigger immediate abort without voting. These indicate the protocol cannot safely continue. Examples include trust violations where more than n-t parties are faulty, aggregate signatures exceeding bounds despite local rejection, and DKG/VSS with too many complaints.

Explicit cancellation occurs when a party requests abort. This also requires f+1 votes to confirm.

Abort Reasons

The AbortReason type categorizes why a session needs to abort.

inductive AbortReason (PartyId : Type*) where
  -- Liveness failures (require f+1 votes)
  | timeout (respondents : Nat) (required : Nat)
  | insufficientParticipants (actual : Nat) (required : Nat)
  | partyUnresponsive (parties : List PartyId)

  -- Security violations (immediate abort)
  | trustViolation (faultyCount : Nat) (maxTolerable : Nat)
  | globalNormExceeded (aggregateNorm : Nat) (bound : Nat)
  | tooManyComplaints (complaints : Nat) (maxTolerable : Nat)

  -- Explicit request (requires f+1 votes)
  | requestedBy (requester : PartyId)

The isImmediate method returns true for security violations that bypass voting. The isSecurityViolation method distinguishes security issues from liveness issues.

Abort State (CRDT)

Abort state forms a CRDT for distributed coordination.

structure AbortState (S : Scheme) where
  session : Nat
  votes : Finset S.PartyId
  reasons : List (AbortReason S.PartyId)
  immediateTriggered : Bool := false

The CRDT merge unions the vote sets, concatenates reasons with deduplication, and ORs the immediate flag. This allows replicas to merge abort proposals from different parties in any order.
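The merge just described might be sketched as follows, assuming decidable equality for the vote-set union and reason deduplication:

```lean
-- Sketch: CRDT merge for abort state (union votes, dedup reasons, OR flag)
instance (S : Scheme) [DecidableEq S.PartyId]
    [DecidableEq (AbortReason S.PartyId)] : Join (AbortState S) :=
  ⟨fun a b =>
    { session := a.session
      votes := a.votes ∪ b.votes                       -- union vote sets
      reasons := (a.reasons ++ b.reasons).dedup        -- concat + dedup
      immediateTriggered := a.immediateTriggered || b.immediateTriggered }⟩
```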

Abort Threshold

The abort threshold is f+1, where f = n - t is the maximum number of faulty parties. This ensures at least one honest party agreed to abort.

def AbortState.shouldAbort (state : AbortState S) (cfg : ThresholdConfig) : Bool :=
  state.hasImmediateReason || state.isConfirmedByVotes cfg

A session aborts when either an immediate reason was triggered or f+1 parties voted. The +1 prevents a coalition of f malicious parties from forcing spurious aborts.
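The vote-confirmation check might look like the following sketch, assuming ThresholdConfig exposes fields n and t (so f = n - t):

```lean
-- Sketch: confirmed when at least f + 1 = (n - t) + 1 parties have voted
def AbortState.isConfirmedByVotes {S : Scheme}
    (state : AbortState S) (cfg : ThresholdConfig) : Bool :=
  state.votes.card ≥ (cfg.n - cfg.t) + 1

-- Example: with n = 5, t = 3 we get f = 2, so 3 votes are needed;
-- a coalition of f = 2 faulty parties alone cannot force an abort.
```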

Abort Messages

Parties propose aborts by broadcasting AbortMsg messages.

structure AbortMsg (S : Scheme) where
  sender : S.PartyId
  session : Nat
  reason : AbortReason S.PartyId
  evidence : List S.PartyId := []

Convenience constructors create messages for common scenarios. The AbortMsg.timeout constructor creates timeout messages. The AbortMsg.unresponsive constructor identifies specific unresponsive parties. The AbortMsg.trustViolation constructor reports security violations.

Processing Abort Messages

The AbortState.processMsg method updates state when receiving an abort message.

def AbortState.processMsg (state : AbortState S) (msg : AbortMsg S) : AbortState S :=
  if msg.session = state.session then
    Join.join state (AbortState.fromMsg msg)
  else
    state  -- Ignore messages for different sessions

Messages for different sessions are ignored. Messages for the current session are merged into the CRDT state.

Relationship to Local Rejection

Local rejection and session aborts serve different purposes.

| Mechanism | Scope | Coordination | Example |
|-----------|-------|--------------|---------|
| Local rejection | Single party | None | Norm bound exceeded |
| Session abort | Session-wide | f+1 votes | Timeout, trust violation |

Local rejection handles per-party issues within a signing attempt. Each party independently retries until finding a valid response. Session aborts handle failures that require distributed coordination to resolve.

Security Markers

The Protocol/Core/Security.lean module provides typeclass markers for implementation security requirements. These markers document FFI implementation obligations.

Zeroizable Typeclass

Marks types containing secret material that must be securely erased from memory after use.

class Zeroizable (α : Type*) where
  needsZeroization : Bool := true

instance : Zeroizable (SecretBox α)
instance : Zeroizable (LinearMask S)
instance : Zeroizable (LinearDelta S)
instance : Zeroizable (LinearOpening S)

Implementation guidance:

| Platform | API |
|----------|-----|
| Rust | zeroize::Zeroize trait, ZeroizeOnDrop |
| C | explicit_bzero(), SecureZeroMemory() |
| POSIX | explicit_bzero() |
| Windows | SecureZeroMemory() |

ConstantTimeEq Typeclass

Marks types requiring constant-time equality comparison to prevent timing side-channels.

class ConstantTimeEq (α : Type*) where
  requiresConstantTime : Bool := true

instance : ConstantTimeEq (SecretBox α)
instance : ConstantTimeEq (LinearMask S)
instance : ConstantTimeEq (LinearDelta S)

Implementation guidance:

  • Use subtle::ConstantTimeEq in Rust
  • Use CRYPTO_memcmp() from OpenSSL
  • Never use early-exit comparison loops
  • Avoid branching on secret-dependent values

Linear Security Wrappers

The protocol uses linear-style type discipline to prevent misuse of single-use sensitive values. Private constructors and module-scoped access make violations visible.

LinearMask

Refresh masks that must be applied exactly once to preserve the zero-sum invariant.

structure LinearMask (S : Scheme) where
  private mk ::
  private partyId : S.PartyId
  private maskValue : S.Secret
  private publicImage : S.Public

def LinearMask.create (S : Scheme) (pid : S.PartyId) (mask : S.Secret) : LinearMask S
def LinearMask.consume (m : LinearMask S) : S.Secret × MaskConsumed S

Security: If a mask is applied twice or not at all, the sum of masks will not be zero and the global secret changes.

LinearDelta

Repair contributions that must be aggregated exactly once.

structure LinearDelta (S : Scheme) where
  private mk ::
  private senderId : S.PartyId
  private targetId : S.PartyId
  private deltaValue : S.Secret

def LinearDelta.create (S : Scheme) (sender target : S.PartyId) (delta : S.Secret) : LinearDelta S
def LinearDelta.consume (d : LinearDelta S) : S.Secret × DeltaConsumed S

Security: Using a delta twice adds its value twice to the reconstruction sum, corrupting the repaired share.

LinearOpening

Commitment randomness that should be used exactly once to prevent equivocation.

structure LinearOpening (S : Scheme) where
  private mk ::
  private openingValue : S.Opening
  private used : Bool := false

def LinearOpening.fresh (S : Scheme) (opening : S.Opening) : LinearOpening S
def LinearOpening.consume (o : LinearOpening S) : S.Opening × OpeningConsumed S

Security: Reusing an opening with different values could allow opening the same commitment to different values.

Consumption Proofs

Each linear type produces a consumption proof when consumed:

structure MaskConsumed (S : Scheme) where
  private mk ::
  forParty : S.PartyId

structure DeltaConsumed (S : Scheme) where
  private mk ::
  fromHelper : S.PartyId

structure OpeningConsumed (S : Scheme) where
  private mk ::
  consumed : Unit := ()

Downstream functions can require these proofs to ensure proper consumption occurred.
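For example, a downstream step can be gated on the proof so it cannot run unless LinearMask.consume was called. `installRefreshedShare` is a hypothetical name for illustration:

```lean
-- Sketch: requiring the MaskConsumed proof enforces that the mask was
-- consumed exactly once before the refreshed share is installed
def installRefreshedShare (S : Scheme) (ks : KeyShare S)
    (refreshedSk : S.Secret) (proof : MaskConsumed S) : KeyShare S :=
  -- the proof records which party's mask was consumed
  KeyShare.create S proof.forParty refreshedSk (S.A refreshedSk) ks.pk
```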

Reusable Protocol Patterns

The Protocol/Core/Patterns.lean module provides reusable abstractions for common protocol patterns. Scheme-specific versions are re-exported from Protocol/Core/Core.lean.

Constraint Aliases

Standard constraint combinations are aliased for cleaner signatures. Generic versions in Patterns.lean work with any type, while Core.lean provides Scheme-specific aliases:

-- Generic (Patterns.lean):
abbrev HashConstraints (PartyId : Type*) := BEq PartyId × Hashable PartyId
abbrev StateConstraints (PartyId : Type*) := HashConstraints PartyId × DecidableEq PartyId
abbrev FieldConstraints (Scalar : Type*) := Field Scalar × DecidableEq Scalar

-- Scheme-specific (Core.lean):
abbrev PartyIdHash (S : Scheme) := BEq S.PartyId × Hashable S.PartyId
abbrev PartyIdState (S : Scheme) := PartyIdHash S × DecidableEq S.PartyId
abbrev ScalarField (S : Scheme) := Field S.Scalar × DecidableEq S.Scalar

Usage:

-- Before (verbose):
def foo (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] [DecidableEq S.PartyId] ...

-- After (concise):
def foo (S : Scheme) [PartyIdState S] ...

Generic Share Verification

The pattern A(secret) = expectedPublic is centralized:

def verifySecretPublic (S : Scheme) (secret : S.Secret) (expectedPublic : S.Public) : Prop :=
  S.A secret = expectedPublic

def verifySecretPublicBool (S : Scheme) [DecidableEq S.Public]
    (secret : S.Secret) (expectedPublic : S.Public) : Bool :=
  S.A secret = expectedPublic

def verifyCommitmentOpening (S : Scheme) [DecidableEq S.Commitment]
    (value : S.Public) (opening : S.Opening) (expectedCommit : S.Commitment) : Bool :=
  S.commit value opening = expectedCommit

Used consistently across:

  • VSS share verification
  • Repair share reconstruction
  • Refresh share updates
  • DKG public key verification

Commit-Reveal Framework

Generic types for commit-reveal protocol phases (in Patterns.lean):

/-- Generic validation error -/
inductive CommitRevealError (PhaseType PartyId : Type*)
  | wrongPhase (current : PhaseType)
  | senderNotAuthorized (sender : PartyId)
  | duplicateMessage (sender : PartyId)
  | commitmentMismatch (sender : PartyId)
  | missingCommit (sender : PartyId)
  | verificationFailed (sender : PartyId) (reason : String)

/-- Typeclass for messages with sender -/
class HasSender (M : Type*) (PartyId : Type*) where
  sender : M → PartyId

def getSender [HasSender M PartyId] (m : M) : PartyId := HasSender.sender m

Scheme-specific verification (in Core.lean):

/-- Typeclass for verifiable commit-reveal pairs -/
class CommitVerifiable (CommitMsg RevealMsg : Type*) (S : Scheme) where
  verifyOpening : S → CommitMsg → RevealMsg → Bool

Utility Functions

These utilities are in Patterns.lean:

/-- Check party authorization -/
def isAuthorized (authorizedParties : List PartyId) (party : PartyId) : Bool

/-- Extract unique values by key (replaces map+dedup pattern) -/
def extractUnique (key : α → β) (items : List α) : List β
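A possible implementation of extractUnique, assuming `[DecidableEq β]`:

```lean
-- Sketch: map each item to its key, then drop duplicates
def extractUnique [DecidableEq β] (key : α → β) (items : List α) : List β :=
  (items.map key).dedup
```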

External Protocol Integration

The Protocol/Core/Core.lean and Protocol/Sign/Types.lean modules provide integration points for external consensus and coordination protocols.

External Context

All signing messages can carry optional context for binding to external protocol instances:

structure ExternalContext where
  consensusId : Option ByteArray := none    -- external session identifier
  resultId : Option ByteArray := none       -- what is being signed (e.g., H(op, prestate))
  prestateHash : Option ByteArray := none   -- state validation
  evidenceDelta : Option ByteArray := none  -- CRDT evidence for propagation

Evidence Carrier Typeclass

The EvidenceCarrier typeclass enables piggybacking CRDT evidence deltas on signing messages:

class EvidenceCarrier (M : Type*) where
  attachEvidence : M → ByteArray → M
  extractEvidence : M → Option ByteArray
  hasEvidence : M → Bool := fun m => (extractEvidence m).isSome

Instances are provided for all signing message types (SignCommitMsg, SignRevealWMsg, SignShareMsg) and Signature.

Self-Validating Shares

For gossip-based fallback protocols, ValidatableShare bundles a share with enough context for independent validation:

structure ValidatableShare (S : Scheme) where
  share : SignShareMsg S
  nonceCommit : S.Commitment
  publicHiding : S.Public
  publicBinding : S.Public
  pkShare : S.Public

Recipients can validate the share without full session state using ValidatableShare.isWellFormed.

Context-Aware Aggregation

When aggregating shares from multiple parties, context consistency can be validated:

def aggregateWithContextValidation (S : Scheme) (c : S.Challenge)
    (Sset : List S.PartyId) (commits : List S.Commitment)
    (shares : List (SignShareMsg S))
    : Except String (Signature S)

This catches Byzantine behavior where a party signs different operations.

See Protocol Integration for detailed usage patterns.

Serialization

The Protocol/Core/Serialize.lean module provides type-safe serialization for network transport.

Serialization Headers

Messages include headers for version and scheme identification:

inductive SchemeId
  | mlDsa44 | mlDsa65 | mlDsa87 | custom (id : ByteArray)

structure SerializationHeader where
  version : UInt8       -- protocol version (currently 1)
  schemeId : SchemeId   -- parameter set identifier

class SerializableWithHeader (α : Type*) extends Serializable α where
  header : SerializationHeader
  toBytesWithHeader : α → ByteArray
  fromBytesWithHeader : ByteArray → Option α

This enables:

  • Version negotiation: Detect incompatible protocol versions
  • Parameter validation: Verify scheme compatibility before deserializing
  • Safe upgrades: Add new schemes without breaking existing deployments

Serializable Typeclass

class Serializable (α : Type*) where
  toBytes : α → ByteArray
  fromBytes : ByteArray → Option α

Implementations are provided for primitives (Nat, Int, ByteArray), compound types (Option, List), and all protocol messages.

Wire Format

The wire format uses little-endian encoding with explicit length prefixes:

| Type | Format |
|------|--------|
| Nat | 8-byte little-endian |
| Int | 8-byte little-endian (two's complement) |
| List α | 4-byte length + concatenated elements |
| Option α | 1-byte tag (0=none, 1=some) + element if some |
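Following the 8-byte little-endian Nat encoding above, a Serializable instance might be sketched as:

```lean
-- Sketch: Serializable Nat per the wire format (8-byte little-endian).
-- Encoding truncates through UInt64; decoding rejects wrong-length input.
instance : Serializable Nat where
  toBytes n :=
    let u := n.toUInt64
    ByteArray.mk <| (Array.range 8).map fun i =>
      (u >>> (UInt64.ofNat (i * 8))).toUInt8
  fromBytes bs :=
    if bs.size = 8 then
      some <| (List.range 8).foldl
        (fun acc i => acc + (bs.get! i).toNat <<< (i * 8)) 0
    else
      none  -- reject malformed input
```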

Message Wrapping

Messages are wrapped with type tags for network transport:

inductive MessageTag
  | dkgCommit | dkgReveal
  | signCommit | signReveal | signShare
  | signature
  | abort

structure WrappedMessage where
  tag : MessageTag
  payload : ByteArray

def WrappedMessage.toBytes (msg : WrappedMessage) : ByteArray :=
  -- tag (1 byte) + length (4 bytes) + payload
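
The comment body might be realized as the following sketch; `MessageTag.toUInt8` is a hypothetical helper mapping each tag to a distinct byte:

```lean
-- Sketch: 1-byte tag, 4-byte little-endian length prefix, then payload
def WrappedMessage.toBytesImpl (msg : WrappedMessage) : ByteArray :=
  let len := msg.payload.size.toUInt32
  -- 4-byte little-endian length prefix
  let lenBytes := ByteArray.mk <| (Array.range 4).map fun i =>
    (len >>> (UInt32.ofNat (i * 8))).toUInt8
  ByteArray.mk #[msg.tag.toUInt8] ++ lenBytes ++ msg.payload
```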

Security Note: Deserialization is a potential attack surface. All fromBytes implementations validate input lengths and return none on malformed input.

Complaints

The complaint mechanism identifies misbehaving parties. It activates when protocol invariants are violated.

Complaint Types

Two complaint variants cover the main failure modes.

inductive Complaint (PartyId : Type) where
  | openingMismatch (accused : PartyId)
  | missingReveal (accused : PartyId)

Complaint Triggers

A party files a complaint under the following conditions.

Invalid commitment opening. A party reveals a (value, opening) pair that does not match its commitment. Generates openingMismatch.

Missing message. Party fails to send a required message within the timeout. Generates missingReveal.

Inconsistent session. The participant set differs from the expected set.

Invalid partial signature. Party produces a partial signature that fails verification.

Aggregation with Complaints

DKG aggregation returns either the public key or a list of complaints identifying misbehaving parties.

def dkgAggregateWithComplaints
  (S : Scheme) [DecidableEq S.PartyId]
  (commits : List (DkgCommitMsg S))
  (reveals : List (DkgRevealMsg S))
  : Except (List (Complaint S.PartyId)) S.Public

Complaint Resolution

When a party files a complaint the protocol may proceed in different ways.

Abort. All parties discard the current session. They may retry with a new session excluding the accused party.

Exclude. The accused party is removed from the participant set. The remaining parties continue if the threshold is still met.

Penalize. In applications with economic stakes the accused party may lose a deposit or face other penalties.

Complaint Verification

A complaint is verifiable when it includes evidence. For a commitment opening failure the evidence is the commitment, the revealed pair, and a proof that they do not match. Other parties can independently verify the complaint.

Handling Byzantine Signers

In threshold signing, Byzantine signers may attempt to disrupt the protocol by sending invalid partial signatures. The protocol handles such behavior through per-share validation, allowing honest parties to detect and exclude misbehavior without halting progress.

Per-Share Validation

The aggregator treats each partial signature as an independent object. For each partial z_i from signer i, the aggregator performs three validation checks.

Structural validity checks that the partial is well-formed and consistent with commitments. The partial must have correct dimension, correct modulus, and match the commitment structure.

Norm bound checking verifies ‖z_i‖ ≤ B for the local bound B. This ensures each partial satisfies the local bound that, when combined with other valid partials, produces a globally valid signature.

Algebraic validity verifies the partial against the signer's public share pk_i. This is analogous to FROST's per-share verification relation. The partial must satisfy a relation like A·z_i = w_i + c·pk_i, where w_i is derived from the commitment.

Invalid Share Handling

If any check fails, the aggregator discards that partial share. The aggregator does not attempt to include that share in the aggregate. Instead, it records evidence of misbehavior. This evidence can support later exclusion decisions or penalties in consensus protocols.

The aggregator then waits for valid shares from other signers, continuing until it has collected t valid partials (where t ≤ n - f under the BFT assumption).

Byzantine Resilience Guarantee

Because at least n - f validators are honest, and t = n - f, there exist at least t honest signers whose partials will be valid. The aggregator can always collect t valid partials and complete the signature despite up to f malicious validators.

This ensures that no single participant, nor any coalition of size at most f, can prevent the threshold signature from being formed. Malicious signers cannot force global rejection or restart. They can only delay the process by not participating or by sending invalid shares that are simply rejected.

Misbehavior Detection

Per-share validation enables attribution. Each invalid partial can be tied to a specific signer via its ID. This makes misbehavior identifiable and attributable even though the final signature itself remains a standard Dilithium signature under the group key.

The protocol transcript contains all per-signer checks, allowing external auditors to verify which parties sent invalid shares. This supports automated exclusion policies or manual review in consensus systems.

Share Refresh

Share refresh updates the shares without changing the public key. This procedure provides proactive security. Even if an adversary compromises old shares the refreshed shares are independent.

Mask Function Structure

Masks are represented as functions from party identifier to secret.

structure MaskFn (S : Scheme) :=
  (mask : S.PartyId → S.Secret)

instance (S : Scheme) : Join (MaskFn S) :=
  ⟨fun a b => { mask := fun pid => a.mask pid + b.mask pid }⟩

The join merges masks by pointwise addition.

Zero-Sum Mask Invariant

The zero-sum property is captured in a dependent structure.

structure ZeroSumMaskFn (S : Scheme) (active : Finset S.PartyId) :=
  (fn : MaskFn S)
  (sum_zero : (active.toList.map (fun pid => fn.mask pid)).sum = 0)

instance (S : Scheme) (active : Finset S.PartyId) : Join (ZeroSumMaskFn S active)

Merging zero-sum masks preserves zero-sum on the same active set.

Refresh Overview

Each party i holds a share sk_i. After refresh each party holds a new share sk_i' = sk_i + m_i. The masks sum to zero: Σ_i m_i = 0.

This ensures the master secret Σ_i sk_i is unchanged.

Refresh Function

Applies a mask to each share and recomputes the corresponding public share.

def refreshShares
  (S : Scheme)
  (m : MaskFn S)
  (shares : List (KeyShare S)) : List (KeyShare S) :=
  shares.map (fun ks =>
    let sk' := ks.secret + m.mask ks.pid
    let pk' := S.A sk'
    KeyShare.create S ks.pid sk' pk' ks.pk)

Mask Generation

The masks must sum to zero. One approach is as follows.

  1. Each party i samples a random mask contribution r_i.
  2. Parties run a protocol to compute a common offset Δ = Σ_j r_j.
  3. Each party derives its mask m_i from r_i and Δ in a way that ensures Σ_i m_i = 0.

An alternative uses pairwise contributions. Party i sends δ_{i→j} to party j and receives δ_{j→i} from j. The net contribution is m_i = Σ_j (δ_{j→i} − δ_{i→j}). These contributions cancel pairwise, so Σ_i m_i = 0.
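The pairwise approach can be sketched as a per-party computation; `sentTo` and `receivedFrom` are hypothetical maps (sentTo j = δ_{i→j}, receivedFrom j = δ_{j→i}), and S.Secret is assumed to form an additive group:

```lean
-- Sketch: party i's net mask from pairwise contributions.
-- Summed over all parties, sent and received deltas cancel pairwise.
def pairwiseNetMask (S : Scheme) (others : List S.PartyId)
    (sentTo receivedFrom : S.PartyId → S.Secret) : S.Secret :=
  (others.map receivedFrom).sum - (others.map sentTo).sum
```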

Public Share Update

After refresh each party computes its new public share.

The global public key remains unchanged because

Refresh Protocol

A full refresh protocol runs as follows.

  1. Each party generates its mask contribution.
  2. Parties exchange commitments to their contributions.
  3. Parties reveal their contributions.
  4. Each party verifies all commitments.
  5. Each party computes its new share.
  6. Parties publish new commitments to their public shares.
  7. External verifiers can check the consistency of the new public shares.

Refresh Coordination Protocol

The Protocol/Shares/RefreshCoord.lean module provides a distributed protocol for generating zero-sum masks without a trusted coordinator.

Phases:

  1. Commit: Each party commits to their random mask
  2. Reveal: Parties reveal masks after all commits received
  3. Adjust: Coordinator computes adjustment to achieve zero-sum
  4. Apply: All parties apply masks to their shares

CRDT Design: Uses MsgMap for commits and reveals to ensure at most one message per party. This makes conflicting messages unrepresentable at the type level.

abbrev MaskCommitMap (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] :=
  MsgMap S (MaskCommitMsg S)

abbrev MaskRevealMap (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] :=
  MsgMap S (MaskRevealMsg S)

structure RefreshRoundState (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] where
  phase : RefreshPhase
  parties : List S.PartyId
  roundNum : Nat
  coordStrategy : CoordinatorStrategy S.PartyId
  maskCommits : MaskCommitMap S    -- at most one per party
  maskReveals : MaskRevealMap S    -- at most one per party
  adjustment : Option (AdjustmentMsg S)

Commitment Design: The protocol commits to A(mask) (the public image) rather than the mask itself. This enables:

  • Public verifiability without revealing the secret mask
  • Consistency with DKG (which commits to public shares)
  • Simple zero-sum verification via A(Σ m_i)

Architecture: Separated CRDT and Validation

The protocol separates concerns into two layers:

  1. CRDT Layer - Pure merge semantics for replication/networking
  2. Validation Layer - Conflict detection for auditing/security

-- CRDT merge: idempotent, always succeeds, silently ignores duplicates
def processCommit (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RefreshRoundState S) (msg : MaskCommitMsg S)
    : RefreshRoundState S

def processReveal (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RefreshRoundState S) (msg : MaskRevealMsg S)
    : RefreshRoundState S

-- Validation: detect conflicts without modifying state
def detectCommitConflict (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RefreshRoundState S) (msg : MaskCommitMsg S)
    : Option (MaskCommitMsg S)

def validateCommit (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RefreshRoundState S) (msg : MaskCommitMsg S)
    : Option (CommitValidationError S)

-- Combined: merge + validation in one call
def processCommitValidated (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RefreshRoundState S) (msg : MaskCommitMsg S)
    : RefreshRoundState S × Option (CommitValidationError S)

Design Rationale:

  • CRDT functions can be used directly for networking without error handling in merge path
  • Validation is explicit and composable. Call it when you need it.
  • Follows "make illegal states unrepresentable" at CRDT level, "detect suspicious patterns" separately

Coordinator Selection:

inductive CoordinatorStrategy (PartyId : Type*)
  | fixed (pid : PartyId)      -- always same coordinator
  | roundRobin (round : Nat)   -- rotate based on round number
  | random (seed : Nat)        -- pseudo-random selection

Zero-Sum Verification: The coordinator computes an adjustment mask m_coord = −Σ_{i ≠ coord} m_i, ensuring Σ_i m_i = 0:

def computeAdjustment (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    [DecidableEq S.PartyId] [Inhabited S.PartyId]
    (st : RefreshRoundState S) : Option S.Secret :=
  if st.phase = .adjust then
    let coord := st.coordinator
    let otherMasks := st.maskReveals.toList.filter (fun r => r.sender ≠ coord)
    let sumOthers := (otherMasks.map (·.mask)).sum
    some (-sumOthers)
  else none

DKG-Based Refresh Protocol

The Protocol/Shares/RefreshDKG.lean module provides a fully distributed refresh protocol without a trusted coordinator. Each party contributes a zero-polynomial (polynomial with constant term 0), ensuring the master secret remains unchanged while all shares are updated.

Three-Round Protocol:

  1. Round 1 (Commit): Each party j samples a random polynomial f_j with f_j(0) = 0 and broadcasts a commitment to its polynomial coefficients.

  2. Round 2 (Share): Each party evaluates its polynomial at all other parties' identifiers and sends the shares privately.

  3. Round 3 (Verify & Apply): Each party verifies received shares against commitments and computes its refresh delta as the sum of all received shares.

structure RefreshLocalState (S : Scheme) where
  pid : S.PartyId
  coefficients : List S.Secret  -- [0, a₁, a₂, ..., a_{t-1}]
  opening : S.Opening

structure RefreshDelta (S : Scheme) where
  pid : S.PartyId
  delta : S.Secret  -- sum of all f_j(pid)

def refreshRound1 (S : Scheme) (pid : S.PartyId)
    (randomCoeffs : List S.Secret) (opening : S.Opening)
    : RefreshLocalState S × RefreshCommitMsg S

def refreshRound3 (S : Scheme)
    (st : RefreshLocalState S)
    (commits : List (RefreshCommitMsg S))
    (shares : List (RefreshShareMsg S))
    : Except (RefreshDKGError S.PartyId) (RefreshDelta S)

Security Properties:

  • No trusted coordinator: Unlike RefreshCoord.lean, no single party learns all masks
  • Verifiable: Polynomial commitments allow verification of received shares
  • Zero-sum by construction: Since each f_j(0) = 0, the combined refresh polynomial Σ_j f_j also vanishes at 0, so adding the deltas leaves the master secret unchanged.

Verification:

def verifyZeroSum (S : Scheme) (deltas : List (RefreshDelta S)) : Bool :=
  (deltas.map (·.delta)).sum = 0

Public Key Invariance Theorem

If the mask sums to zero over the participant list, the global public key is unchanged.

lemma refresh_pk_unchanged
  (S : Scheme)
  (m : MaskFn S)
  (shares : List (KeyShare S))
  (hsum : (shares.map (fun ks => m.mask ks.pid)).sum = 0) :
  let refreshed := refreshShares S m shares
  (refreshed.foldl (fun acc ks => acc + ks.pk_i) (0 : S.Public)) =
  (shares.foldl (fun acc ks => acc + ks.pk_i) (0 : S.Public))

Refresh State (CRDT)

Refresh state forms a semilattice for conflict-free replication.

structure RefreshState (S : Scheme) :=
  (mask : MaskFn S)

instance (S : Scheme) : Join (RefreshState S)
instance (S : Scheme) : SemilatticeSup (RefreshState S)

Security Properties

Forward secrecy. Old shares reveal nothing about new shares.

Backward secrecy. New shares reveal nothing about old shares.

Invariant preservation. The master secret and global public key are unchanged.

Share Repair

Share repair allows a party to recover a lost share. The recovering party receives help from other parties.

Repair Request Structure

Messages for initiating and responding to repair requests. Bundles merge via list append.

structure RepairRequest (S : Scheme) :=
  (requester : S.PartyId)
  (knownPk_i : S.Public)  -- The public share is still known

structure RepairMsg (S : Scheme) where
  sender : S.PartyId  -- helper party
  target : S.PartyId  -- requester (for routing)
  delta  : S.Secret   -- weighted share contribution

structure RepairBundle (S : Scheme) :=
  (msgs : List (RepairMsg S))

instance (S : Scheme) : Join (RepairBundle S) := ⟨fun a b => ⟨a.msgs ++ b.msgs⟩⟩

Repair Session State

The repair session tracks protocol progress with CRDT merge semantics.

structure RepairSession (S : Scheme) :=
  (request   : RepairRequest S)
  (helpers   : Finset S.PartyId)
  (received  : RepairBundle S)
  (threshold : Nat)

instance (S : Scheme) : Join (RepairSession S)

Repair Scenario

Party i has lost its share sk_i. It wants to recover the share without other parties learning it.

Helper Protocol

Each helper j sends a masked delta to party i. The deltas are structured so that their sum equals sk_i.

Let δ_j denote the message from helper j to party i. The messages satisfy Σ_j δ_j = sk_i.

Repair Function

Sums the deltas from all helper messages to reconstruct the lost share.

def repairShare
  (S : Scheme)
  (msgs : List (RepairMsg S))
  : S.Secret :=
  (msgs.map (·.delta)).sum

Correctness

The repaired share equals the original when deltas sum correctly.

lemma repair_correct
  (S : Scheme)
  (msgs : List (RepairMsg S))
  (sk_i : S.Secret)
  (h : msgs.foldl (fun acc m => acc + m.delta) (0 : S.Secret) = sk_i) :
  repairShare S msgs = sk_i

Privacy (Zero-Sum Masks)

Additional zero-sum masks do not change the repaired value:

lemma repair_masked_zero
  (S : Scheme)
  (msgs : List (RepairMsg S))
  (mask : List S.Secret)
  (hmask : mask.sum = 0)
  (hlen : mask.length = msgs.length) :
  repairShare S (List.zipWith (fun m z => { m with delta := m.delta + z }) msgs mask)
    = repairShare S msgs
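
As a concrete illustration of repair_masked_zero with S.Secret instantiated to Int (hypothetical numbers, not protocol values), adding a pointwise zero-sum mask leaves the delta sum unchanged:

```lean
-- Deltas sum to 6; the mask [2, -5, 3] sums to 0.
#eval ([4, -1, 3] : List Int).sum                                    -- 6
#eval (List.zipWith (· + ·) ([4, -1, 3] : List Int) [2, -5, 3]).sum  -- 6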

Construction

One construction uses Lagrange interpolation. If the shares are evaluations of a degree-(t − 1) polynomial at distinct points, then any t parties can reconstruct the polynomial's value at any point, including the lost share.

The Lagrange coefficient for party j evaluating at 0 over set Sset is:

λ_{j,Sset} = ∏_{m ∈ Sset, m ≠ j} m / (m − j)

These coefficients satisfy the partition of unity property: ∑_{j ∈ Sset} λ_{j,Sset} = 1.

The implementation verifies these properties against Mathlib's Lagrange module. These lemmas confirm equivalence with the standard formulation and the partition of unity property.

-- Our formula equals Mathlib's basis polynomial evaluated at 0
lemma lagrangeCoeffAtZero_eq_basis_eval :
  lagrangeCoeffAtZero S pidToScalar Sset i = (Lagrange.basis s v i).eval 0

-- Lagrange coefficients sum to 1 (partition of unity)
lemma lagrangeCoeffs_sum_one (hs : s.Nonempty) :
  (∑ i ∈ s, (Lagrange.basis s v i).eval 0) = 1

-- Weighted sum reconstructs polynomial at 0
lemma lagrange_interpolation_at_zero :
  (∑ i ∈ s, (Lagrange.basis s v i).eval 0 * r i) = p.eval 0

For additive shares a simpler approach works. Each helper P_j sends a random additive share of its own share sk_j. The recovering party combines these to reconstruct its share through a pre-established linear relation.

Helper Contribution Function

Each helper scales its share by a Lagrange coefficient and sends the result.

def helperContribution
  (S : Scheme)
  (helper : KeyShare S)
  (requester : S.PartyId)
  (coefficient : S.Scalar)
  : RepairMsg S :=
  { sender := helper.pid
  , target := requester
  , delta  := coefficient • helper.secret }

Verification

Checks that the repaired secret maps to the expected public share.

def verifyRepairedShare
  (S : Scheme)
  (repairedSk : S.Secret)
  (expectedPk : S.Public)
  : Prop :=
  S.A repairedSk = expectedPk

Threshold Completion

Checks whether enough helpers have responded and attempts to complete the repair.

def hasEnoughHelpers
  (S : Scheme) [DecidableEq S.PartyId]
  (session : RepairSession S)
  : Bool :=
  (helperPids S session.received.msgs).length ≥ session.threshold

def tryCompleteRepair
  (S : Scheme) [DecidableEq S.PartyId]
  (session : RepairSession S)
  : Option S.Secret
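
The body of tryCompleteRepair is not shown; a minimal sketch consistent with the definitions above (assuming completion simply sums the received deltas once the threshold is met) could look like:

```lean
-- Sketch: return the summed deltas once enough helpers have responded.
def tryCompleteRepairSketch
  (S : Scheme) [DecidableEq S.PartyId]
  (session : RepairSession S)
  : Option S.Secret :=
  if hasEnoughHelpers S session then
    some (repairShare S session.received.msgs)
  else
    none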

Security

Privacy. No single helper learns sk_i. Zero-sum masking ensures that additional randomness does not affect the reconstructed value.

Correctness. The recovered share equals the original share. Verified via verifyRepairedShare.

Threshold. At least t helpers must participate. Enforced by hasEnoughHelpers.

Robustness. If threshold is met, tryCompleteRepair returns Some.

Repair Protocol Steps

  1. Party P_i broadcasts a RepairRequest with its known public share.
  2. Each helper computes its delta via helperContribution.
  3. Each helper sends the RepairMsg to P_i over a secure channel.
  4. Party P_i collects messages in a RepairSession (CRDT-mergeable).
  5. When hasEnoughHelpers returns true, party P_i calls tryCompleteRepair.
  6. Party P_i verifies via verifyRepairedShare.
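
Steps 5 and 6 can be combined into a single requester-side check. This is an illustrative sketch, not part of the module; it assumes DecidableEq S.Public so the public-share comparison is decidable:

```lean
def requesterStep
  (S : Scheme) [DecidableEq S.PartyId] [DecidableEq S.Public]
  (session : RepairSession S)
  : Option S.Secret := do
  guard (hasEnoughHelpers S session)          -- step 5 precondition
  let sk ← tryCompleteRepair S session        -- attempt completion
  guard (S.A sk = session.request.knownPk_i)  -- step 6: verifyRepairedShare
  pure sk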

Repair Coordination Protocol

The Protocol/Shares/RepairCoord.lean module provides a commit-reveal protocol for coordinating repair contributions.

Phases:

  1. Request: Requester broadcasts repair request with known public share pk_i
  2. Commit: Helpers commit to their Lagrange-weighted contributions
  3. Reveal: Helpers reveal contributions after all commits received
  4. Verify: Requester verifies that the repaired secret maps to pk_i

CRDT Design: Uses MsgMap for commits and reveals to ensure at most one message per helper. This makes conflicting messages unrepresentable at the type level.

abbrev ContribCommitMap (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] :=
  MsgMap S (ContribCommitMsg S)

abbrev ContribRevealMap (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] :=
  MsgMap S (ContribRevealMsg S)

inductive RepairPhase
  | request | commit | reveal | verify | done

structure RepairCoordState (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] where
  phase : RepairPhase
  request : RepairRequest S
  helpers : List S.PartyId
  threshold : Nat
  commits : ContribCommitMap S    -- at most one per helper
  reveals : ContribRevealMap S    -- at most one per helper
  repairedShare : Option S.Secret

Architecture: Separated CRDT and Validation

Like RefreshCoord, RepairCoord separates CRDT merge from validation:

-- CRDT merge: idempotent, always succeeds, silently ignores duplicates/non-helpers
def processContribCommit (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RepairCoordState S) (msg : ContribCommitMsg S)
    : RepairCoordState S

def processContribReveal (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RepairCoordState S) (msg : ContribRevealMsg S)
    : RepairCoordState S

-- Validation: detect conflicts without modifying state
def detectContribCommitConflict (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RepairCoordState S) (msg : ContribCommitMsg S)
    : Option (ContribCommitMsg S)

def validateContribCommit (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RepairCoordState S) (msg : ContribCommitMsg S)
    : Option (ContribCommitValidationError S)  -- wrongPhase | notHelper | conflict

-- Combined: merge + validation in one call
def processContribCommitValidated (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RepairCoordState S) (msg : ContribCommitMsg S)
    : RepairCoordState S × Option (ContribCommitValidationError S)

Usage Patterns:

  • Networking: Use processContribCommit/processContribReveal directly for replication
  • Auditing: Call detectContribCommitConflict to check for duplicate messages
  • Strict mode: Use processContribCommitValidated for both merge and error reporting

Helper State: Each helper maintains local state for participation:

structure HelperLocalState (S : Scheme) where
  helperId : S.PartyId
  keyShare : KeyShare S
  lagrangeCoeff : S.Scalar
  contribution : S.Secret        -- λ_j · sk_j
  commitment : S.Commitment
  opening : S.Opening

Lagrange Coefficient Computation: Uses the unified Protocol/Core/Lagrange.lean module:

def lagrangeCoefficient [Field F] [DecidableEq F]
    (partyScalar : F) (otherScalars : List F) : F :=
  otherScalars.foldl (fun acc m =>
    if m = partyScalar then acc
    else acc * (m / (m - partyScalar))) 1
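
A quick sanity check of lagrangeCoefficient over ℚ (hypothetical evaluation points 1, 2, 3; assumes Mathlib's ℚ is available): the coefficients for evaluating at 0 are 3, −3, 1, they sum to 1, and they reconstruct p(0) for p(x) = 5 + 2x from the shares p(1) = 7, p(2) = 9, p(3) = 11.

```lean
#eval lagrangeCoefficient (1 : ℚ) [2, 3]  -- (2/(2-1)) * (3/(3-1)) = 3
#eval lagrangeCoefficient (2 : ℚ) [1, 3]  -- (1/(1-2)) * (3/(3-2)) = -3
#eval lagrangeCoefficient (3 : ℚ) [1, 2]  -- (1/(1-3)) * (2/(2-3)) = 1
#eval (3 : ℚ) * 7 + (-3) * 9 + 1 * 11     -- 5 = p(0); and 3 - 3 + 1 = 1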

Transcript Creation: The createTranscript function extracts ordered lists from MsgMap for verification:

def createTranscript (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    (st : RepairCoordState S) : Option (RepairTranscript S) :=
  if st.phase = .done then
    some { commits := st.commits.toList.map (·.2)
           reveals := st.reveals.toList.map (·.2)
           repairedShare := st.repairedShare.getD default }
  else none

Verification: The repaired share is verified against the known public share:

def completeRepair (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId]
    [DecidableEq S.Public]
    (st : RepairCoordState S) : RepairCoordState S :=
  if st.phase = .verify then
    let repairedSk := aggregateContributions S st.reveals.toList
    if S.A repairedSk = st.request.knownPk_i then
      { st with repairedShare := some repairedSk, phase := .done }
    else st
  else st

Rerandomization

Rerandomization adds privacy by masking shares and nonces. It prevents linking between signing sessions.

Rerandomization Masks Structure

Holds zero-sum masks for both shares and nonces over a specific signer set. Masks merge by pointwise addition.

/-- Raw mask functions without zero-sum proof. -/
structure RawRerandMasks (S : Scheme) :=
  (shareMask : S.PartyId → S.Secret)
  (nonceMask : S.PartyId → S.Secret)

instance (S : Scheme) : Join (RawRerandMasks S)

/-- Zero-sum masks for rerandomizing signatures over a specific signer set. -/
structure RerandMasks (S : Scheme) (Sset : List S.PartyId) :=
  (masks : RawRerandMasks S)
  (shareSumZero : (Sset.map masks.shareMask).sum = 0)
  (nonceSumZero : (Sset.map masks.nonceMask).sum = 0)

instance (S : Scheme) (Sset : List S.PartyId) : Join (RerandMasks S Sset)
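
Raw masks can be built pairwise. The following sketch (illustrative, assuming S.Secret carries the additive group structure used throughout) gives party a the mask +r and party b the mask −r, so the share masks sum to zero over any signer set containing a and b exactly once:

```lean
def pairMask
  (S : Scheme) [DecidableEq S.PartyId]
  (a b : S.PartyId) (r : S.Secret) : RawRerandMasks S :=
  { shareMask := fun p => if p = a then r else if p = b then -r else 0
  , nonceMask := fun _ => 0 }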

Motivation

Without rerandomization the same shares produce related signatures. An observer might link signatures to the same key or signer set. Rerandomization breaks this linkage.

State Rerandomization

Applies masks to both the nonce and share within a signing state.

/-- Apply raw rerandomization masks to signing local state. -/
def rerandStateRaw
  (S : Scheme)
  (masks : RawRerandMasks S)
  (st : SignLocalState S) : SignLocalState S :=
  let newSk := st.share.secret + masks.shareMask st.share.pid
  let newPk := S.A newSk
  { st with
    y_i := st.y_i + masks.nonceMask st.share.pid,
    w_i := S.A (st.y_i + masks.nonceMask st.share.pid),
    share := KeyShare.create S st.share.pid newSk newPk st.share.pk }

/-- Apply zero-sum rerandomization masks to signing local state. -/
def rerandState
  (S : Scheme)
  (Sset : List S.PartyId)
  (masks : RerandMasks S Sset)
  (st : SignLocalState S) : SignLocalState S :=
  rerandStateRaw S masks.masks st

Share Rerandomization

Before signing each party adds a mask to its share.

The masks must sum to zero as in refresh. The signing protocol uses the masked shares. The verification equation remains valid because the master secret is unchanged.

Nonce Rerandomization

Similarly, parties can add masks to their nonces.

If the nonce masks sum to zero, the aggregated nonce is unchanged.

Signature Preservation Theorem

Zero-sum masks cancel out in the aggregated signature.

lemma rerand_preserves_sig
  (S : Scheme)
  (masks : RawRerandMasks S)
  (Sset : List S.PartyId)
  (c : S.Challenge)
  (commits : List S.Commitment)
  (shares : List (SignShareMsg S)) :
  (shares.map (fun sh => masks.shareMask sh.sender)).sum = 0 →
  aggregateSignature S c Sset commits
    (shares.map (fun sh => { sh with z_i := sh.z_i + masks.shareMask sh.sender }))
    = aggregateSignature S c Sset commits shares

Combined Rerandomization

A full rerandomization applies masks to both shares and nonces. The masks must satisfy the zero-sum property independently.

Unlinkability

Rerandomization provides unlinkability. An adversary who sees multiple signatures cannot determine if they were produced by the same set of parties. Each session appears independent.

Protocol Integration

Rerandomization integrates into the signing protocol as follows.

  1. Before round one parties agree on a session identifier.
  2. Each party derives its masks from the session identifier and randomness.
  3. Parties exchange commitments to their masked values.
  4. The signing protocol proceeds with masked shares and nonces.
  5. The final signature is identical to one produced without rerandomization.

Party Addition and Removal

The protocol can add or remove parties from the signer set.

Party Removal

To remove party P_i, the remaining parties redistribute its share. This is equivalent to a repair followed by a refresh. The share is recovered and then split among the remaining parties.

Party Addition

To add a party P_{n+1}, the existing parties run a refresh that creates a new share for it. The new share is generated such that the shares still sum to the master secret.

Threshold Adjustment

Changing the threshold requires restructuring the shares. Moving from t-of-n to t′-of-n requires a new secret-sharing polynomial of degree t′ − 1. This is more complex than a simple refresh.

Key Rotation

Key rotation changes the master secret and public key. Unlike refresh, which preserves the key, rotation generates a new key.

Rotation Procedure

  1. Parties run a new DKG protocol.
  2. The new protocol produces fresh shares for a new secret sk′.
  3. The new public key is pk′ = A(sk′).
  4. External systems update to use the new public key.

Migration

Applications must migrate from the old key to the new key. Signatures under the old key remain valid. New signatures use the new key.

Composite State (CRDT)

Protocol phase state combines with refresh, repair, and rerandomization state in a product semilattice. Merge is provided by the product semilattice instances.

abbrev CompositeState (S : Scheme) (p : Phase) :=
  (State S p) × RefreshState S × RepairBundle S × RawRerandMasks S

Threshold-Aware Merge

For threshold protocols, the ShareWithCtx structure couples share state with threshold context. When two replicas merge, the active signer sets may differ. The merge must:

  1. Union the active sets: merged.active = a.active ∪ b.active
  2. Preserve threshold: merged.t = max(a.t, b.t)
  3. Prove validity: merged.t ≤ |merged.active|
  4. Recompute coefficients: Lagrange coefficients depend on the signer set

Merge Strategies

Three merge strategies are provided:

Conservative merge (ones): Falls back to n-of-n with a simple sum. No Field instance required, but loses threshold efficiency.

def mergeShareWithCtxOnes (S : Scheme) [DecidableEq S.PartyId]
  (a b : ShareWithCtx S) : ShareWithCtx S

Full merge with recompute: Recomputes Lagrange coefficients for merged set. Preserves t-of-n semantics but requires a Field instance.

def mergeShareWithCtx
  (S : Scheme) [Field S.Scalar] [DecidableEq S.PartyId]
  (pidToScalar : S.PartyId → S.Scalar)
  (a b : ShareWithCtx S) : ShareWithCtx S

Auto merge: Selects strategy based on field availability. Pass some inferInstance for field-backed schemes or none for lattice-only builds.

def mergeShareWithCtxAuto
  (S : Scheme) [DecidableEq S.PartyId]
  (fieldInst : Option (Field S.Scalar))
  (pidToScalar : S.PartyId → S.Scalar)
  (a b : ShareWithCtx S) : ShareWithCtx S

Cardinality Proof Maintenance

The merge must prove that max(t_a, t_b) ≤ |a.active ∪ b.active|. This follows from:

  • a.ctx.t ≤ |a.active| ≤ |a.active ∪ b.active| (subset monotonicity)
  • b.ctx.t ≤ |b.active| ≤ |a.active ∪ b.active| (subset monotonicity)
  • Therefore max(t_a, t_b) ≤ |merged.active|
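
The argument can be checked directly against Mathlib (lemma names as of recent Mathlib; a sketch, not the module's proof):

```lean
example {α : Type*} [DecidableEq α] (A B : Finset α) (ta tb : ℕ)
    (ha : ta ≤ A.card) (hb : tb ≤ B.card) :
    max ta tb ≤ (A ∪ B).card :=
  max_le
    (ha.trans (Finset.card_le_card Finset.subset_union_left))
    (hb.trans (Finset.card_le_card Finset.subset_union_right))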

Type-Indexed Phase Transitions

The Protocol/State/PhaseIndexed.lean module encodes protocol phases at the type level. Invalid phase transitions become compile-time errors rather than runtime failures.

Phase-Indexed State

State is indexed by the current phase, ensuring only valid operations are available:

inductive PhaseState (S : Scheme) : Phase → Type where
  | commit  : CommitPhaseData S → PhaseState S .commit
  | reveal  : RevealPhaseData S → PhaseState S .reveal
  | shares  : SharePhaseData S → PhaseState S .shares
  | done    : DonePhaseData S → PhaseState S .done

Each phase carries phase-specific data. The runtime phase state (Protocol/State/Phase.lean) uses MsgMap for conflict-free message storage, while the type-indexed version (Protocol/State/PhaseIndexed.lean) uses lists for simpler type signatures:

-- Type-indexed version (PhaseIndexed.lean) - simpler types
structure CommitPhaseData (S : Scheme) where
  commits : List (DkgCommitMsg S)
  expectedParties : Nat

structure RevealPhaseData (S : Scheme) where
  commits : List (DkgCommitMsg S)
  reveals : List (DkgRevealMsg S)
  expectedParties : Nat
  commitsComplete : commits.length = expectedParties

-- Runtime version (Phase.lean) - conflict-free by construction
structure CommitState (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] where
  commits : CommitMap S   -- MsgMap: at most one per sender

structure RevealState (S : Scheme) [BEq S.PartyId] [Hashable S.PartyId] where
  commits : CommitMap S
  reveals : RevealMap S   -- MsgMap: at most one per sender

Type-Safe Transitions

Transitions are functions between specific phase types:

-- Can only be called when in commit phase
def transitionToReveal (S : Scheme)
    (st : PhaseState S .commit)
    (hready : match st with | .commit data => data.commits.length = data.expectedParties)
    : PhaseState S .reveal

-- Can only be called when in reveal phase
def transitionToShares (S : Scheme)
    (st : PhaseState S .reveal)
    (activeSigners : Finset S.PartyId)
    (message : S.Message)
    (threshold : Nat)
    (hready : match st with | .reveal data => data.reveals.length = data.expectedParties)
    : PhaseState S .shares

Phase Monotonicity

Phase ordering is proven at the type level:

def Phase.toNat : Phase → Nat
  | .commit => 0 | .reveal => 1 | .shares => 2 | .done => 3

theorem reveal_advances : Phase.commit < Phase.reveal
theorem shares_advances : Phase.reveal < Phase.shares
theorem done_advances : Phase.shares < Phase.done
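
One plausible way to induce this ordering (an assumption; the module's actual instance is not shown) is to compare toNat values:

```lean
instance : LT Phase := ⟨fun a b => a.toNat < b.toNat⟩

theorem commit_lt_done : Phase.commit < Phase.done := by
  show Phase.commit.toNat < Phase.done.toNat
  decide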

Benefits

  1. Compile-time safety: Cannot call addReveal in commit phase
  2. Self-documenting: Type signatures show valid transitions
  3. Proof automation: Phase ordering proofs are trivial
  4. Extraction functions: Data access is phase-aware
def getCommits (S : Scheme) : {p : Phase} → PhaseState S p → List (DkgCommitMsg S)
  | .commit, .commit data => data.commits
  | .reveal, .reveal data => data.commits
  | .shares, .shares data => data.commits
  | .done, .done _ => []

Error Recovery Patterns

This section describes common error scenarios and recovery strategies across protocol phases.

DKG Failures

DKG failures occur when key generation cannot complete successfully.

Insufficient participants. If fewer than threshold parties complete the protocol, DKG cannot produce a valid key. Recovery requires restarting DKG with a different participant set or waiting for offline parties.

Too many complaints. If more than f parties receive valid complaints, the faulty party count exceeds tolerance. Recovery options include identifying and excluding faulty parties then restarting, or aborting if the faulty count exceeds trust assumptions.

Commitment mismatch. When a party's reveal does not match their commitment, they are excluded. If exclusion leaves fewer than threshold honest parties, DKG must restart.

-- Check if DKG can continue after exclusions
def canContinueDkg (totalParties exclusions threshold : Nat) : Bool :=
  totalParties - exclusions >= threshold

-- Recovery action based on complaint count
def dkgRecoveryAction (complaints faultTolerance : Nat) : DkgRecoveryAction :=
  if complaints = 0 then .continue
  else if complaints <= faultTolerance then .excludeAndContinue
  else .abort
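
Quick checks of canContinueDkg above (Lean coerces the decidable comparison to Bool):

```lean
#eval canContinueDkg 5 1 3  -- true:  5 - 1 = 4 ≥ 3
#eval canContinueDkg 5 3 3  -- false: 5 - 3 = 2 < 3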

Signing Session Failures

Signing sessions can fail for liveness or security reasons.

Timeout waiting for commitments. Some parties did not send commitments within the deadline. If at least threshold parties committed, the session can proceed without the missing parties. Otherwise, abort and retry with explicit participant selection.

Timeout waiting for shares. Parties committed but did not send shares. Same recovery as commitment timeout. Parties who committed but did not reveal are identified for potential exclusion.

Norm bound exceeded in aggregation. With local rejection sampling, individual shares exceeding bounds are rejected rather than triggering session restart. If fewer than threshold valid shares remain after rejection, the session fails.

-- Determine next action based on valid share count
def signingRecoveryAction (validShares threshold : Nat) : SigningRecoveryAction :=
  if validShares >= threshold then .aggregateValidShares
  else .retryWithDifferentSigners

Refresh Protocol Failures

Refresh failures affect share updates without changing the underlying key.

Coordinator failure. If the coordinator becomes unresponsive, select a new coordinator using the rotation strategy and restart the refresh round. Partial state is discarded.

Zero-sum violation. If masks do not sum to zero, the coordinator computed an incorrect adjustment. This indicates coordinator malfunction. Restart with a different coordinator.

Insufficient mask commits. If parties do not commit masks, refresh cannot proceed. Wait for additional commits or abort if the deadline passes.

-- Coordinator selection on failure
def selectNewCoordinator (strategy : CoordinatorStrategy PartyId)
    (failed : PartyId) (round : Nat) : PartyId :=
  match strategy with
  | .fixed pid => pid  -- Single point of failure
  | .roundRobin _ => computeNextCoordinator (round + 1)
  | .random seed => computeRandomCoordinator (seed + 1)

Repair Protocol Failures

Repair failures occur when recovering a lost share.

Insufficient helpers. Fewer than threshold parties responded with contributions. Request assistance from additional parties or abort if insufficient parties are available.

Verification failure. The repaired share does not match the known public share. This indicates either corrupted contributions or an incorrect public share. Retry with a different helper set.

Commitment mismatch during reveal. A helper's contribution reveal does not match their commitment. Exclude that helper and request a replacement if available.

-- Check if repair can complete
def canCompleteRepair (receivedHelpers threshold : Nat) : Bool :=
  receivedHelpers >= threshold

-- Retry strategy for repair
def repairRetryStrategy [DecidableEq PartyId] (threshold : Nat)
    (helpers failed available : List PartyId)
    : Option (List PartyId) :=
  let remaining := helpers.filter (fun h => h ∉ failed)
  let needed := threshold - remaining.length
  if needed ≤ available.length then
    some (remaining ++ available.take needed)
  else
    none
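
A self-contained version of the retry computation with PartyId := Nat (illustrative; retryHelpers and the concrete party ids are hypothetical):

```lean
def retryHelpers (threshold : Nat)
    (helpers failed available : List Nat) : Option (List Nat) :=
  let remaining := helpers.filter (fun h => !failed.contains h)
  let needed := threshold - remaining.length
  if needed ≤ available.length then
    some (remaining ++ available.take needed)
  else
    none

#eval retryHelpers 3 [1, 2, 3] [2] [4, 5]  -- some [1, 3, 4]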

Abort Coordination

When recovery is not possible, sessions must abort cleanly.

Liveness aborts require f+1 votes before taking effect. This prevents a minority from disrupting the protocol. Collect abort proposals and check against the abort threshold.

Security aborts take effect immediately without voting. When a trust violation or other security issue is detected, any party can trigger immediate abort.

Post-abort state should be cleaned up. Consumed nonces are invalidated. Session state is marked as aborted. New sessions require fresh nonces.

-- Determine abort handling
def handleAbort (reason : AbortReason PartyId) (votes : Nat)
    (cfg : ThresholdConfig) : AbortHandling :=
  if reason.isImmediate then .abortNow
  else if votes >= cfg.abortThreshold then .abortNow
  else .collectMoreVotes

-- Post-abort cleanup
def cleanupAbortedSession (session : Nat) (st : NonceState S) : NonceState S :=
  -- Mark session as used to prevent nonce reuse
  { st with usedSessions := Insert.insert session st.usedSessions }

General Recovery Principles

Follow these principles when implementing error recovery.

Fail fast for security violations. Security issues like trust violations or nonce reuse should trigger immediate abort without waiting for consensus.

Retry with exclusion for misbehaving parties. When specific parties cause failures, exclude them from retry attempts rather than retrying with the same set.

Preserve nonce safety across retries. Never reuse a nonce even if the session failed. Generate fresh nonces for each signing attempt.

Log blame for auditing. Use BlameableError and collectBlame to track which parties caused failures. This information supports manual review and automated exclusion policies.

Prefer local recovery over distributed coordination. Local rejection sampling handles norm bounds locally. Only escalate to session abort for failures that require distributed consensus.