First Principles Framework

Structured reasoning plugin that makes AI decision-making transparent and auditable through hypothesis generation, logical verification, and evidence-based validation.

Focused on:

  • Transparent reasoning - All decisions documented with full audit trails

  • Hypothesis-driven analysis - Generate competing alternatives before evaluating

  • Evidence-based validation - Computed reliability scores, not estimates

  • Human-in-the-loop - AI generates options; humans decide (Transformer Mandate)

Plugin Target

  • Make AI reasoning auditable - full trail from hypothesis to decision

  • Prevent premature conclusions - enforce systematic evaluation of alternatives

  • Build project knowledge over time - decisions become reusable knowledge

  • Enable informed decision-making - trust scores based on evidence quality

Overview

The FPF plugin implements structured reasoning using the First Principles Framework, a methodology for rigorous, auditable reasoning developed by Anatoly Levenchuk. The killer feature is turning the black box of AI reasoning into a transparent, evidence-backed audit trail.

The core cycle follows three modes of inference:

  • Abduction — Generate competing hypotheses (don't anchor on the first idea).

  • Deduction — Verify logic and constraints (does the idea make sense?).

  • Induction — Gather evidence through tests or research (does the idea work in reality?).

Then, audit for bias, decide, and document the rationale in a durable record.

The framework addresses a fundamental challenge in AI-assisted development: making decision-making processes transparent and auditable. Rather than having AI jump to solutions, FPF enforces generating competing hypotheses, checking them logically, testing against evidence, then letting developers choose the path forward.

Warning: This plugin loads the core FPF specification into context, which is large (~600k tokens). For that reason it is loaded into a subagent running the Sonnet[1m] model, but even so, the agent can consume your token limit quickly.

Implementation based on quint-code by m0n0x41d.

Quick Start
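
A minimal first session, sketched here with a placeholder question (substitute your own decision):

    /fpf:propose-hypotheses "What caching strategy should we use?"
    ... FPF runs the full cycle and writes .fpf/decisions/DRR-<date>-<topic>.md
    /fpf:status        # check the current phase at any point
    /fpf:query DRR     # list past decisions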

Workflow Diagram

(Diagram omitted here: the cycle runs Abduction -> Deduction -> Induction -> Audit -> Decision, ending in a DRR.)

Commands Overview

/fpf:propose-hypotheses - Decision Cycle

Execute the complete FPF cycle from hypothesis generation through evidence validation to decision.

  • Purpose - Make architectural decisions with full audit trail

  • Output - .fpf/decisions/DRR-<date>-<topic>.md with winner and rationale

Arguments

Natural language description of the decision or problem. Examples: "What caching strategy should we use?" or "How should we deploy our application?"

How It Works - ADI Cycle

The workflow follows three inference modes:

  1. Initialize Context - Creates .fpf/ directory structure and captures problem constraints

  2. Abduction: Generate Hypotheses - The FPF agent generates 3-5 plausible, diverse, and competing hypotheses in the L0 folder. How it works:

    • You pose a problem or question

    • The AI (as Abductor persona) generates 3-5 candidate explanations or solutions

    • Each hypothesis is stored in L0/ (unverified observations)

    • No hypothesis is privileged — anchoring bias is the enemy

    Output: Multiple L0 claims, each with:

    • Clear statement of the hypothesis

    • Initial reasoning for plausibility

    • Identified assumptions and constraints

  3. User Input - Presents hypothesis table, allows user to add alternatives

  4. Deduction: Verify Logic (Parallel) - Checks each hypothesis against constraints and typing, then promotes it to L1 or invalidates it. How it works:

    • The AI (as Verifier persona) checks each L0 hypothesis for:

    • Internal logical consistency

    • Compatibility with known constraints

    • Type correctness (does the solution fit the problem shape?)

    • Hypotheses that pass are promoted to L1/

    • Hypotheses that fail are moved to invalid/ with explanation

    Output: L1 claims (logically sound) or invalidation records.

  5. Induction: Validate Evidence (Parallel) - Gathers empirical evidence through tests or research and promotes passing L1 hypotheses to L2. How it works:

    • For internal claims: run tests, measure performance, verify behavior

    • For external claims: research documentation, benchmarks, case studies

    • Evidence is attached with:

    • Source and date (for decay tracking)

    • Congruence rating (how well does external evidence match our context?)

    • Claims that pass validation are promoted to L2/

    Output: L2 claims (empirically verified) with evidence chain.

  6. Audit (Parallel) - Computes the trust score R_eff using:

    • WLNK (Weakest Link): Assurance = min(evidence levels)

    • Congruence Check: Is external evidence applicable to our context?

    • Bias Detection: Are we anchoring on early hypotheses?

  7. Make Decision: Presents a comparison table, selects the winner, and creates a Design Rationale Record (DRR), sketched after this list, which captures:

    • decision

    • alternatives considered

    • evidence

    • expiry conditions

  8. Present Summary: Shows DRR, winner rationale, and next steps
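
The DRR from step 7 is a markdown file; here is a skeleton of the record it captures, with section names inferred from the list above (the actual template may differ):

    # DRR: <topic>

    ## Decision
    The winning hypothesis and its rationale.

    ## Alternatives Considered
    Rejected hypotheses and why they lost.

    ## Evidence
    Links to evidence files, with source dates and congruence ratings.

    ## Expiry Conditions
    When this decision should be re-examined (see /fpf:decay).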

Usage Examples
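
Illustrative invocations (any natural-language decision works):

    /fpf:propose-hypotheses "What caching strategy should we use?"
    /fpf:propose-hypotheses "How should we deploy our application?"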


Best practices

  • Frame as decisions - "What X should we use?" or "How should we Y?"

  • Be specific about constraints - Include performance, cost, or time requirements

  • Add your own hypotheses - Don't rely only on AI-generated options

  • Review verification failures - Failed hypotheses reveal hidden constraints

  • Document for future reference - DRRs become project knowledge


/fpf:status - Check Progress

Show current FPF phase, hypothesis counts, and any warnings about stale evidence.

  • Purpose - Understand current state of reasoning process

  • Output - Status table with phase, counts, and warnings

Arguments

None required.

How It Works

  1. Phase Detection: Identifies current ADI cycle phase (IDLE, ABDUCTION, DEDUCTION, INDUCTION, DECISION)

  2. Hypothesis Count: Reports counts per knowledge layer (L0, L1, L2, Invalid)

  3. Evidence Status: Lists evidence files and their freshness

  4. Warning Detection: Identifies stale evidence, orphaned hypotheses, or incomplete cycles

Usage Examples

Example Output:
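
An illustrative run (the values are hypothetical and the exact layout depends on the plugin version):

    /fpf:status

    Phase:       INDUCTION
    Hypotheses:  L0: 0 | L1: 3 | L2: 1 | Invalid: 1
    Evidence:    2 files (1 past freshness threshold)
    Warnings:    stale evidence affects trust scores; run /fpf:decay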

Best practices

  • Check before continuing - Know your current phase before proceeding

  • Address warnings - Stale evidence affects trust scores

  • Review invalid hypotheses - Understand why they failed


/fpf:query - Search Knowledge Base

Search the FPF knowledge base for hypotheses, evidence, or decisions with assurance information.

  • Purpose - Find and review stored knowledge with trust scores

  • Output - Search results with layer, R_eff, and evidence counts

Arguments

Keyword to search for, specific hypothesis name, or "DRR" to list decisions.

How It Works

  1. Keyword Search: Searches hypothesis titles, descriptions, and evidence

  2. Hypothesis Details: Returns full hypothesis info including layer, kind, scope, and R_eff

  3. DRR Listing: Shows all Design Rationale Records with winner and rejected alternatives

Usage Examples
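
Illustrative invocations (the hypothesis name here is hypothetical):

    /fpf:query caching            # keyword search
    /fpf:query H1-redis-cache     # specific hypothesis
    /fpf:query DRR                # list decisions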

Example Output (keyword search):
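
A hypothetical result set, showing the layer, R_eff, and evidence-count fields described above:

    Match            Layer  R_eff  Evidence
    H1-redis-cache   L2     0.72   2
    H2-cdn-cache     L1     --     0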

Example Output (specific hypothesis):
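
Again hypothetical, with the layer, kind, scope, and R_eff fields:

    H1-redis-cache
    Layer: L2 (Corroborated)   Kind: internal   Scope: service
    R_eff: 0.72
    Evidence: internal benchmark (CL3, fresh); vendor docs (CL1)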

Example Output (DRR listing):
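
A hypothetical listing, one line per decision:

    DRR-<date>-caching-strategy   Winner: H1-redis-cache   Rejected: H2-cdn-cache, H3-in-process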

Best practices

  • Search before starting new decisions - Reuse existing knowledge

  • Check R_eff scores - Higher scores indicate more reliable hypotheses

  • Review DRRs - Past decisions inform future choices


/fpf:decay - Manage Evidence Freshness

Check for stale evidence and choose how to handle it: refresh, deprecate, or waive.

  • Purpose - Maintain evidence validity over time

  • Output - Updated evidence status and trust scores

Evidence expires. A benchmark from six months ago might not reflect current performance. /fpf:decay shows you what's stale and gives you three options:

  • Refresh — Re-run tests to get fresh evidence

  • Deprecate — Downgrade the hypothesis if the decision needs rethinking

  • Waive — Accept the risk temporarily with documented rationale

Arguments

None required. Command is interactive.

How It Works

  1. Staleness Check: Identifies evidence files past their freshness threshold

  2. Options Presented: For each stale evidence:

    • Refresh: Re-run tests for fresh evidence

    • Deprecate: Downgrade hypothesis, flag decision for review

    • Waive: Accept risk temporarily with documented rationale

  3. Trust Recalculation: Updates R_eff scores based on evidence changes

Usage Examples
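
A sketch of the interactive flow (file names and thresholds are hypothetical):

    /fpf:decay

    Stale evidence:
      benchmark-redis.md (92 days old, threshold 90)
    Options: [R]efresh  [D]eprecate  [W]aive
    > W
    Waiver recorded with rationale and expiration date; R_eff recalculated.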

Best practices

  • Run periodically - Evidence expires; benchmarks from 6 months ago may not reflect current performance

  • Document waivers - Always include rationale and expiration date

  • Refresh critical evidence - High-impact decisions deserve fresh data


/fpf:actualize - Reconcile with Codebase

Update the knowledge base to reflect codebase changes that may affect existing hypotheses.

  • Purpose - Keep knowledge synchronized with implementation

  • Output - Updated hypothesis validity and evidence relevance

This command serves as the Observe phase of the FPF's Canonical Evolution Loop (B.4). It reconciles your documented knowledge with the current state of the codebase by:

  • Detecting Context Drift: Checks if project files (like package.json) have changed, potentially making your context.md stale.

  • Finding Stale Evidence: Finds evidence whose carrier_ref (the file it points to) has been modified in git.

  • Flagging Outdated Decisions: Identifies decisions whose underlying evidence chain has been impacted by recent code changes.

How It Works

  1. Change Detection: Identifies code changes since last actualization

  2. Impact Analysis: Determines which hypotheses and evidence are affected

  3. Validity Update: Marks affected hypotheses for re-verification if needed

  4. Report Generation: Summarizes changes and recommended actions

Usage Examples
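
A sketch of a typical report (paths and names are hypothetical):

    /fpf:actualize

    Context drift:     package.json changed; context.md may be stale
    Stale evidence:    1 file whose carrier_ref was modified in git
    Impacted decision: DRR-<date>-caching-strategy (evidence chain affected)
    Recommended:       re-verify affected hypotheses; refresh benchmarks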

Best practices

  • Run after major changes - Refactoring may invalidate previous assumptions

  • Review impact report - Some hypotheses may need re-evaluation

  • Update evidence - Changed code may need new benchmarks


/fpf:reset - Start Fresh

Archive the current session and return to IDLE state for a new reasoning cycle.

  • Purpose - Clear current state while preserving history

  • Output - Archived session in .fpf/sessions/

Arguments

None required. Command is interactive.

How It Works

  1. Reset Type Selection:

    • Soft Reset: Archive current session, start fresh (recommended)

    • Hard Reset: Delete all FPF data (cannot be undone)

    • Decision Reset: Keep hypotheses, re-evaluate from earlier phase

  2. Session Archive: Creates timestamped archive in .fpf/sessions/

  3. State Clear: Clears knowledge directories based on reset type

Usage Examples
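
A sketch of the interactive flow (the timestamp is hypothetical):

    /fpf:reset

    Choose reset type: [S]oft (archive)  [H]ard (delete all)  [D]ecision
    > S
    Session archived to .fpf/sessions/<timestamp>/
    Knowledge directories cleared; phase is now IDLE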

When to Reset

Scenario                       Recommended Action
Starting a new problem         Soft reset (archive)
Wrong direction, start over    Soft reset
Testing/learning FPF           Hard reset
Re-evaluate with new info      Decision reset
Context changed significantly  Soft reset + update context

Best practices

  • Prefer soft reset - Always preserve history for reference

  • Hard reset only for testing - Production knowledge is valuable

  • Decision reset for pivots - When new information changes the equation


Available Agents

  • fpf-agent - FPF reasoning specialist for hypothesis generation, verification, validation, and trust calculus using the ADI cycle. Used by all commands.

fpf-agent

Purpose: Executes FPF reasoning tasks with file operations for persisting knowledge state.

Tools: Read, Write, Glob, Grep, Bash (mkdir, mv, touch)

Responsibilities:

  • Create hypothesis files in knowledge layers

  • Move files between L0/L1/L2/invalid directories

  • Create evidence files and audit reports

  • Generate Design Rationale Records (DRRs)

Key Concepts

Concept              Description
ADI Cycle            Abduction-Deduction-Induction reasoning loop
Knowledge Layers     L0 (Conjecture) -> L1 (Substantiated) -> L2 (Corroborated)
WLNK                 Weakest Link principle: R_eff = min(evidence_scores)
Holon                Knowledge unit with identity, layer, kind, and assurance scores
DRR                  Design Rationale Record documenting decisions
Transformer Mandate  AI generates options; humans decide

Knowledge Layers (Epistemic Status)

Layer    Name           Meaning                How to reach
L0       Conjecture     Unverified hypothesis  Generate hypotheses
L1       Substantiated  Passed logical check   Verify logic
L2       Corroborated   Empirically validated  Validate evidence
Invalid  Falsified      Failed verification    FAIL verdict

Congruence Levels

Level  Context Match              Penalty
CL3    Same (internal test)       None
CL2    Similar (related project)  Minor
CL1    Different (external docs)  Significant
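
As a rough illustration of how WLNK and the congruence penalties combine, here is a short Python sketch. It is not the plugin's actual code, and the numeric penalty factors are assumptions chosen for the example:

    # Illustrative sketch only -- not the plugin's implementation.
    # The numeric congruence factors are assumptions for demonstration.
    CONGRUENCE_FACTOR = {"CL3": 1.0, "CL2": 0.8, "CL1": 0.5}

    def r_eff(evidence):
        """Weakest Link (WLNK): trust is the minimum congruence-adjusted score."""
        return min(score * CONGRUENCE_FACTOR[level] for score, level in evidence)

    # One strong internal test (CL3) plus one external benchmark (CL1):
    print(r_eff([(0.9, "CL3"), (0.8, "CL1")]))  # 0.4 -- the weakest link wins

Note how the strong internal result cannot compensate for the weakly congruent external evidence; that is the point of the weakest-link rule.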

The Transformer Mandate

A core FPF principle: A system cannot transform itself.

  • AI generates options with evidence

  • Human decides

  • Making architectural choices autonomously is a PROTOCOL VIOLATION

This ensures accountability and prevents AI from making unsupervised decisions.

Directory Structure

The FPF plugin creates and manages this directory structure:
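
A sketch of the layout, inferred from the commands above (individual file names are illustrative):

    .fpf/
    ├── context.md      # problem context and constraints
    ├── L0/             # conjectures (unverified hypotheses)
    ├── L1/             # substantiated (passed logical verification)
    ├── L2/             # corroborated (empirically validated)
    ├── invalid/        # falsified hypotheses with explanations
    ├── evidence/       # evidence files with source, date, and congruence
    ├── decisions/      # DRR-<date>-<topic>.md records
    └── sessions/       # archived sessions from /fpf:reset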

When to Use FPF

Use it for:

  • Architectural decisions with long-term consequences

  • Multiple viable approaches requiring systematic evaluation

  • Decisions needing auditable reasoning trails

  • Building project knowledge over time

Skip it for:

  • Quick fixes with obvious solutions

  • Easily reversible decisions

  • Time-critical situations where the overhead isn't justified

Theoretical Foundation

Core Methodology

First Principles Framework (FPF) by Anatoly Levenchuk - the source methodology for the ADI cycle and trust calculus.

Supporting Concepts

  • Abduction-Deduction-Induction Cycle - Classical scientific reasoning methodology

  • Weakest Link Principle - Trust computation based on minimum evidence quality

  • Design Rationale - Documenting not just decisions but the reasoning behind them
