WAL journal in a Git client: making rebase atomic

A git rebase can crash midway. A merge can be interrupted by a power outage. A cherry-pick can hang and leave the repository in a detached HEAD state. The standard git CLI tracks such in-progress operations via .git/rebase-merge/ and .git/MERGE_HEAD, but for a desktop client this is not enough.

The problem

When a user clicks "Rebase" in the GUI, a chain of 5-15 git commands executes. If the process crashes between commands 7 and 8, the repository is left in an inconsistent state. On the command line this is recoverable: git rebase --abort restores the original state. But a UI client must:

  1. Know the operation is incomplete
  2. Show the user what happened
  3. Offer to continue or rollback
  4. Guarantee no data loss

The solution — Write-Ahead Log

WAL is a pattern from the database world (SQLite, PostgreSQL). The idea: write the intention to a journal before executing it. If the process crashes, the journal is read on the next launch to determine what was interrupted.

type WalEntry = {
  id: string;
  operation: "rebase" | "merge" | "cherry-pick" | "reset";
  steps: WalStep[];
  currentStep: number;
  status: "in_progress" | "completed" | "failed";
  rollbackInfo: RollbackData;
};

type WalStep = {
  command: string[];
  preState: GitState;
  postState?: GitState;
};

Operation lifecycle

  1. Write to WAL — before the first git command. Save: operation type, step list, current HEAD, working tree
  2. Execute steps — each step records postState after success
  3. Completion — mark the entry as completed, clean up
  4. Failure — entry stays in_progress. On next launch RecoveryManager picks it up
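
The lifecycle above can be sketched roughly as follows. This is a minimal illustration, not the client's actual code: the GitState and RollbackData shapes, the Deps object, and the runWithWal name are all assumptions made so the example runs without a repository. It also assumes that each step's preState is refreshed when the step starts, which the article does not specify.

```typescript
// Hypothetical shapes: the article does not define GitState or RollbackData.
type GitState = { head: string };
type RollbackData = { originalHead: string };

type WalStep = { command: string[]; preState: GitState; postState?: GitState };

type WalEntry = {
  id: string;
  operation: "rebase" | "merge" | "cherry-pick" | "reset";
  steps: WalStep[];
  currentStep: number;
  status: "in_progress" | "completed" | "failed";
  rollbackInfo: RollbackData;
};

// Hypothetical dependencies, injected so the sketch needs no real repository.
type Deps = {
  persist: (entry: WalEntry) => void; // durably write the entry to the journal
  readState: () => GitState;          // e.g. resolve the current HEAD
  run: (command: string[]) => void;   // throws if the command fails
};

function runWithWal(
  id: string,
  operation: WalEntry["operation"],
  commands: string[][],
  deps: Deps,
): WalEntry {
  const initial = deps.readState();
  const entry: WalEntry = {
    id,
    operation,
    // preState starts as a placeholder and is refreshed when each step begins.
    steps: commands.map((command) => ({ command, preState: initial })),
    currentStep: 0,
    status: "in_progress",
    rollbackInfo: { originalHead: initial.head },
  };

  // 1. Write to the WAL before the first git command.
  deps.persist(entry);

  // 2. Execute steps; each one records postState after success.
  entry.steps.forEach((step, i) => {
    entry.currentStep = i;
    step.preState = deps.readState();
    deps.persist(entry);
    deps.run(step.command); // a crash or throw here leaves status "in_progress"
    step.postState = deps.readState();
    deps.persist(entry);
  });

  // 3. Completion: mark the entry as completed.
  entry.status = "completed";
  deps.persist(entry);
  return entry;
}
```

If run throws or the process dies partway through, the last persisted copy still says in_progress and shows exactly which steps have a postState, which is what recovery relies on.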

RecoveryManager

On application start, the first thing the client does is check the WAL:

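class RecoveryManager {
  async checkOnStartup(): Promise<RecoveryAction | null> {
    const pending = await this.wal.getPendingEntries();
    if (pending.length === 0) return null;

    // Only the first pending entry is handled here.
    const entry = pending[0];
    // Index of the first step without a recorded postState;
    // findIndex returns -1 when every step already has one.
    const firstIncomplete = entry.steps.findIndex(s => !s.postState);
    const lastCompletedStep =
      firstIncomplete === -1 ? entry.steps.length - 1 : firstIncomplete - 1;

    return {
      entry,
      lastCompletedStep,
      canContinue: this.isResumable(entry),
      canRollback: true,
    };
  }
}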

The user sees a dialog: "Unfinished rebase operation detected (step 7 of 12). Continue or rollback?"
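
As an illustration of how such a message could be derived from a journal entry, here is a small sketch. The names EntryLike, completedSteps and recoveryPrompt are hypothetical, and it assumes the dialog reports the interrupted step using 1-based numbering, which the article does not state explicitly.

```typescript
// Minimal stand-ins for the WAL types; only the fields used here are included.
type StepLike = { postState?: unknown };
type EntryLike = { operation: string; steps: StepLike[] };

// Number of leading steps that finished, i.e. that have a postState.
function completedSteps(entry: EntryLike): number {
  const firstIncomplete = entry.steps.findIndex((s) => !s.postState);
  return firstIncomplete === -1 ? entry.steps.length : firstIncomplete;
}

// Build the text shown in the recovery dialog.
function recoveryPrompt(entry: EntryLike): string {
  const total = entry.steps.length;
  // Assumption: the dialog names the step that was interrupted, counted from 1.
  const interrupted = Math.min(completedSteps(entry) + 1, total);
  return `Unfinished ${entry.operation} operation detected (step ${interrupted} of ${total}). Continue or rollback?`;
}
```

With twelve steps of which the first six have a postState, this yields the "step 7 of 12" wording from the dialog above.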

5 layers of protection

WAL is one of five layers. The full architecture:

  1. Auto-reflog — before any dangerous operation, save the ref to reflog with a label
  2. Auto-stash — if there are unsaved changes, stash before the operation, pop after
  3. WAL journal — atomic operation recording
  4. RecoveryManager — crash recovery
  5. Snapshot undo — UI level: every action can be undone via Ctrl+Z

Each layer works independently. Even if the WAL itself is corrupted, auto-reflog still allows manual state recovery via git reflog.
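
To make layers 1 and 2 concrete, the sketch below wraps an operation's commands with a labeled backup ref and an optional stash. The article does not say which git commands the client actually runs, so this is one possible realization: the withGuards name, the label format, and the refs/backup/last-operation ref are assumptions. The git update-ref, git stash push and git stash pop invocations use standard options.

```typescript
// Wrap an operation's commands with auto-reflog and auto-stash guards.
function withGuards(
  operation: string,
  headSha: string,
  hasLocalChanges: boolean,
  commands: string[][],
): string[][] {
  const label = `pre-${operation}`;
  const plan: string[][] = [
    // Layer 1: record the current commit under a backup ref (hypothetical name),
    // with a labeled reflog entry that can later be found via git reflog.
    ["git", "update-ref", "--create-reflog", "-m", label, "refs/backup/last-operation", headSha],
  ];

  // Layer 2: stash uncommitted work only if there is any.
  if (hasLocalChanges) {
    plan.push(["git", "stash", "push", "--include-untracked", "-m", label]);
  }

  plan.push(...commands);

  // Restore the stashed changes once the operation has finished.
  if (hasLocalChanges) {
    plan.push(["git", "stash", "pop"]);
  }
  return plan;
}
```

Returning a plain command list keeps the guards easy to journal: the whole plan can become the steps of a single WAL entry.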

Storage format

WAL entries are stored as JSON files in ~/.gitbor/wal/, one file per operation. Operation files older than 7 days are cleaned up automatically. The format is human-readable, so in an emergency it can be parsed manually.
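
A minimal sketch of this storage scheme using Node's fs module might look like the following. The function names, the <id>.json file naming, and the write-to-temp-then-rename approach are assumptions rather than details from the client; the rename trick is a common way to avoid leaving a half-written file under its final name on POSIX filesystems.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const DEFAULT_WAL_DIR = path.join(os.homedir(), ".gitbor", "wal");
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// Write one entry to <dir>/<id>.json via a temp file followed by a rename.
function writeEntry(entry: { id: string }, dir: string = DEFAULT_WAL_DIR): string {
  fs.mkdirSync(dir, { recursive: true });
  const finalPath = path.join(dir, `${entry.id}.json`);
  const tmpPath = `${finalPath}.tmp`;
  const fd = fs.openSync(tmpPath, "w");
  try {
    // Pretty-print so the file stays readable by hand.
    fs.writeSync(fd, JSON.stringify(entry, null, 2));
    fs.fsyncSync(fd); // flush to disk before the rename makes the file visible
  } finally {
    fs.closeSync(fd);
  }
  fs.renameSync(tmpPath, finalPath);
  return finalPath;
}

// Delete operation files whose modification time is more than 7 days old.
// Returns the names of the removed files.
function cleanOldEntries(dir: string = DEFAULT_WAL_DIR, now: number = Date.now()): string[] {
  const removed: string[] = [];
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith(".json")) continue;
    const file = path.join(dir, name);
    if (now - fs.statSync(file).mtimeMs > MAX_AGE_MS) {
      fs.unlinkSync(file);
      removed.push(name);
    }
  }
  return removed;
}
```

Calling writeEntry after every state change and cleanOldEntries once at startup would match the behavior described above.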

Result

Over 4 months of testing, simulated crashes caused zero data loss. The scenarios included:

  • kill -9 of the process during rebase
  • Power cut (VM) during merge
  • Segfault in a native module during cherry-pick

Every time, RecoveryManager correctly picked up the interrupted operation and offered recovery.