Gap Analysis: Cleanroom Whisper Deployment

This document analyzes the gaps between AirGap Deploy’s current design and the requirements of its primary use case: packaging Cleanroom Whisper for air-gapped deployment.

Current Design Coverage

Requirement	Component	Status	Notes
Package Rust source + vendored deps	`RustAppComponent`	✅ Supported	`vendor = true`
Include Rust toolchain installer	`RustAppComponent`	✅ Supported	`include_toolchain = true`
Package whisper.cpp source	`ExternalBinaryComponent`	✅ Supported	Git clone
Download Whisper models	`ModelFileComponent`	✅ Supported	With checksums
Include ALSA packages (Linux)	`SystemPackageComponent`	⚠️ Partial	Component exists but is not used in the example manifest
Multi-platform packages	Platform abstraction	⚠️ Deferred	Cross-compilation in v0.2
Installation script generation	Template system	✅ Supported	Bash/PowerShell
Post-install configuration	Install steps	✅ Supported	Via `[install.config]` section
Interactive installation	Install scripts	✅ Supported	Via `mode = "interactive"`
Multiple model selection	Manifest	⚠️ Unclear	Multiple `model-file` components can be listed, but the selection UX is undefined

Critical Gaps Identified

Gap 1: Post-Installation Configuration

Problem: Cleanroom Whisper needs to know where whisper.cpp and models are installed.

Current Plan: No mechanism for post-install configuration.

Solution:

[install.config]
config_file = "~/.config/cleanroom-whisper/config.toml"
config_template = """
# Cleanroom Whisper auto-discovers binary and models from this path
whisper_path = "{{ install_prefix }}"
"""

Install script should:

Build and install whisper.cpp to a known location
Copy all models to a known location ({{ install_prefix }}/share/cleanroom-whisper/models/)
Generate the config file with the install prefix
Install the Cleanroom Whisper binary

Cleanroom Whisper runtime auto-discovery:

Binary: Search whisper_path/bin/ for whisper-main, main, whisper-cli, etc.
Models: Scan whisper_path/share/cleanroom-whisper/models/*.bin
No need for explicit paths in config
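
The config-generation step could be sketched in the install script roughly as follows. This is a minimal sketch under stated assumptions, not the generated output: the helper name `render_config` is hypothetical, `{{ install_prefix }}` is assumed to be the only placeholder, and the prefix is assumed to contain no `|`, `&`, or backslash characters, which would need escaping in a sed replacement.

```sh
#!/bin/sh
# Hypothetical helper: substitute {{ install_prefix }} in a config template.
# Assumes the prefix contains no '|', '&' or '\' (special in sed replacements).
render_config() {
    template=$1
    prefix=$2
    # '|' is used as the sed delimiter so that '/' in paths needs no escaping.
    printf '%s\n' "$template" | sed "s|{{ *install_prefix *}}|$prefix|g"
}

# Example call (the prefix is illustrative):
render_config 'whisper_path = "{{ install_prefix }}"' "/opt/cleanroom-whisper"
# prints: whisper_path = "/opt/cleanroom-whisper"
```

In a real install script the output would be redirected to the path named by `config_file` after creating its parent directory.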

Gap 2: Multiple Model Support

Problem: Users may want different model sizes (tiny, base, small, medium, large).

Current Plan: Can list multiple [[components]] of type model-file, but:

All models included = large package (3+ GB)
No way to make models optional/selectable

Options:

Option A: Multiple Manifests

# Developer creates multiple packages
airgap-deploy prep --manifest AirGapDeploy.base.toml   # Just base.en (140MB)
airgap-deploy prep --manifest AirGapDeploy.full.toml   # All models (3GB)

Option B: Component Selection at Prep Time

[[components]]
type = "model-file"
name = "base.en"
url = "..."
required = true  # Always included

[[components]]
type = "model-file"
name = "small.en"
url = "..."
required = false  # Optional, include with --include small.en

airgap-deploy prep --include small.en --include medium.en

Option C: Interactive Installation

# Install script prompts:
# "Which models do you want to install?"
# [x] base.en (140MB) - Recommended
# [ ] small.en (460MB)
# [ ] medium.en (1.5GB)

Gap 3: Cross-Platform Packaging

Problem: Developer on macOS wants to create packages for Linux and Windows.

Current Plan: Deferred to v0.2 (cross-compilation).

Impact: Developer must:

Run AirGap Deploy on each target platform, OR
Use CI/CD with multiple platform runners, OR
Wait for v0.2

Recommendation: This is acceptable for v0.1. In the meantime, use GitHub Actions matrix builds to produce a package on each target platform.

Gap 4: Installation Locations & Permissions

Problem: Where do things get installed?

Current Plan:

[install]
install_to = "user"  # or "system"

Questions:

User install: ~/.local/bin (Linux/macOS), %LOCALAPPDATA%\Programs (Windows)?
System install: /usr/local/bin (needs sudo)?
Models: ~/.local/share/cleanroom-whisper/models or /usr/share/cleanroom-whisper/models?
Config: ~/.config/cleanroom-whisper/config.toml or /etc/cleanroom-whisper/config.toml?

Needed: Platform-specific path resolution in install scripts.
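
One possible shape for that resolution in the Bash install script is sketched below. The helper name `resolve_prefix` is hypothetical, and the paths simply mirror the candidates listed in the questions above; they are not a settled decision. Windows would be handled separately in the PowerShell script.

```sh
#!/bin/sh
# Hypothetical helper: map install_to ("user" or "system") to an install prefix.
# The second argument overrides the detected OS, which is useful for testing.
resolve_prefix() {
    scope=$1
    os=${2:-$(uname -s)}
    case "$scope:$os" in
        user:Linux|user:Darwin)     printf '%s\n' "$HOME/.local" ;;
        system:Linux|system:Darwin) printf '%s\n' "/usr/local" ;;
        *)
            echo "unsupported install_to/platform: $scope on $os" >&2
            return 1
            ;;
    esac
}

# Binary and model locations can then be derived from the single prefix:
#   "$prefix/bin" and "$prefix/share/cleanroom-whisper/models"
```

Deriving every location from one prefix keeps the script consistent with the `whisper_path` auto-discovery described in Gap 1.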

Gap 5: Dependency Verification

Problem: Install script should verify dependencies before building.

Current Plan: Mentioned in Phase 4 (“Dependency checking”), but not detailed.

Needed:

# Generated install script should check:
- Rust toolchain (or install from included installer)
- C compiler (gcc/clang/MSVC) for whisper.cpp
- make (for whisper.cpp build)
- ALSA headers (on Linux, from included .deb/.rpm)
- Sufficient disk space

Use Case Matrix

Use Case 1: Developer Creating Release (Primary)

Actor: Cleanroom Whisper maintainer
Environment: macOS laptop with internet
Goal: Create release packages for Linux, macOS, Windows

Workflow:

Update AirGapDeploy.toml with the new version
Run CI/CD that executes on Linux, macOS, Windows runners:

- name: Package for air-gap
  run: airgap-deploy prep --target ${{ matrix.platform }} --output dist/

Upload artifacts to GitHub releases
Users download pre-built packages

Current Plan Support: ✅ Fully supported (with GitHub Actions)

Use Case 2: End User Installing on Air-Gapped System (Primary)

Actor: Security researcher on air-gapped workstation
Environment: Ubuntu 22.04 with no internet, ALSA installed
Goal: Install and run Cleanroom Whisper

Workflow:

Download cleanroom-whisper-linux-x86_64.tar.gz on a connected machine and transfer it via USB
Extract: tar -xzf cleanroom-whisper-linux-x86_64.tar.gz
Run: cd cleanroom-whisper-linux-x86_64 && ./install.sh
Install script:

Checks Rust (installs from included installer if missing)

Checks ALSA (installs from included .deb if missing)

Builds whisper.cpp

Builds cleanroom-whisper

Installs to ~/.local/bin

Generates ~/.config/cleanroom-whisper/config.toml

Run: cleanroom-whisper

Current Plan Support: ⚠️ Mostly supported, gaps in config generation

Use Case 3: Advanced User Custom Build (Secondary)

Actor: Developer customizing Cleanroom Whisper
Environment: Arch Linux with internet
Goal: Create custom package with specific models

Workflow:

Clone cleanroom-whisper repo
Edit AirGapDeploy.toml to include only desired models
Run: airgap-deploy prep --target linux-x86_64
Transfer to air-gapped system
Install as normal

Current Plan Support: ✅ Fully supported

Use Case 4: Enterprise Deployment (Future)

Actor: IT admin deploying to 100 air-gapped workstations
Environment: Mixed Windows/Linux fleet
Goal: Automated installation without interaction

Workflow:

Download pre-built packages
Create deployment script:

# Unattended install
./install.sh --non-interactive --prefix /opt/cleanroom-whisper

Deploy via configuration management (Ansible, GPO, etc.)

Current Plan Support: ⚠️ Partially supported. Automatic Installation Mode (FR-DEPLOY-068) provides unattended installation via MODE=automatic.

Architectural Recommendations

Recommendation 1: Add Post-Install Configuration with Auto-Discovery

Extend Manifest:

[install]
method = "build-from-source"
install_to = "user"  # or "system"

# Simple post-install configuration - let app auto-discover details
[install.config]
config_file = "~/.config/cleanroom-whisper/config.toml"
config_template = """
# Cleanroom Whisper auto-discovers binary and models from this path
whisper_path = "{{ install_prefix }}"

[audio]
sample_rate = 16000
channels = 1
"""

# Custom installation steps
[install.steps]
whisper_cpp = [
    "cd whisper.cpp",
    "make",
    "mkdir -p {{ install_prefix }}/bin",
    "cp main {{ install_prefix }}/bin/whisper-main"
]
models = [
    "mkdir -p {{ install_prefix }}/share/cleanroom-whisper/models",
    "cp models/*.bin {{ install_prefix }}/share/cleanroom-whisper/models/"
]
cleanroom_whisper = [
    "cd cleanroom-whisper",
    "cargo build --release --offline",
    "cp target/release/cleanroom-whisper {{ install_prefix }}/bin/"
]

Cleanroom Whisper Auto-Discovery:

Discovers whisper binary by searching whisper_path/bin/ for known names
Discovers all models by scanning whisper_path/share/cleanroom-whisper/models/*.bin
No explicit paths needed in config, improving UX

Implementation: Phase 4 (Install Script Generation)

Recommendation 2: Optional Components

Extend Component Definition:

[[components]]
type = "model-file"
name = "base.en"
url = "https://huggingface.co/..."
checksum = "sha256:..."
required = true  # Always included
default = true

[[components]]
type = "model-file"
name = "small.en"
url = "https://huggingface.co/..."
checksum = "sha256:..."
required = false  # Optional
default = false

CLI:

# Include optional components
airgap-deploy prep --include small.en --include medium.en

# Or use interactive mode
airgap-deploy prep --interactive
# Prompts: "Include small.en (460MB)? [y/N]"

Implementation: Phase 2 (Component System)

Recommendation 3: Installation Modes

Extend Install Configuration:

[install]
method = "build-from-source"
install_to = "user"
mode = "interactive"  # or "automatic"

# Interactive prompts
[install.prompts]
install_location = "Where should Cleanroom Whisper be installed?"
install_location_default = "~/.local"
install_system_wide = "Install system-wide (requires sudo)?"
install_system_wide_default = false

Generated Install Script:

#!/bin/bash
set -e

# Installation mode
MODE="${MODE:-interactive}"

if [ "$MODE" = "interactive" ]; then
    read -p "Where should Cleanroom Whisper be installed? [~/.local]: " INSTALL_PREFIX
    INSTALL_PREFIX="${INSTALL_PREFIX:-$HOME/.local}"
else
    INSTALL_PREFIX="${INSTALL_PREFIX:-$HOME/.local}"
fi

# Non-interactive mode for enterprise (environment variables go before the command):
# MODE=automatic INSTALL_PREFIX=/opt/cleanroom-whisper ./install.sh

Implementation: Phase 4 (Install Script Generation)

Recommendation 4: Dependency Verification

Install Script Should Check:

#!/bin/bash
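set -e

# Default install prefix (can be overridden from the environment)
INSTALL_PREFIX="${INSTALL_PREFIX:-$HOME/.local}"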

echo "=== Cleanroom Whisper Installation ==="
echo

# Check for required tools
echo "Checking dependencies..."

# Check Rust
if ! command -v rustc &> /dev/null; then
    echo "  Installing Rust toolchain..."
    # Run in a subshell so the working directory is unchanged afterwards
    (cd rust-installer && ./install.sh --prefix="$INSTALL_PREFIX")
fi

# Check C compiler (for whisper.cpp)
if ! command -v gcc &> /dev/null && ! command -v clang &> /dev/null; then
    echo "ERROR: C compiler not found. Please install gcc or clang."
    exit 1
fi

# Check make
if ! command -v make &> /dev/null; then
    echo "ERROR: make not found. Please install make."
    exit 1
fi

# Linux: Check ALSA
if [ "$(uname)" = "Linux" ]; then
    if ! ldconfig -p | grep -q libasound; then
        echo "  Installing ALSA libraries..."
        # Install from included .deb/.rpm
    fi
fi

# Check disk space
REQUIRED_SPACE=500000  # 500MB in KB
# -P keeps each filesystem on one line and -k reports 1K blocks, so field 4 is available KB
AVAILABLE_SPACE=$(df -Pk "$INSTALL_PREFIX" | tail -1 | awk '{print $4}')
if [ "$AVAILABLE_SPACE" -lt "$REQUIRED_SPACE" ]; then
    echo "ERROR: Insufficient disk space. Need 500MB, have $((AVAILABLE_SPACE / 1024))MB"
    exit 1
fi

echo "All dependencies satisfied."
echo

Implementation: Phase 4 (Install Script Generation)
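
The disk-space check in the script above runs df on $INSTALL_PREFIX, which fails if that directory does not exist yet (likely on a first install). A possible refinement is sketched below; the helper name `has_space` is hypothetical, and the required size would ideally be derived from the packaged components instead of a fixed 500MB.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the filesystem that will hold $1 has at
# least $2 KB free. Walks up to the nearest existing parent directory,
# since the install prefix may not have been created yet.
has_space() {
    dir=$1
    required_kb=$2
    while [ ! -d "$dir" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        dir=$(dirname "$dir")
    done
    # df -Pk: POSIX output format, sizes in 1K blocks; field 4 is "Available".
    available_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$available_kb" -ge "$required_kb" ]
}
```

A caller might use it as `has_space "$INSTALL_PREFIX" 500000 || { echo "ERROR: Insufficient disk space"; exit 1; }`.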

Phase Priority Adjustments

Given the gaps, we recommend adjusting the MVP scope:

Current MVP (v1.0)

✅ Phase 1: Core infrastructure
✅ Phase 2: Built-in components (RustApp, ExternalBinary, ModelFile)
✅ Phase 3: Packaging
✅ Phase 4: Install scripts (basic)
✅ Phase 5: Basic CLI
❌ Phase 6: Partial tests/docs
❌ Phase 7: Plugin system (skip)

Recommended MVP (v1.0)

✅ Phase 1: Core infrastructure
✅ Phase 2: Built-in components + optional components
✅ Phase 3: Packaging
✅ Phase 4: Install scripts + config generation + dependency checks + installation modes
✅ Phase 5: Basic CLI + --include flag
❌ Phase 6: Partial tests/docs
❌ Phase 7: Plugin system (skip)
❌ SystemPackageComponent (defer to v0.2)

Rationale: Post-install configuration and dependency checking are critical for the Cleanroom Whisper use case to work smoothly.

Example: Complete Cleanroom Whisper Manifest

[package]
name = "cleanroom-whisper"
version = "0.1.0"
description = "Offline audio transcription"

[targets]
platforms = ["linux-x86_64", "macos-aarch64", "windows-x86_64"]
default = "linux-x86_64"

# Rust application
[[components]]
type = "rust-app"
source = "."
vendor = true
include_toolchain = true
prebuild = false  # Build on target system

# whisper.cpp dependency
[[components]]
type = "external-binary"
name = "whisper.cpp"
repo = "https://github.com/ggerganov/whisper.cpp.git"
branch = "master"
build_instructions = "make"

# Models (base is required, others optional)
[[components]]
type = "model-file"
name = "base.en"
url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"
checksum = "sha256:..."
size = "140MB"
required = true
default = true

[[components]]
type = "model-file"
name = "small.en"
url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.en.bin"
checksum = "sha256:..."
size = "460MB"
required = false
default = false

[[components]]
type = "model-file"
name = "medium.en"
url = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.en.bin"
checksum = "sha256:..."
size = "1.5GB"
required = false
default = false

# Installation configuration
[install]
method = "build-from-source"
install_to = "user"  # ~/.local on Linux/macOS, %LOCALAPPDATA% on Windows
mode = "interactive"

# Post-install configuration
# Cleanroom Whisper will automatically discover models and binary from whisper_path
[install.config]
config_file = "~/.config/cleanroom-whisper/config.toml"
config_template = """
# Cleanroom Whisper looks for whisper.cpp installation here
# The tool will automatically discover:
#   - Binary: <whisper_path>/bin/whisper-main (or main, whisper-cli)
#   - Models: <whisper_path>/share/cleanroom-whisper/models/*.bin
whisper_path = "{{ install_prefix }}"

[audio]
sample_rate = 16000
channels = 1

[hotkeys]
record = "Ctrl+Alt+R"
copy_last = "Ctrl+Alt+C"
"""

# Custom installation steps
[install.steps]
whisper_cpp = [
    "cd whisper.cpp",
    "make",
    "mkdir -p {{ install_prefix }}/bin",
    "cp main {{ install_prefix }}/bin/whisper-main",
]

models = [
    "mkdir -p {{ install_prefix }}/share/cleanroom-whisper/models",
    "cp models/*.bin {{ install_prefix }}/share/cleanroom-whisper/models/",
]

cleanroom_whisper = [
    "cd cleanroom-whisper",
    "cargo build --release --offline",
    "mkdir -p {{ install_prefix }}/bin",
    "cp target/release/cleanroom-whisper {{ install_prefix }}/bin/",
]

config = [
    "mkdir -p ~/.config/cleanroom-whisper",
    "# Config file already generated by template",
]

# Dependency verification
[install.dependencies]
rust = { required = true, install_if_missing = true }
gcc = { required = true, install_if_missing = false }
make = { required = true, install_if_missing = false }

[install.dependencies.linux]
alsa = { required = true, install_if_missing = true, packages = ["libasound2-dev"] }
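
How the `[install.dependencies]` entries might translate into install-script behavior is sketched below. The helper name `dep_action` and its three result strings are assumptions for illustration, and the sketch treats every entry as `required = true`, as in the manifest above.

```sh
#!/bin/sh
# Hypothetical helper: decide what to do for one [install.dependencies] entry.
# $1 is the command to look for, $2 is the manifest's install_if_missing value.
# Prints "ok", "install", or "fail"; the caller acts on the result.
dep_action() {
    cmd=$1
    install_if_missing=$2   # "true" or "false", as written in the manifest
    if command -v "$cmd" >/dev/null 2>&1; then
        echo ok
    elif [ "$install_if_missing" = "true" ]; then
        echo install
    else
        echo fail
    fi
}
```

With the manifest above, a missing `rustc` would yield `install` (use the bundled toolchain installer), while a missing `gcc` or `make` would yield `fail` and abort with an error message.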

Cleanroom Whisper Runtime Behavior

With this simpler configuration, Cleanroom Whisper’s runtime logic could look like this:

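// src/whisper.rs - Cleanroom Whisper code
// Result, Error, ModelInfo and read_config are assumed to be defined elsewhere in the crate.

use std::ffi::OsStr;
use std::path::PathBuf;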

pub struct WhisperConfig {
    pub whisper_path: PathBuf,
    // Binary and models auto-discovered
}

impl WhisperConfig {
    pub fn from_config_file() -> Result<Self> {
        // Note: std::fs does not expand `~`; read_config is assumed to resolve it.
        let config = read_config("~/.config/cleanroom-whisper/config.toml")?;
        Ok(Self {
            whisper_path: config.whisper_path,
        })
    }

    /// Auto-discover whisper binary
    pub fn binary_path(&self) -> Result<PathBuf> {
        // Try common binary names in order
        for name in ["whisper-main", "main", "whisper-cli", "whisper"] {
            let path = self.whisper_path.join("bin").join(name);
            if path.exists() {
                return Ok(path);
            }
        }
        Err(Error::WhisperBinaryNotFound)
    }

    /// Auto-discover all available models
    pub fn available_models(&self) -> Result<Vec<ModelInfo>> {
        let models_dir = self.whisper_path.join("share/cleanroom-whisper/models");
        let mut models = Vec::new();

        for entry in std::fs::read_dir(models_dir)? {
            let entry = entry?;
            let path = entry.path();

            // Find all .bin files
            if path.extension() == Some(OsStr::new("bin")) {
                let name = path.file_stem()
                    .and_then(|s| s.to_str())
                    .ok_or(Error::InvalidModelName)?;

                models.push(ModelInfo {
                    name: name.to_string(),
                    path: path.clone(),
                    size: std::fs::metadata(&path)?.len(),
                });
            }
        }

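        // read_dir order is unspecified; sort by name so results are deterministic
        models.sort_by(|a, b| a.name.cmp(&b.name));
        Ok(models)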
    }

    /// Get default model (the first available one; user selection is not implemented here)
    pub fn default_model(&self) -> Result<PathBuf> {
        let models = self.available_models()?;
        models.first()
            .map(|m| m.path.clone())
            .ok_or(Error::NoModelsFound)
    }
}

Benefits:

User only specifies one path: whisper_path
All models in models directory are automatically available
No need to update config when adding new models
Binary name detection handles different whisper.cpp versions
Simpler mental model for users

Summary

Does Current Plan Support Cleanroom Whisper?

Yes, but with critical gaps:

✅ Supported:

Packaging Rust app with vendored dependencies
Including Rust toolchain
Packaging external binaries (whisper.cpp)
Downloading models with verification
Generating installation scripts
Multi-platform targeting (with CI/CD)

⚠️ Remaining Gaps:

Cross-platform packaging from single system (deferred)

Recommended Actions

Immediate (Phase 1-2): Add optional component support to manifest schema
Phase 4 Enhancement: Implement config generation, dependency checks, installation modes
Documentation: Create complete Cleanroom Whisper example in examples/cleanroom-whisper/
Testing: Validate on actual air-gapped VMs before v1.0 release

The foundation is solid, but these enhancements are needed for a smooth Cleanroom Whisper deployment experience.