Software Design Document
Introduction
This SDD describes the architecture and design of Cleanroom Whisper’s MVP.
Guiding document: Principles
Architecture Overview
System Context
┌─────────────────────────────────────────────────────────────────┐
│ Pure Rust Application │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ System Tray │ │ Hotkeys │ │ Audio │ │
│ │ (tray-icon) │ │(global-hotkey│ │ (cpal) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └────────────────────┼─────────────────────┘ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Event Loop │ │
│ │ (main.rs) │ │
│ └────────┬─────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ whisper.rs │ │ db.rs │ │ Native │ │
│ │ (subprocess) │ │ (SQLite) │ │ Dialogs │ │
│ └──────┬───────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ whisper.cpp │ │
│ │(user-managed)│ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Design Rationale
Decision |
Rationale |
|---|---|
Pure Rust (no WebView) |
Smaller binary, simpler deployment, air-gap friendly |
System tray UI |
Native look, minimal footprint, keyboard-first workflow |
tray-icon + global-hotkey |
Same crates Tauri uses, without the WebView overhead |
SQLite |
ACID transactions, single file backup |
Subprocess to whisper.cpp |
Simplicity, user controls version |
No trait abstractions |
Only concrete types — one implementation exists (YAGNI) |
File Structure
Per Principles: 5 Rust files, flat structure. No frontend.
cleanroom-whisper/
├── src/
│ ├── main.rs # Entry point, event loop, tray setup
│ ├── audio.rs # Record WAV from microphone
│ ├── whisper.rs # Subprocess to whisper.cpp
│ ├── db.rs # SQLite CRUD
│ └── tray.rs # Tray menu construction and handlers
├── Cargo.toml
├── vendor/ # Vendored dependencies (for air-gap builds)
└── .cargo/
└── config.toml # Points to vendor directory
Data Design
Database Schema
2 tables. No indexes. No FTS. No migrations.
CREATE TABLE transcriptions (
id TEXT PRIMARY KEY, -- UUID v4
created_at INTEGER NOT NULL, -- Unix timestamp (ms)
duration_seconds REAL, -- Audio duration
text TEXT NOT NULL, -- Transcription content
audio_path TEXT -- NULL if audio deleted
);
CREATE TABLE settings (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
Settings Keys
Key |
Type |
Default |
|---|---|---|
|
string |
— |
|
string |
— |
|
string |
|
|
string |
|
|
string |
(system default) |
|
integer |
120 |
Settings Validation
Setting |
Validation Rules |
|---|---|
|
File exists, is executable |
|
File exists, has |
|
Valid key combination, no conflict with other app hotkeys |
|
Device exists in system (or empty for default) |
|
Integer 1-480 (exceeds NFR-WHISPER-023 minimum of 120 minutes to allow power-user configuration) |
Path validation: - Reject paths containing .. (path traversal) - Expand ~ to home directory - Convert to absolute path before storing
On validation failure: Show inline error message in settings dialog, prevent save until fixed.
File Storage
{APP_DATA_DIR}/
├── cleanroom-whisper.db # SQLite database
├── audio/ # Audio files
│ └── {uuid}.wav
└── temp/ # Recording in progress
└── recording.wav
Schema Migration Strategy
MVP approach: No migrations. Schema is simple and stable.
If schema changes are needed later: - Add columns with ALTER TABLE and defaults - Store schema version in settings table - Run migrations sequentially on startup if version < current - Never delete data during migration - Always add columns with NULL or default values
Data Retention Policy
Audio files:
Policy |
Behavior |
|---|---|
Default |
Keep audio indefinitely with transcription |
Manual delete |
User deletes from history dialog; sets |
Temp files |
Delete |
Transcriptions: - Retained indefinitely until user deletes - Deletion removes database row and associated audio file (if exists) - No automatic cleanup or archival
Database: - Single file, no automatic backup - User responsible for backup (copy cleanroom-whisper.db) - No size limits enforced (SQLite handles large datasets)
Cleanup on delete: Delete audio file if exists, then delete database row.
Component Design
audio.rs
Record WAV from microphone using cpal + hound.
pub struct AudioCapture {
stream: Option<cpal::Stream>,
is_recording: Arc<AtomicBool>,
start_time: Option<Instant>,
}
impl AudioCapture {
pub fn new(device_id: Option<&str>) -> Result<Self, AudioError>;
pub fn start(&mut self, output_path: &Path) -> Result<(), AudioError>;
pub fn stop(&mut self) -> Result<PathBuf, AudioError>;
pub fn elapsed(&self) -> Duration;
pub fn is_recording(&self) -> bool;
pub fn get_level(&self) -> f32;
}
pub fn list_input_devices() -> Result<Vec<AudioDevice>, AudioError>;
whisper.rs
Subprocess to whisper.cpp using simple function (no trait).
Core function: transcribe(whisper_binary, model_path, audio_path) -> Result<String>
Behavior: - Invokes whisper.cpp with --no-timestamps flag for clean text output - Parses stdout for transcription text - Returns error on non-zero exit code with stderr message
db.rs
SQLite CRUD operations.
pub struct Database {
conn: Connection,
}
impl Database {
pub fn open(path: &Path) -> Result<Self, DbError>;
pub fn initialize(&self) -> Result<(), DbError>;
pub fn save_transcription(&self, t: &Transcription) -> Result<(), DbError>;
pub fn list_transcriptions(&self) -> Result<Vec<Transcription>, DbError>;
pub fn get_transcription(&self, id: &str) -> Result<Option<Transcription>, DbError>;
pub fn delete_transcription(&self, id: &str) -> Result<(), DbError>;
pub fn get_setting(&self, key: &str) -> Result<Option<String>, DbError>;
pub fn set_setting(&self, key: &str, value: &str) -> Result<(), DbError>;
}
tray.rs
System tray menu and event handlers.
pub struct TrayApp {
tray: TrayIcon,
state: AppState,
}
pub enum AppState {
Idle,
Recording { start: Instant },
Transcribing,
}
impl TrayApp {
pub fn new() -> Result<Self, TrayError>;
pub fn update_icon(&mut self, state: &AppState);
pub fn rebuild_menu(&mut self, recent: &[Transcription]);
pub fn show_notification(&self, title: &str, body: &str);
}
// Menu items
pub fn build_menu(state: &AppState, recent: &[Transcription]) -> Menu {
// - Start/Stop Recording
// - separator
// - Recent transcriptions (click to copy)
// - separator
// - View History...
// - Settings...
// - Quit
}
Tray Icon States
State |
Icon |
Color |
|---|---|---|
Idle |
Microphone |
Default/gray |
Recording |
Microphone |
Red |
Transcribing |
Microphone |
Yellow/orange |
Icons should be simple, monochrome-friendly for macOS menu bar.
Interaction Flows
Record and Transcribe
User App whisper.cpp
│ │ │
│ Press hotkey │ │
│─────────────────────►│ │
│ │ Start cpal stream │
│ │ Update tray icon (red) │
│ Tray shows red icon │ │
│◄─────────────────────│ │
│ │ │
│ Press hotkey │ │
│─────────────────────►│ │
│ │ Stop stream, save WAV │
│ │ Update tray icon (busy) │
│ │ Spawn whisper.cpp │
│ │──────────────────────────►│
│ │ stdout text │
│ │◄──────────────────────────│
│ │ Save to SQLite │
│ │ Update tray menu │
│ │ Show notification │
│ Notification shown │ │
│◄─────────────────────│ │
Settings Dialog
Native dialog with the following layout:
┌─────────────────────────────────────────────────────┐
│ Settings [X] │
├─────────────────────────────────────────────────────┤
│ │
│ Whisper Binary │
│ ┌─────────────────────────────────────────┐ [Browse]│
│ │ /usr/local/bin/whisper │ │
│ └─────────────────────────────────────────┘ │
│ ✓ Valid executable │
│ │
│ Model File │
│ ┌─────────────────────────────────────────┐ [Browse]│
│ │ ~/models/ggml-base.en.bin │ │
│ └─────────────────────────────────────────┘ │
│ ✓ Valid model file │
│ │
│ ─────────────────────────────────────────────────── │
│ │
│ Hotkeys │
│ Toggle Recording [ Ctrl+Alt+R ] [Set] │
│ Copy Last [ Ctrl+Alt+C ] [Set] │
│ │
│ Audio Input │
│ Device [ System Default ▼] │
│ │
│ ─────────────────────────────────────────────────── │
│ │
│ [Cancel] [Save] │
└─────────────────────────────────────────────────────┘
Validation feedback:
Green checkmark: Valid path/setting
Red X with message: Invalid (e.g., “File not found”, “Not executable”)
Save button disabled until all required fields valid
Hotkey capture:
Click [Set] button, dialog shows “Press hotkey…”
User presses key combination
If conflicts with other app hotkey: “Conflicts with Toggle Recording”
If conflicts with system: “May conflict with system shortcut” (warning, not error)
First-Run Flow
┌───────────────────────────────────────────────────────┐
│ Welcome to Cleanroom Whisper │
├───────────────────────────────────────────────────────┤
│ │
│ To get started, configure your whisper.cpp paths: │
│ │
│ 1. Whisper Binary │
│ ┌─────────────────────────────────┐ [Browse] │
│ │ │ │
│ └─────────────────────────────────┘ │
│ ⚠ Required │
│ │
│ 2. Model File │
│ ┌─────────────────────────────────┐ [Browse] │
│ │ │ │
│ └─────────────────────────────────┘ │
│ ⚠ Required │
│ │
│ Need whisper.cpp? │
│ https://github.com/ggerganov/whisper.cpp │
│ (air-gap: see local docs for installation) │
│ │
│ [Continue] │
└───────────────────────────────────────────────────────┘
First-run detection:
Check if
whisper_binary_pathsetting exists and is validIf not, show first-run dialog before tray menu is active
[Continue] disabled until both paths valid
After setup, show “Ready! Press Ctrl+Alt+R to record.”
History Dialog
┌─────────────────────────────────────────────────────┐
│ Transcription History [X] │
├─────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────┐ │
│ │ ● Today, 2:34 PM (1m 23s) │ │
│ │ Meeting notes about the quarterly review... │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ ○ Today, 11:15 AM (45s) │ │
│ │ Remember to call the dentist tomorrow... │ │
│ ├─────────────────────────────────────────────────┤ │
│ │ ○ Yesterday, 4:30 PM (2m 10s) │ │
│ │ The project deadline has been moved to... │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ ─────────────────────────────────────────────────── │
│ Selected: Today, 2:34 PM │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Meeting notes about the quarterly review. │ │
│ │ Sales are up 15% compared to last quarter. │ │
│ │ Action items: update forecast, schedule... │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ [Copy Text] [Export .txt] [Delete] [Close] │
└─────────────────────────────────────────────────────┘
Behavior:
List shows all transcriptions, newest first
Click to select and show full text in preview pane
[Copy Text]: Copy to clipboard, show confirmation
[Export .txt]: Save dialog, default filename from timestamp
[Delete]: Confirm dialog, then remove transcription and audio file
Dependencies
8 crates. Pure Rust, no WebView, no npm, no frontend build.
See Principles (Current Minimal Set section) for the authoritative dependency list with versions.
Security & Privacy
Privacy by architecture: No network code exists in the application. Voice recordings and transcriptions never leave the user’s machine.
Threat |
Mitigation |
|---|---|
Data exfiltration |
No network crates in dependency tree |
Path traversal |
Reject paths containing |
Command injection |
No shell execution, explicit args only |
SQL injection |
Parameterized queries |
Deployment
Air-Gap Support
The application supports deployment on air-gapped systems (no internet access).
Architecture requirements:
Pure Rust, no npm, no frontend build step
Vendored dependencies via
cargo vendorOffline build:
cargo build --release --offline
See README.md Air-Gapped Installation for complete deployment procedures including whisper.cpp, Rust toolchain, and platform-specific packages.
Deployment Package
Typical deployment structure:
Application binary (~10-15 MB)
whisper.cpp binary (user-provided or bundled)
Whisper model file (user-provided or bundled)
Setup instructions
See README.md for detailed package creation.
Platform Packages
Platform |
Format |
Notes |
|---|---|---|
macOS |
.app bundle |
Code-signed and notarized |
Windows |
.exe / .msi |
Code-signed |
Linux |
.deb, .rpm, Flatpak |
GNOME requires AppIndicator extension |
Release Process
Version numbering: Semantic versioning (MAJOR.MINOR.PATCH)
MAJOR: Breaking changes
MINOR: New features, backward compatible
PATCH: Bug fixes
Release checklist:
[ ] Update version in Cargo.toml
[ ] Update CHANGELOG.md
[ ] Run all tests: cargo test
[ ] Build release: cargo build --release
[ ] Test on each platform (macOS, Windows, Linux)
[ ] Code sign (see platform-specific steps below)
[ ] Create GitHub release with binaries
[ ] Update download links on website
Code Signing
Platform |
Requirements |
Notes |
|---|---|---|
macOS |
Apple Developer account, codesign, notarization |
Required for distribution outside App Store |
Windows |
Code signing certificate, signtool |
Required to avoid security warnings |
Linux |
Optional GPG signing |
For package repositories only |
Distribution Channels
Channel |
Platform |
Notes |
|---|---|---|
GitHub Releases |
All |
Primary distribution, free |
Gumroad/Paddle |
All |
For paid distribution |
Homebrew |
macOS |
|
AUR |
Arch Linux |
Community maintained (future) |
GitHub Release structure:
cleanroom-whisper-v1.0.0/
├── cleanroom-whisper-v1.0.0-macos-arm64.dmg
├── cleanroom-whisper-v1.0.0-macos-x64.dmg
├── cleanroom-whisper-v1.0.0-windows-x64.msi
├── cleanroom-whisper-v1.0.0-linux-x64.tar.gz
├── cleanroom-whisper-v1.0.0-linux-x64.deb
├── CHANGELOG.md
└── SHA256SUMS.txt
Platform Considerations
Build Requirements
Platform |
Toolchain |
System Libraries |
|---|---|---|
macOS |
Xcode Command Line Tools |
None (uses CoreAudio) |
Windows |
Visual Studio Build Tools |
None (uses WASAPI) |
Linux |
|
|
All platforms require the Rust toolchain (rustc, cargo).
Audio Backends
Platform |
Backend |
Notes |
|---|---|---|
macOS |
CoreAudio |
System framework, no extra dependencies |
Windows |
WASAPI |
System API, no extra dependencies |
Linux |
ALSA |
Requires development headers at compile time |
The cpal crate abstracts these differences. Audio code is platform-agnostic.
System Tray Behavior
Platform |
Behavior |
Notes |
|---|---|---|
macOS |
Menu bar icon |
Native support |
Windows |
System tray icon |
Native support |
Linux (KDE, XFCE, etc.) |
System tray icon |
Native support via StatusNotifierItem |
Linux (GNOME) |
Requires extension |
Install AppIndicator/KStatusNotifierItem extension |
The tray-icon crate handles platform differences. Left-click and right-click behavior is consistent.
Global Hotkeys
Platform |
Support |
Notes |
|---|---|---|
macOS |
Full |
Native support |
Windows |
Full |
Native support |
Linux (X11) |
Full |
Native support |
Linux (Wayland) |
Limited |
Requires XWayland fallback |
Wayland limitation: Pure Wayland does not support global hotkey capture by design (security model). Most distributions run XWayland for compatibility, which enables hotkey support.
whisper.cpp
whisper.cpp must be compiled or obtained separately for each target platform:
Platform |
Build Command |
Notes |
|---|---|---|
macOS |
|
Uses Accelerate framework |
Windows |
CMake + MSVC |
Or use pre-built releases |
Linux |
|
Optional CUDA/OpenBLAS support |
For air-gapped deployment, include pre-compiled whisper.cpp binaries for each target platform.
Localization Strategy
MVP Approach
MVP is English-only. Localization is deferred until post-MVP based on user demand.
This aligns with Principles: ship working software first, add features based on actual need.
Localization-Ready Architecture
String externalization: Use a strings module instead of hardcoded strings throughout codebase.
UI text inventory: ~60 total strings across tray menu, settings, dialogs, notifications, and error messages. Small enough for manual translation.
Future Localization Approaches
Three options when localization is needed:
Compile-time: Separate binary per language (recommended for air-gap simplicity)
Runtime embedded: All languages in binary, select at runtime (single binary, larger size)
External files: Locale files in app data (flexible, but not air-gap friendly)
Localization Priority
Based on whisper.cpp model availability and likely user base:
Priority |
Language |
Code |
Notes |
|---|---|---|---|
1 |
English |
en |
Default, MVP |
2 |
German |
de |
Strong Rust/privacy community |
3 |
Japanese |
ja |
Whisper has good Japanese support |
4 |
Spanish |
es |
Large user base |
5 |
French |
fr |
Large user base |
6 |
Chinese (Simplified) |
zh-CN |
Large user base |
Note: Transcription language (whisper.cpp) is independent of UI language. Users can transcribe any language whisper.cpp supports regardless of UI locale.
What NOT to Localize
Log messages, settings keys, file paths, error codes (localize descriptions only), and technical documentation.