Skip to content

mbailey/sqldown

Repository files navigation

☠️ SQLDown - pre-alpha - USE AT OWN RISK

PyPI version Python versions CI/CD

Bidirectional markdown ↔ SQLite conversion - Load markdown files into SQLite, query with SQL, and export back to markdown.

Features

☠️ Pre-Alpha Software (v0.1.0) - USE AT YOUR OWN RISK:

  • Dynamic schema generation from YAML frontmatter and markdown structure
  • Column limit protection with intelligent section extraction
  • Import markdown collections into queryable SQLite databases
  • Export database rows back to markdown files
  • Watch mode for auto-refresh on file changes
  • Gitignore-aware file filtering
  • Smart change detection (skip unchanged files)

Installation

# Install from PyPI
pip install sqldown

# Or use uv for faster installation
uv pip install sqldown

Quick Start

# Load markdown files into SQLite
sqldown load ~/tasks

# Query with sqlite3
sqlite3 sqldown.db "SELECT * FROM docs WHERE status='active'"

# Export back to markdown
sqldown dump -d sqldown.db -o ~/restored

# Get database info
sqldown info

# Show table details
sqldown info -t docs

Load Command

sqldown load PATH [OPTIONS]

What it does:

  • Scans markdown files recursively
  • Parses YAML frontmatter → database columns
  • Extracts H2 sections → section_* columns
  • Creates schema dynamically based on discovered fields
  • Upserts into SQLite (idempotent - safe to run multiple times)
  • Respects .gitignore patterns automatically

Options:

  • -d, --db PATH - Database file (default: sqldown.db)
  • -t, --table NAME - Table name (default: docs)
  • -p, --pattern GLOB - File pattern (default: **/*.md)
  • -m, --max-columns N - Maximum allowed columns (default: 1800, SQLite limit: 2000)
  • -n, --top-sections N - Extract only top N most common sections (default: 20, 0=all)
  • -w, --watch - Watch for file changes and auto-update
  • -v, --verbose - Show detailed progress

Dump Command

sqldown dump -d DATABASE -o OUTPUT_DIR [OPTIONS]

What it does:

  • Exports database rows back to markdown files
  • Reconstructs original markdown structure with frontmatter
  • Preserves file paths from original import
  • Skips unchanged files (smart change detection)
  • Supports SQL filtering to export subsets

Options:

  • -d, --db PATH - Database file (required)
  • -t, --table NAME - Table name (default: docs)
  • -o, --output PATH - Output directory (required)
  • -f, --filter WHERE - SQL WHERE clause to filter rows
  • --force - Always write files, even if unchanged
  • -n, --dry-run - Preview what would be exported without writing
  • -v, --verbose - Show detailed progress

Examples:

# Export all documents
sqldown dump -d cache.db -o ~/restored

# Export only active tasks
sqldown dump -d cache.db -t tasks -o ~/active --filter "status='active'"

# Preview export without writing files
sqldown dump -d cache.db -o ~/export --dry-run

Info Command

sqldown info [OPTIONS]

What it does:

  • Shows database statistics and table information
  • Lists all tables with document counts
  • Displays column breakdown (frontmatter vs sections)
  • Provides schema details for specific tables

Options:

  • -d, --db PATH - Database file (default: sqldown.db if exists)
  • -t, --table NAME - Show detailed info for specific table

Examples:

# Show database overview (uses sqldown.db if present)
sqldown info

# Show info for specific database
sqldown info -d cache.db

# Show detailed table information
sqldown info -t tasks

Using SQLite3

Once imported, use sqlite3 directly for all queries:

# List tables
sqlite3 cache.db ".tables"

# Show schema
sqlite3 cache.db ".schema tasks"

# Query
sqlite3 cache.db "SELECT title, status FROM tasks WHERE status='active' LIMIT 10"

# Aggregate
sqlite3 cache.db "SELECT status, COUNT(*) FROM tasks GROUP BY status"

# Complex queries
sqlite3 cache.db "
  SELECT project, COUNT(*) as active_count
  FROM tasks
  WHERE status='active'
  GROUP BY project
  ORDER BY active_count DESC
"

# Export to CSV
sqlite3 -csv cache.db "SELECT * FROM tasks WHERE status='active'" > active.csv

# Interactive mode
sqlite3 cache.db

Dynamic Schema Example

From this markdown:

---
status: active
project: agents
priority: high
---

# Add SQLite caching

## Objective
Create a cache layer...

## Implementation Plan
1. Parser
2. Schema

Creates these columns automatically:

  • Core: _id, _path, _sections, title, body, lead, file_modified
  • Frontmatter: status, project, priority
  • Sections: section_objective, section_implementation_plan

Real example: 87 tasks → 181 columns (no schema design needed!)

Column Limit Protection

SQLite has a hard limit of 2000 columns per table. With diverse markdown documents, you can easily hit this limit:

Problem: 5,225 tasks with diverse sections = 6,694 columns (💥 exceeds limit!)

Solution: Use --top-sections to extract only the most common sections:

# Extract only top 20 most common sections (default)
sqldown load ~/tasks -d cache.db --top-sections 20

# Result: 5,225 tasks → 116 columns ✅
# - 7 base columns (_id, _path, title, body, lead, _sections, file_modified)
# - 89 frontmatter columns (status, project, type, priority, etc.)
# - 20 section columns (overview, usage, objective, notes, next_steps, etc.)

What happens to other sections?

  • All content is preserved in the body field
  • You can still search across all sections using SQLite FTS5
  • Only the top N sections become queryable columns

Column limit validation:

# Check if your documents will fit before importing
sqldown load ~/docs -d test.db --verbose

# Output shows breakdown:
# 📊 Column breakdown:
#   - Base columns: 7
#   - Frontmatter columns: 89
#   - Section columns: 20
#   - Total: 116 (limit: 1800)

When approaching limit (>90%):

  • Shows warning but continues import
  • Consider: reducing --top-sections or increasing --max-columns

When exceeding limit:

  • Stops before import to prevent database corruption
  • Shows breakdown and suggestions

Multiple Collections

One database, multiple tables:

# Import different document types
sqldown load ~/tasks -d cache.db -t tasks
sqldown load ~/notes -d cache.db -t notes
sqldown load ~/.claude/skills -d cache.db -t skills

# Query them
sqlite3 cache.db "SELECT * FROM tasks WHERE status='active'"
sqlite3 cache.db "SELECT * FROM notes WHERE tags LIKE '%sqlite%'"

# Join across tables
sqlite3 cache.db "
  SELECT t.title as task, n.title as note
  FROM tasks t
  JOIN notes n ON n.tags LIKE '%' || t.project || '%'
  WHERE t.status='active'
"

Refresh Strategy

One-time import:

Import is idempotent - just run it again:

# Add this to cron or a git hook
sqldown load ~/tasks -d cache.db -t tasks

Watch mode (auto-refresh):

Use the --watch / -w flag to automatically update the cache when files change:

# Watch mode: import once, then auto-update on file changes
sqldown load ~/tasks -d cache.db -t tasks --watch

# Output:
# ✅ Imported 87 documents into cache.db:tasks
# 📋 Schema has 181 columns
#
# 👀 Watching /Users/admin/tasks for changes... (Ctrl-C to stop)
# [2025-01-15 10:23:45] Updated: AG-22_feat_add-configuration/README.md
# [2025-01-15 10:24:12] Added: AG-31_feat_new-feature/README.md

Watch mode is ideal for development workflows where you want the cache to stay in sync with your files.

Common Queries

# Active tasks
sqlite3 cache.db "SELECT title FROM tasks WHERE status='active'"

# Recent updates
sqlite3 cache.db "SELECT title, updated FROM tasks ORDER BY updated DESC LIMIT 10"

# By project
sqlite3 cache.db "SELECT project, COUNT(*) FROM tasks GROUP BY project"

# Search content
sqlite3 cache.db "SELECT title FROM tasks WHERE body LIKE '%cache%'"

# High priority incomplete
sqlite3 cache.db "SELECT title FROM tasks WHERE priority='high' AND status != 'completed'"

Philosophy

SQLDown follows the Unix philosophy: do one thing well.

  • Load: SQLDown handles the complex markdown → SQLite conversion
  • Query: sqlite3 provides perfect SQL interface (no wrapper needed)
  • Dump: SQLDown reconstructs markdown from database rows

Why not wrap sqlite3? Because it's already perfect for queries:

  • Full SQL power without wrapper limitations
  • Standard tool with excellent documentation
  • Multiple output formats (CSV, JSON, column, etc.)
  • Interactive shell with history and completion
  • Zero overhead for read operations

Requirements

  • Python 3.8+ (includes sqlite3 module)
  • sqlite3 CLI (built-in on macOS/Linux)

Python Dependencies (installed automatically):

  • click >= 8.0 - CLI framework
  • sqlite-utils >= 3.30 - SQLite schema management
  • PyYAML >= 6.0 - YAML frontmatter parsing
  • pathspec >= 0.11 - Gitignore pattern support
  • watchdog >= 3.0 - File system monitoring

Integration

SQLDown is designed for both human and AI use:

For Developers:

  • Simple CLI with sensible defaults
  • Watch mode for development workflows
  • Smart change detection saves time
  • Direct sqlite3 access for queries

For AI Assistants:

  • Efficient token usage via SQL queries
  • Progressive disclosure pattern
  • Query metadata first, read files only when needed
  • Structured data extraction from markdown

Contributing

Contributions welcome! Please check out the issues on GitHub.

License

MIT

Changelog

v0.1.0 (2025-01-14)

Initial PyPI Release 🎉

  • Full bidirectional markdown ↔ SQLite conversion
  • Dynamic schema generation from YAML frontmatter
  • Intelligent column limit protection (SQLite 2000 column limit)
  • Top-N section extraction for diverse document collections
  • Watch mode for automatic file sync
  • Smart change detection on export
  • Comprehensive CLI with load, dump, and info commands
  • Python 3.8+ support

Development History

See SPECIFICATION.md for the complete design. See REVIEW.md for architectural decisions and trade-offs.

About

SQLite is the new markdown. Bidirectional markdown-sqlite converter - query your docs with SQL, export results as markdown.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors