mrdulasolutions/docdeploy

DocDeploy

Enterprise document intelligence platform. Upload docs, PDFs, manuals, and notes — DocDeploy converts them to indexed Markdown and serves them via per-tenant MCP servers so AI agents can query your knowledge base.

How It Works

Upload → Convert to Markdown → Inject Section Markers → Index → Serve via MCP
  1. Upload any document (PDF, DOCX, PPTX, XLSX, HTML, TXT, CSV, JSON, XML)
  2. Convert to clean, structured Markdown with images preserved
  3. Index sections with navigational markers and full-text search
  4. Connect AI agents via MCP — they can search, browse, and read your docs
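The "Inject Section Markers" step can be pictured with a minimal sketch. The marker syntax (`<!-- section:s1 -->`), the `SectionEntry` shape, and the function name `injectSectionMarkers` are assumptions made for illustration; DocDeploy's actual marker format is not documented here and may differ.

```typescript
// Hypothetical sketch: place a marker line before each Markdown heading and
// collect a small index of sections. Not DocDeploy's actual implementation.
interface SectionEntry {
  id: string;    // assumed id scheme: s1, s2, ...
  title: string; // heading text
  level: number; // 1 for "#", 2 for "##", and so on
}

const FENCE = "`".repeat(3); // code-fence delimiter, built indirectly

function injectSectionMarkers(markdown: string): { marked: string; sections: SectionEntry[] } {
  const sections: SectionEntry[] = [];
  const out: string[] = [];
  let inFence = false;

  for (const line of markdown.split("\n")) {
    // Toggle on fence lines so "#" comments inside code are not treated as headings.
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence;

    const match = inFence ? null : /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    if (match) {
      const id = `s${sections.length + 1}`;
      sections.push({ id, title: match[2], level: match[1].length });
      out.push(`<!-- section:${id} -->`);
    }
    out.push(line);
  }
  return { marked: out.join("\n"), sections };
}
```

The returned `sections` array is the kind of data a manifest or a section lookup could be built from, while `marked` is the Markdown an agent would read.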

Architecture

┌─────────────────────────────────────────┐
│  apps/app        Next.js dashboard      │  ← Vercel
│  apps/web        Marketing site         │  ← Vercel
├─────────────────────────────────────────┤
│  packages/lib         Converter pipeline│
│  packages/db          Drizzle + Neon    │
│  packages/workers     Background jobs   │  ← Render
│  packages/mcp-server  MCP per tenant    │  ← Render
└─────────────────────────────────────────┘

Tech Stack

Layer     Technology
Frontend  Next.js 16, Tailwind CSS, Clerk Auth
Database  PostgreSQL (Neon), Drizzle ORM
Storage   Cloudflare R2
Queue     pg-boss
MCP       @modelcontextprotocol/sdk (Streamable HTTP)
Hosting   Vercel (frontend), Render (workers + MCP)

MCP Tools

Each tenant's MCP server exposes 5 tools:

Tool              Description
get_manifest      Full document index (start here)
search_documents  Full-text search across all docs and sections
get_document      Load a full document with section markers
get_section       Load a specific section by marker
list_documents    List all available documents
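As a rough illustration of what an agent sends when it invokes one of these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` message that MCP uses for tool invocation. The tool names come from the table above; the `query` argument and the `buildToolCall` helper are assumptions, since the tools' input schemas are not listed here. In practice an MCP client library handles this framing, along with the initialization handshake.

```typescript
// Tool names taken from the table above.
type ToolName =
  | "get_manifest"
  | "search_documents"
  | "get_document"
  | "get_section"
  | "list_documents";

// Shape of an MCP tool invocation: a JSON-RPC 2.0 request with method "tools/call".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ToolName; arguments: Record<string, unknown> };
}

// Hypothetical helper, for illustration only.
function buildToolCall(
  id: number,
  name: ToolName,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Start with the manifest, as the table suggests, then search.
const manifestCall = buildToolCall(1, "get_manifest");
// "query" is an assumed argument name; check the tool's input schema.
const searchCall = buildToolCall(2, "search_documents", { query: "torque limits" });
```

A client can discover the real argument names at runtime by listing the server's tools, which returns each tool's input schema.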

Document Processing Pipeline

  • PDF — pdf-parse v2 with structure inference (chapters, numbered sections, all-caps headings)
  • DOCX — mammoth with image extraction to R2
  • PPTX — adm-zip XML parsing with slide image extraction
  • XLSX — SheetJS with multi-sheet markdown table output
  • HTML — Turndown with table preservation and nav/footer stripping
  • CSV/TSV — SheetJS for robust quoted field handling
  • TXT — Structure inference for unstructured text
  • JSON/XML — Formatted code block wrapping
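The list above amounts to a dispatch on file type. A minimal sketch of that idea follows, assuming the converter is chosen by file extension; the labels and the `pickConverter` name are hypothetical and only mirror the libraries named in the list, not the actual code in packages/lib.

```typescript
// Hypothetical mapping from file extension to the converter named in the list above.
const CONVERTERS = new Map<string, string>([
  ["pdf", "pdf-parse"],
  ["docx", "mammoth"],
  ["pptx", "adm-zip"],
  ["xlsx", "sheetjs"],
  ["html", "turndown"],
  ["csv", "sheetjs"],
  ["tsv", "sheetjs"],
  ["txt", "structure-inference"],
  ["json", "code-block"],
  ["xml", "code-block"],
]);

// Returns the converter label for a filename, or undefined if the type is unsupported.
function pickConverter(filename: string): string | undefined {
  const dot = filename.lastIndexOf(".");
  if (dot < 0 || dot === filename.length - 1) return undefined; // no extension
  return CONVERTERS.get(filename.slice(dot + 1).toLowerCase());
}
```

Returning `undefined` for unknown types lets the caller reject an upload early instead of failing inside a background job.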

Monorepo Structure

docdeploy/
├── apps/
│   ├── app/              # Dashboard (Next.js)
│   └── web/              # Marketing site (Next.js)
├── packages/
│   ├── db/               # Schema + Drizzle config
│   ├── lib/              # Converter, markers, R2, token counter
│   ├── mcp-server/       # MCP server (stdio + HTTP)
│   └── workers/          # Background document processor
├── render.yaml           # Render blueprint
├── turbo.json            # Turborepo config
└── package.json          # Workspace root

Development

# Install dependencies
npm install

# Run the dashboard locally
npm run dev:app

# Run the marketing site
npm run dev:web

Environment Variables

See .env.example for the full list. Required:

  • DATABASE_URL — Neon PostgreSQL connection string
  • NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY / CLERK_SECRET_KEY — Clerk auth
  • R2_ACCOUNT_ID / R2_ACCESS_KEY_ID / R2_SECRET_ACCESS_KEY — Cloudflare R2
  • R2_BUCKET_NAME / R2_PUBLIC_URL — R2 bucket config
  • MCP_API_KEY — Per-tenant MCP server auth (Render only)
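A startup check for these variables might look like the sketch below. The variable names are taken from the list above; the `missingEnv` helper and the split into "always required" and "Render only" are assumptions for illustration, not code from the repository.

```typescript
// Variables listed as required above.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "R2_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
  "R2_PUBLIC_URL",
] as const;

// Needed only where the MCP server runs (Render), per the list above.
const RENDER_ONLY_ENV = ["MCP_API_KEY"] as const;

// Hypothetical helper: returns the names of variables that are unset or empty.
function missingEnv(
  env: Record<string, string | undefined>,
  onRender = false,
): string[] {
  const keys: string[] = [...REQUIRED_ENV, ...(onRender ? RENDER_ONLY_ENV : [])];
  return keys.filter((key) => !env[key]);
}
```

In a Node.js process this would be called as `missingEnv(process.env)`, with the caller deciding whether to log the result or exit.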

License

Proprietary — Copyright (c) 2026 MR Dula Enterprise, LLC. All rights reserved.

Contact: [email protected]
