PrayerQX

Building OCR, document parsing, and PDF-to-Markdown workflows for search, RAG, and knowledge systems.

Python · OCR · Document Parsing · PDF to Markdown · RAG · Computer Vision

About

I work on practical document AI pipelines:

  • OCR and document parsing for scanned and born-digital PDFs
  • PDF to Markdown conversion with layout, table, formula, and reading-order recovery
  • benchmark tooling for comparing document parsing models under unified scoring
  • computer vision projects that aim for usable outputs, not just demos
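
As a rough illustration of what "unified scoring" can mean in practice, the sketch below scores every model's output against the same reference text with one metric, a normalized edit distance. This is a generic example, not the scoring rule of any specific benchmark or of the projects listed here; the function names and the sample data are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer length; 0.0 means identical."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 0.0
    return levenshtein(pred, ref) / longest


def score_models(outputs: dict[str, str], reference: str) -> dict[str, float]:
    """Apply the same metric to every model's output (lower is better)."""
    return {name: normalized_edit_distance(text, reference)
            for name, text in outputs.items()}


if __name__ == "__main__":
    # Hypothetical outputs from two parsers for the same page.
    reference = "# Title\n\nBody text."
    outputs = {"model_a": "# Title\n\nBody text.", "model_b": "Title\nBody text"}
    for name, score in score_models(outputs, reference).items():
        print(f"{name}: {score:.3f}")
```

Scoring every model through a single function like `score_models` keeps comparisons fair: differences in the numbers come from the parsers, not from per-model evaluation code.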

Current Focus

  • Structure-preserving PDF parsing for downstream LLM workflows
  • Benchmarking document parsing models on OmniDocBench and MDPBench
  • Building local, reproducible pipelines for Markdown-first document recovery

Featured Projects

Benchmark and deployment toolkit for document parsing models, with unified outputs, official-rule evaluation, and reproducible lite/full benchmark workflows.

Tech: Python · PaddleOCR · Benchmarking · Markdown · OmniDocBench · MDPBench

A PP-StructureV3-based PDF to Markdown project focused on preserving titles, tables, formulas, images, and reading order from complex documents.

Tech: Python · PPStructureV3 · OCR · PDF · Markdown

A practical computer vision project for garbage recognition and classification based on YOLOv5.

Tech: Python · YOLOv5 · OpenCV · Computer Vision

What I Care About

  • outputs that can be used directly in RAG and knowledge workflows
  • reproducible local deployment, not just benchmark screenshots
  • structured results instead of plain text dumps
  • engineering tradeoffs: quality, speed, deployment cost, and controllability
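
To make "structured results instead of plain text dumps" concrete, here is a minimal sketch that turns typed layout blocks into Markdown in reading order. The `Block` fields and the `blocks_to_markdown` helper are hypothetical and chosen for illustration only; they are not the output schema of PP-StructureV3 or of any project listed here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One parsed layout element (hypothetical schema)."""
    kind: str      # "title", "text", "table", or "formula"
    content: str   # text, Markdown table source, or LaTeX
    order: int     # reading-order index assigned by the parser
    level: int = 1 # heading depth, only used for titles


def blocks_to_markdown(blocks: list[Block]) -> str:
    """Render blocks as Markdown, sorted by reading order."""
    parts = []
    for block in sorted(blocks, key=lambda b: b.order):
        if block.kind == "title":
            parts.append("#" * block.level + " " + block.content)
        elif block.kind == "formula":
            parts.append("$$\n" + block.content + "\n$$")
        else:
            # Text and pre-rendered Markdown tables pass through unchanged.
            parts.append(block.content)
    return "\n\n".join(parts)


if __name__ == "__main__":
    page = [
        Block("text", "Body paragraph.", order=2),
        Block("title", "Introduction", order=1, level=2),
        Block("formula", "E = mc^2", order=3),
    ]
    print(blocks_to_markdown(page))
```

Keeping the block type and reading order until the final rendering step is what lets headings, tables, and formulas survive into the Markdown that a RAG pipeline later chunks and indexes.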

Contact

If you are working on OCR, document parsing, PDF processing, or Markdown-oriented document recovery, feel free to connect on GitHub.

Popular repositories

  1. pyfi-paddleocrvl-eval (Public)

    Python 2 1

  2. yolov5-garbage-classification (Public)

    Recognizes garbage and classifies it.

    Python

  3. PPStructureV3-PDF-to-Markdown (Public)

    A PPStructureV3-based PDF-to-Markdown project that addresses lost heading hierarchy, broken table structure, unstable formula recognition, missing image and chart information, and scrambled reading order in complex documents. It produces structured Markdown better suited to retrieval, RAG, knowledge-base construction, and downstream processing.

    Python

  4. PrayerQX (Public)

  5. claude-code-open (Public)

    Forked from Aidenwu0209/claude-code-open

    Unified Claude Code workspace with an optional local launcher

    TypeScript

  6. cc1.0-skill1.0 (Public)

    Python