
feat(scraper): optional flag to respect .gitignore when scraping local directories #438

@pjgoodall

Description


When scraping a local directory tree with file://, there is no way to honour .gitignore. The only filters are includeHidden and user-supplied --exclude-pattern globs.

For the common case of indexing a tree of real git repositories, this means every non-hidden ignored path is crawled unless the user re-enumerates it as an exclude pattern. That includes node_modules/, dist/, other build output, and any non-hidden secret listed in .gitignore. This is both a noise problem and a safety problem: a file that was deliberately gitignored is silently indexed and becomes searchable.

Request: an opt-in flag or config option (e.g. --respect-gitignore, default off for backward compatibility) that skips gitignored paths during local directory traversal. Supporting the per-directory .gitignore cascade, where each directory's .gitignore applies to its own subtree, would cover the large majority of cases.

It should be opt-in because pure documentation bundles often have no .gitignore to honour, so defaulting to off preserves current behaviour.
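The per-directory cascade described above can be sketched as follows. This is written in Python purely for illustration because the scraper's own traversal code is not shown in this issue. The names `walk_respecting_gitignore`, `parse_gitignore`, `is_ignored` and `glob_to_regex` are hypothetical. The sketch handles only a subset of gitignore syntax: `#` comments, `!` negation, a trailing `/` for directory-only patterns, a leading or middle `/` anchoring the pattern to its .gitignore's directory, and the `*`, `?` and `**` wildcards. Character classes and backslash escapes are treated literally. It also reads only .gitignore files at or below the scrape root, not `.git/info/exclude` or a global `core.excludesFile`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Rule:
    base: Path           # directory whose .gitignore defined this rule
    regex: re.Pattern    # matched against the path relative to `base`
    negate: bool         # "!pattern" re-includes a previously ignored path
    dir_only: bool       # "pattern/" matches directories only


def glob_to_regex(glob: str) -> str:
    """Translate the '**', '*' and '?' wildcards; every other character is literal."""
    out, i = [], 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")          # anything, including '/'
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")       # anything within one path segment
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")        # exactly one character, not '/'
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return "".join(out)


def parse_gitignore(directory: Path) -> list[Rule]:
    """Read directory/.gitignore (if any) into rules scoped to that directory."""
    path = directory / ".gitignore"
    if not path.is_file():
        return []
    rules = []
    for raw in path.read_text(encoding="utf-8", errors="replace").splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        if negate:
            line = line[1:]
        dir_only = line.endswith("/")
        line = line.rstrip("/")
        if not line:
            continue
        # A slash at the start or in the middle anchors the pattern to `directory`;
        # otherwise the pattern may match at any depth below it.
        anchored = "/" in line
        body = glob_to_regex(line.lstrip("/"))
        pattern = body if anchored else "(?:.*/)?" + body
        rules.append(Rule(directory, re.compile(pattern), negate, dir_only))
    return rules


def is_ignored(path: Path, is_dir: bool, rules: list[Rule]) -> bool:
    """Apply rules in order; the last matching rule decides, as in git."""
    ignored = False
    for rule in rules:
        if rule.dir_only and not is_dir:
            continue
        rel = path.relative_to(rule.base).as_posix()
        if rule.regex.fullmatch(rel):
            ignored = not rule.negate
    return ignored


def walk_respecting_gitignore(root: Path) -> Iterator[Path]:
    """Yield files under `root`, pruning paths ignored by the .gitignore cascade."""
    def visit(directory: Path, inherited: list[Rule]) -> Iterator[Path]:
        # Deeper .gitignore rules come after their ancestors' rules, so they win.
        rules = inherited + parse_gitignore(directory)
        for entry in sorted(directory.iterdir()):
            if entry.name == ".git":      # git never tracks its own metadata dir
                continue
            is_dir = entry.is_dir() and not entry.is_symlink()  # don't follow dir symlinks
            if is_ignored(entry, is_dir, rules):
                continue                  # pruned: nothing below an ignored dir is visited
            if is_dir:
                yield from visit(entry, rules)
            elif entry.is_file():
                yield entry

    yield from visit(root, [])
```

Pruning ignored directories during the walk, rather than filtering the full file list afterwards, means trees such as node_modules/ are never traversed at all. It also reproduces git's rule that a file cannot be re-included once a parent directory is excluded. A real implementation would more likely delegate pattern matching to an existing, fully gitignore-compatible library (for example the `ignore` package on npm or `pathspec` in Python) than hand-roll it.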

Metadata


Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
