Feature: Native script support (Bash/Python/PowerShell) via Tree-sitter AST #2988

@sashwathsubra

Description

capa's current static analysis of script files relies heavily on regular expressions, which are fragile and easily bypassed by basic obfuscation. I propose extending capa to natively support scripting languages, specifically Bash, Python, and PowerShell.
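
To illustrate the fragility, here is a minimal sketch. The pattern and both sample scripts are hypothetical and are not taken from capa or its rule set; they only show how a string-level match can miss a trivially rewritten call.

```python
import re

# Hypothetical string-level detector for "script runs a shell command".
# This pattern is an illustration only, not an actual capa rule.
PATTERN = re.compile(r"os\.system\(")

# A straightforward script that the pattern catches.
plain = 'import os\nos.system("id")\n'

# Same behavior, lightly rewritten: aliased import plus extra whitespace.
# This is still valid Python.
obfuscated = 'import os as o\no . system ("id")\n'


def regex_detects(source: str) -> bool:
    """Return True if the string-level pattern appears in the source."""
    return PATTERN.search(source) is not None


print(regex_detects(plain))       # True
print(regex_detects(obfuscated))  # False
```

Both scripts call the same function, but only the first is matched, because the pattern depends on the exact spelling of the call.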

By integrating Tree-sitter, we can leverage Abstract Syntax Tree (AST) analysis. This allows capa to identify malicious capabilities within scripts with the same structural precision it currently provides for compiled binaries (PE, ELF, .NET).

Proposed Change

  • Integrate Tree-sitter library for robust script parsing.
  • Implement AST-based feature extraction for Python, Bash, and PowerShell.
  • Enable rule matching against structural code elements rather than simple strings.
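
As a rough sketch of what AST-based feature extraction could look like: the example below uses Python's standard-library `ast` module as a stand-in for Tree-sitter, so it only handles Python sources. The function names (`extract_call_features`, `dotted_name`) and the feature format are assumptions made for illustration, not capa's or Tree-sitter's API.

```python
import ast
from typing import Dict, Optional, Set


def dotted_name(node: ast.AST, aliases: Dict[str, str]) -> Optional[str]:
    """Resolve a call target such as `o.system` to `os.system` (hypothetical helper)."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        return None  # e.g. a call on a subscript or a call result; skipped here
    # Map a local alias back to the imported name when one is known.
    parts.append(aliases.get(node.id, node.id))
    return ".".join(reversed(parts))


def extract_call_features(source: str) -> Set[str]:
    """Collect fully qualified names of called functions from Python source."""
    tree = ast.parse(source)

    # First pass: record import aliases, e.g. `import os as o` -> {"o": "os"}.
    aliases: Dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    # Second pass: emit one feature per resolvable call expression.
    features: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func, aliases)
            if name is not None:
                features.add(name)
    return features


# Hypothetical "rule": the script executes a command through a known API.
COMMAND_EXECUTION_APIS = {"os.system", "subprocess.run"}

sample = 'import os as o\nfrom subprocess import run as r\no . system ("id")\nr(["id"])\n'
features = extract_call_features(sample)
print(sorted(features & COMMAND_EXECUTION_APIS))  # ['os.system', 'subprocess.run']
```

Because the match is made against resolved call nodes rather than raw text, the aliasing and whitespace in `sample` do not affect the result. A Tree-sitter based extractor would presumably follow the same two-pass idea with per-language grammars for Bash and PowerShell.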

Additional Context

This approach will bridge the gap between binary and script analysis, making capa a more comprehensive tool for modern malware research where script-based loaders and droppers are prevalent.

Metadata

Labels: enhancement (New feature or request), tree-sitter (related to tree-sitter feature extraction)
