Description
Current static analysis in capa relies heavily on regular expressions for script files, an approach that is fragile and easily bypassed by basic obfuscation. I propose extending capa to natively support scripting languages, specifically Bash, Python, and PowerShell.
By integrating Tree-sitter, we can leverage Abstract Syntax Tree (AST) analysis. This allows capa to identify malicious capabilities within scripts with the same structural precision it currently provides for compiled binaries (PE, ELF, .NET).
Proposed Change
- Integrate the Tree-sitter library for robust script parsing.
- Implement AST-based feature extraction for Python, Bash, and PowerShell.
- Enable rule matching against structural code elements rather than simple strings.
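To illustrate why AST features hold up better than regex, here is a minimal sketch. It uses Python's standard-library `ast` module as a stand-in for Tree-sitter (Tree-sitter would provide the same kind of tree for Bash and PowerShell too). The names `extract_calls`, `REGEX`, and the `RULE` dictionary are hypothetical and are not part of capa's API. The sketch resolves import aliases so that a renamed call such as `P(...)` is still reported as `subprocess.Popen`, which a literal regex misses. For simplicity it ignores scoping and reassignments like `f = P`.

```python
import ast
import re

# Sample script using trivial aliasing, a common evasion of string-based rules.
SOURCE = '''
from subprocess import Popen as P
import base64 as b
P(["sh", "-c", b.b64decode("aWQ=").decode()])
'''

# Naive regex rule: only matches the literal qualified call.
REGEX = re.compile(r"subprocess\.Popen\s*\(")


def _qualname(node):
    """Return a dotted name for Name/Attribute chains, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _qualname(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def extract_calls(source: str) -> set[str]:
    """Hypothetical feature extractor: fully qualified names of called functions."""
    tree = ast.parse(source)
    # First pass: map local alias -> fully qualified imported name (scope-insensitive).
    aliases: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    # Second pass: collect call targets, rewriting the leading alias if known.
    calls: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = _qualname(node.func)
            if name:
                head, _, rest = name.partition(".")
                resolved = aliases.get(head, head)
                calls.add(f"{resolved}.{rest}" if rest else resolved)
    return calls


# Hypothetical capa-style rule: every listed feature must be present ("and" block).
RULE = {"name": "execute decoded shell command",
        "and": {"subprocess.Popen", "base64.b64decode"}}


def matches(rule: dict, features: set[str]) -> bool:
    return rule["and"] <= features


if __name__ == "__main__":
    print("regex hit:", REGEX.search(SOURCE) is not None)
    feats = extract_calls(SOURCE)
    print("features:", sorted(feats))
    print("rule match:", matches(RULE, feats))
```

Running it shows the regex finding nothing while the AST-derived features satisfy the rule. A real implementation would also need data-flow handling (e.g. `getattr` lookups or string concatenation), which is where structural analysis has the most room to improve on pattern matching.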
Additional Context
This approach will bridge the gap between binary and script analysis, making capa a more comprehensive tool for modern malware research where script-based loaders and droppers are prevalent.