Skip to content

feat: Support GDB index#1990

Merged
lapla-cogito merged 32 commits into
wild-linker:mainfrom
lapla-cogito:gdb_idx
Jun 12, 2026
Merged

feat: Support GDB index#1990
lapla-cogito merged 32 commits into
wild-linker:mainfrom
lapla-cogito:gdb_idx

Conversation

@lapla-cogito

Copy link
Copy Markdown
Member

This patch adds support for --gdb-index and --no-gdb-index. Although we support version 9 here, the test environment may produce a version 7 GDB index when using lld. This patch also adds a test that checks whether specific symbols are included in the symbol table of the .gdb_index section. Because of this requirement, the integration test includes logic to support versions earlier than version 9 as well.

Closes #811

Comment thread libwild/src/gdb_index.rs Outdated
@lapla-cogito

Copy link
Copy Markdown
Member Author

I did git commit --amend --allow-empty because the release workflow was triggered by the previous commit (which I canceled)...
https://github.com/wild-linker/wild/actions/runs/26509302406

I have no idea what happened... 🤷

@marxin marxin left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the numerous findings I created - can you explain what was the genesis of the PR and how much was a LLM involved? I know we had a PR for the very same functionality that was closed right after it was opened. Is it an incarnation of the changes, or was it created completely independently?

Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
@lapla-cogito

Copy link
Copy Markdown
Member Author

@marxin (I haven't seen the review details yet)

we had a PR for the very same functionality that was closed right after it was opened

I think you mean #1605. I used that as a reference when implementing this, since I had also reviewed it at the time and therefore already had some familiarity with the surrounding context and implementation details. However, I did not reuse it directly for licensing concerns (and in any case, it does not support version 9 as-is). I mainly used LLMs to help refine the implementation through review and iterative revisions.

@lapla-cogito lapla-cogito marked this pull request as draft May 27, 2026 21:30
@marxin

marxin commented May 28, 2026

Copy link
Copy Markdown
Collaborator

I think you mean #1605. I used that as a reference when implementing this, since I had also reviewed it at the time and therefore already had some familiarity with the surrounding context and implementation details.

Yes - fine, that's a good approach you chose ;)

However, I did not reuse it directly for licensing concerns (and in any case, it does not support version 9 as-is). I mainly used LLMs to help refine the implementation through review and iterative revisions.

All good! Let's iterate on the PR, given you knowledge, I guess you can address the findings pretty fast. Happy to review and help you with the feature.

@lapla-cogito lapla-cogito force-pushed the gdb_idx branch 2 times, most recently from a848771 to 9de1cc6 Compare May 31, 2026 02:38
Comment thread libwild/src/gdb_index.rs Outdated
@lapla-cogito lapla-cogito marked this pull request as ready for review June 1, 2026 05:07
@lapla-cogito lapla-cogito requested a review from marxin June 1, 2026 05:07

@davidlattimore davidlattimore left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran a quick benchmark to see what effect adding the flag --gdb-index has on the performance of the different linkers. With lld and mold, the performance difference between --gdb-index and --no-gdb-index was negligible. With wild however, it slows down linking by about 2x. Benchmark was with 32 threads linking zed.

I expect that the main problem is that there's lots of work being done single-threaded. e.g. iterating through all the objects. Also, looking up relevant sections by name is likely to be slow - as opposed to our usual approach where SectionRules determines what we should do with each input section.

Comment thread libwild/src/gdb_index.rs
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
sorted_symbols: sorted_names,
ht_slots,
..
} = scan_objects_for_gdb_index(objects)?;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that this function is called twice. Is it expensive? If so, would it make sense to store the result from the first call?

@lapla-cogito lapla-cogito Jun 11, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this call is expensive (and naturally contributes to performance degradation). Therefore, I also believe it should be cached in the future, but achieving this would require layout refactoring (as we discussed during our previous meeting, parallelizing object traversal (mentioned in above) also requires such changes). I'm unsure whether we should include these in this PR.

edit: After actually trying it out, I realized the change weren't as extensive as I thought, so I added it: 0c0b335

@marxin

marxin commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

After spending a little bit of time researching the area, I noticed there's a systematic DWARF extension Lookup by Name aka .debug_names (DWARF 5 6.1.1): https://dwarfstd.org/doc/DWARF5.pdf. It's pretty similar to .gdbindex, but supported by both lldb and gdb. Clang can emit the format with -g -gpubnames (can be dumped with llvm-dwarfdump --debug-names) and lld has an option that builds one global cache (--debug-names). Moreover, what's interesting, gdb itself can generate (and consume) the index format: gdb-add-index --dwarf-5. If I'm not mistaken, it's only GCC that lacks the support.

With that said, shouldn't the long-term, future-focused approach be to build out the Debug Names format rather than introducing an ad hoc, GDB-specific index file format?

Comment thread libwild/src/gdb_index.rs
break;
}
let len = u64_from_slice(&data[pos + 4..]);
let dio = u64_from_slice(&data[pos + 14..]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we don't parse the cu size field here, and use it instead of the size from parse_cu_boundaries? And if we do use the cu size from here, is there any reason to do parse_cu_boundaries at all? I think the cu offset can be adjusted with a simple section base addition.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pubnames/pubtypes sets only cover CUs that have public names. CUs with no exported symbols won't have a set. The gdb index CU list have to include all CUs (the address table indexes into it), so we need parse_cu_boundaries() to discover the complete set and assign correct global indices.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that .debug_gnu_pubnames does include local symbols, so would the problem instead be that some files were not compiled with -ggnu-pubnames?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would the problem instead be that some files were not compiled with -ggnu-pubnames?

Ah, right. Since we need to store all CUs in the CU list, we can't rely on .debug_gnu_pubnames to enumerate them. parse_cu_boundaries() walks .debug_info directly, which is always present with -g, so it handles this correctly.

@marxin marxin removed their request for review June 6, 2026 14:47
@lapla-cogito lapla-cogito marked this pull request as draft June 9, 2026 05:39
@lapla-cogito

Copy link
Copy Markdown
Member Author

After spending a little bit of time researching the area, I noticed there's a systematic DWARF extension Lookup by Name aka .debug_names (DWARF 5 6.1.1): https://dwarfstd.org/doc/DWARF5.pdf. It's pretty similar to .gdbindex, but supported by both lldb and gdb. Clang can emit the format with -g -gpubnames (can be dumped with llvm-dwarfdump --debug-names) and lld has an option that builds one global cache (--debug-names). Moreover, what's interesting, gdb itself can generate (and consume) the index format: gdb-add-index --dwarf-5. If I'm not mistaken, it's only GCC that lacks the support.

With that said, shouldn't the long-term, future-focused approach be to build out the Debug Names format rather than introducing an ad hoc, GDB-specific index file format?

Interesting. While we could indeed consider adding .debug_names support, I don't think it would be mutually exclusive with .gdb_index support. Moreover, considering implementation complexity and usage frequency, I don't think this would serve as a blocking factor for supporting GDB index.

@lapla-cogito lapla-cogito marked this pull request as ready for review June 11, 2026 03:09
Comment thread libwild/src/gdb_index.rs Outdated
entries.push(GdbIndexAddressEntry {
low_address: addr,
high_address: addr + section.size,
cu_index: base_cu,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a file contains multiple CUs, does this assign all addresses to the first CU?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Is this something that can happen in scenarios like --relocatable? Anyay, I was able to reproduce it with a test case and have fixed it. Thanks.

Comment thread libwild/src/gdb_index.rs Outdated
return Ok(());
}

timing_phase!("Write GDB index");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance has improved. This time I benchmarked linking of zed-release and zed-debug. zed-release slowed down 12% when --gdb-index was passed, while zed-debug slowed down by 52%. The slowest phase is definitely "Write GDB index". It looks like write_gdb_index does a few different things. write_hash_table is already a separate function. Would any of the other things in here make sense to separate out to separate functions? Could you also add timing phases to any bits that seem like they might be significant?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I was already looking into optimizing the write phase, so while I was at it, I added a few more profiling points as well. fb3ccdf

Comment thread libwild/src/args/elf.rs
self.should_output_partial_object
}

fn should_write_gdb_index(&self) -> bool {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like lld and mold, if --strip-debug is supplied (or presumably --strip, but I didn't try that) won't write the gdb-index. Could we do the same (and have a test for it)? As a separate PR is fine, but also fine in this PR.

@lapla-cogito lapla-cogito Jun 11, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing locally, neither lld nor mold emits a GDB index when either --strip-all or --strip-debug is passed, so I changed the implementation to match that behavior. aa4f1c3

@marxin marxin left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reworked implementation is much better 👍 one thing I would consider: given the close connection to gdb, I would consider the integration of gdb into our testing harness.
There might be some debugging output that will show if the index was read and then we can inspect it? Just something for a consideration.

Comment thread libwild/src/gdb_index.rs
Comment thread libwild/src/gdb_index.rs
Comment thread libwild/src/gdb_index.rs Outdated
Comment thread libwild/src/gdb_index.rs Outdated
@lapla-cogito

Copy link
Copy Markdown
Member Author

Thank you for all the reviews. I'm going to merge this. For test integration, I'm also in favor of extending the tests to use gdb, but at least at this stage, I believe the current integration tests and mold tests (which already use gdb internally) are sufficient.

@lapla-cogito lapla-cogito merged commit 222ad1a into wild-linker:main Jun 12, 2026
25 checks passed
@lapla-cogito lapla-cogito deleted the gdb_idx branch June 12, 2026 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for --gdb-index

4 participants