feat: Support GDB index by lapla-cogito · Pull Request #1990 · wild-linker/wild

lapla-cogito · 2026-05-27T11:20:16Z

This patch adds support for --gdb-index and --no-gdb-index. Although we support version 9 here, the test environment may produce a version 7 GDB index when using lld. This patch also adds a test that checks whether specific symbols are included in the symbol table of the .gdb_index section. Because of this requirement, the integration test includes logic to support versions earlier than version 9 as well.

Closes #811

lapla-cogito · 2026-05-27T12:02:43Z

I did git commit --amend --allow-empty because the release workflow was triggered by the previous commit (which I canceled)...
https://github.com/wild-linker/wild/actions/runs/26509302406

I have no idea what happened... 🤷

marxin

Given the numerous findings I created - can you explain what was the genesis of the PR and how much was a LLM involved? I know we had a PR for the very same functionality that was closed right after it was opened. Is it an incarnation of the changes, or was it created completely independently?

lapla-cogito · 2026-05-27T21:09:05Z

@marxin (I haven't seen the review details yet)

we had a PR for the very same functionality that was closed right after it was opened

I think you mean #1605. I used that as a reference when implementing this, since I had also reviewed it at the time and therefore already had some familiarity with the surrounding context and implementation details. However, I did not reuse it directly for licensing concerns (and in any case, it does not support version 9 as-is). I mainly used LLMs to help refine the implementation through review and iterative revisions.

marxin · 2026-05-28T03:28:46Z

I think you mean #1605. I used that as a reference when implementing this, since I had also reviewed it at the time and therefore already had some familiarity with the surrounding context and implementation details.

Yes - fine, that's a good approach you chose ;)

However, I did not reuse it directly for licensing concerns (and in any case, it does not support version 9 as-is). I mainly used LLMs to help refine the implementation through review and iterative revisions.

All good! Let's iterate on the PR, given you knowledge, I guess you can address the findings pretty fast. Happy to review and help you with the feature.

davidlattimore

I ran a quick benchmark to see what effect adding the flag --gdb-index has on the performance of the different linkers. With lld and mold, the performance difference between --gdb-index and --no-gdb-index was negligible. With wild however, it slows down linking by about 2x. Benchmark was with 32 threads linking zed.

I expect that the main problem is that there's lots of work being done single-threaded. e.g. iterating through all the objects. Also, looking up relevant sections by name is likely to be slow - as opposed to our usual approach where SectionRules determines what we should do with each input section.

davidlattimore · 2026-06-01T10:11:16Z

+        sorted_symbols: sorted_names,
+        ht_slots,
+        ..
+    } = scan_objects_for_gdb_index(objects)?;


I notice that this function is called twice. Is it expensive? If so, would it make sense to store the result from the first call?

Indeed this call is expensive (and naturally contributes to performance degradation). Therefore, I also believe it should be cached in the future, but achieving this would require layout refactoring (as we discussed during our previous meeting, parallelizing object traversal (mentioned in above) also requires such changes). I'm unsure whether we should include these in this PR.

edit: After actually trying it out, I realized the change weren't as extensive as I thought, so I added it: 0c0b335

marxin · 2026-06-01T18:53:19Z

After spending a little bit of time researching the area, I noticed there's a systematic DWARF extension Lookup by Name aka .debug_names (DWARF 5 6.1.1): https://dwarfstd.org/doc/DWARF5.pdf. It's pretty similar to .gdbindex, but supported by both lldb and gdb. Clang can emit the format with -g -gpubnames (can be dumped with llvm-dwarfdump --debug-names) and lld has an option that builds one global cache (--debug-names). Moreover, what's interesting, gdb itself can generate (and consume) the index format: gdb-add-index --dwarf-5. If I'm not mistaken, it's only GCC that lacks the support.

With that said, shouldn't the long-term, future-focused approach be to build out the Debug Names format rather than introducing an ad hoc, GDB-specific index file format?

philipc · 2026-06-02T02:24:58Z

+                break;
+            }
+            let len = u64_from_slice(&data[pos + 4..]);
+            let dio = u64_from_slice(&data[pos + 14..]);


Is there a reason we don't parse the cu size field here, and use it instead of the size from parse_cu_boundaries? And if we do use the cu size from here, is there any reason to do parse_cu_boundaries at all? I think the cu offset can be adjusted with a simple section base addition.

pubnames/pubtypes sets only cover CUs that have public names. CUs with no exported symbols won't have a set. The gdb index CU list have to include all CUs (the address table indexes into it), so we need parse_cu_boundaries() to discover the complete set and assign correct global indices.

My understanding was that .debug_gnu_pubnames does include local symbols, so would the problem instead be that some files were not compiled with -ggnu-pubnames?

would the problem instead be that some files were not compiled with -ggnu-pubnames?

Ah, right. Since we need to store all CUs in the CU list, we can't rely on .debug_gnu_pubnames to enumerate them. parse_cu_boundaries() walks .debug_info directly, which is always present with -g, so it handles this correctly.

This reverts commit 638e550.

lapla-cogito · 2026-06-11T03:09:07Z

After spending a little bit of time researching the area, I noticed there's a systematic DWARF extension Lookup by Name aka .debug_names (DWARF 5 6.1.1): https://dwarfstd.org/doc/DWARF5.pdf. It's pretty similar to .gdbindex, but supported by both lldb and gdb. Clang can emit the format with -g -gpubnames (can be dumped with llvm-dwarfdump --debug-names) and lld has an option that builds one global cache (--debug-names). Moreover, what's interesting, gdb itself can generate (and consume) the index format: gdb-add-index --dwarf-5. If I'm not mistaken, it's only GCC that lacks the support.

With that said, shouldn't the long-term, future-focused approach be to build out the Debug Names format rather than introducing an ad hoc, GDB-specific index file format?

Interesting. While we could indeed consider adding .debug_names support, I don't think it would be mutually exclusive with .gdb_index support. Moreover, considering implementation complexity and usage frequency, I don't think this would serve as a blocking factor for supporting GDB index.

philipc · 2026-06-11T04:01:05Z

+                    entries.push(GdbIndexAddressEntry {
+                        low_address: addr,
+                        high_address: addr + section.size,
+                        cu_index: base_cu,


If a file contains multiple CUs, does this assign all addresses to the first CU?

I see. Is this something that can happen in scenarios like --relocatable? Anyay, I was able to reproduce it with a test case and have fixed it. Thanks.

davidlattimore · 2026-06-11T06:52:15Z

+        return Ok(());
+    }
+
+    timing_phase!("Write GDB index");


Performance has improved. This time I benchmarked linking of zed-release and zed-debug. zed-release slowed down 12% when --gdb-index was passed, while zed-debug slowed down by 52%. The slowest phase is definitely "Write GDB index". It looks like write_gdb_index does a few different things. write_hash_table is already a separate function. Would any of the other things in here make sense to separate out to separate functions? Could you also add timing phases to any bits that seem like they might be significant?

Actually I was already looking into optimizing the write phase, so while I was at it, I added a few more profiling points as well. fb3ccdf

davidlattimore · 2026-06-11T07:03:16Z

        self.should_output_partial_object
    }
+
+    fn should_write_gdb_index(&self) -> bool {


It looks like lld and mold, if --strip-debug is supplied (or presumably --strip, but I didn't try that) won't write the gdb-index. Could we do the same (and have a test for it)? As a separate PR is fine, but also fine in this PR.

Testing locally, neither lld nor mold emits a GDB index when either --strip-all or --strip-debug is passed, so I changed the implementation to match that behavior. aa4f1c3

marxin

The reworked implementation is much better 👍 one thing I would consider: given the close connection to gdb, I would consider the integration of gdb into our testing harness.
There might be some debugging output that will show if the index was read and then we can inspect it? Just something for a consideration.

lapla-cogito · 2026-06-12T07:03:10Z

Thank you for all the reviews. I'm going to merge this. For test integration, I'm also in favor of extending the tests to use gdb, but at least at this stage, I believe the current integration tests and mold tests (which already use gdb internally) are sufficient.

lapla-cogito commented May 27, 2026

View reviewed changes

Comment thread libwild/src/gdb_index.rs Outdated

lapla-cogito force-pushed the gdb_idx branch from e1c6be7 to aa288f0 Compare May 27, 2026 11:55

marxin requested changes May 27, 2026

View reviewed changes

lapla-cogito marked this pull request as draft May 27, 2026 21:30

lapla-cogito force-pushed the gdb_idx branch 2 times, most recently from a848771 to 9de1cc6 Compare May 31, 2026 02:38

marxin reviewed Jun 1, 2026

View reviewed changes

Comment thread libwild/src/gdb_index.rs Outdated

lapla-cogito marked this pull request as ready for review June 1, 2026 05:07

lapla-cogito requested a review from marxin June 1, 2026 05:07

davidlattimore reviewed Jun 1, 2026

View reviewed changes

philipc reviewed Jun 2, 2026

View reviewed changes

marxin removed their request for review June 6, 2026 14:47

Add section name constant and part/section IDs

8cf5980

lapla-cogito force-pushed the gdb_idx branch from 1aa313a to d365583 Compare June 9, 2026 05:38

lapla-cogito marked this pull request as draft June 9, 2026 05:39

lapla-cogito added 11 commits June 9, 2026 14:45

Add --gdb-index and --no-gdb-index

d1ba635

Implement .gdb_index section generation

935b190

Update skip tests list

3db1ae5

Add integration tests

a845911

Eliminate duplicate parse_cu_boundaries in compute_gdb_index_size

c17257a

Merge address and symbol table construction into single object scan

469eb9d

clippy fix

224d518

Extract shared pubnames/pubtypes iteration into a helper

9d3f0ff

Use more idiomatic APIs and fix step calc

b31d144

Calculate hashslots using zerocopy

c6aaf5b

Add a comment for cv_bytes addition

52a5131

lapla-cogito added 11 commits June 9, 2026 14:45

Write constant pool directly to output buffer

1b3070c

Use itertools sorted_unstable_by_key()

b935f8f

Unify object scannig logic

82fb063

Refactor pubnames collection

00ab232

Use BTreeSet for cv_entries

f3c060f

Improve error handling

ebf6270

Use sorted_names rather than sorted as a variable name

d35f6b8

Emit errors when GDB indices are corrupted

1c1550f

Add some comments and warnings

1084243

Revert "Emit errors when GDB indices are corrupted"

a682194

This reverts commit 638e550.

Add a comment about CU

f6f856c

lapla-cogito force-pushed the gdb_idx branch from d365583 to f6f856c Compare June 9, 2026 05:45

lapla-cogito added 2 commits June 11, 2026 11:27

Add some timing_phase!s

c56a5be

Return an error if gdb index hash table has no empty slots

b80b0c5

lapla-cogito force-pushed the gdb_idx branch from 06f3f9a to b80b0c5 Compare June 11, 2026 02:48

lapla-cogito marked this pull request as ready for review June 11, 2026 03:09

Parallelize GDB index scan and cache the result across phases

0c0b335

lapla-cogito force-pushed the gdb_idx branch from 43cda60 to 0c0b335 Compare June 11, 2026 03:36

philipc reviewed Jun 11, 2026

View reviewed changes

davidlattimore reviewed Jun 11, 2026

View reviewed changes

lapla-cogito added 5 commits June 11, 2026 16:30

Reduce reparsing .debug_info

fb3ccdf

Handle compressed debug sections in GDB index generation

1011cec

Replace most of breaks with ensure!

373a70a

Don't generate GDB index with --strip-debug or --strip-all

aa4f1c3

Fix address entries for objects with multiple CUs

b05456a

marxin approved these changes Jun 11, 2026

View reviewed changes

Comment thread libwild/src/gdb_index.rs

Comment thread libwild/src/gdb_index.rs

Comment thread libwild/src/gdb_index.rs Outdated

Comment thread libwild/src/gdb_index.rs Outdated

lapla-cogito merged commit 222ad1a into wild-linker:main Jun 12, 2026
25 checks passed

lapla-cogito deleted the gdb_idx branch June 12, 2026 07:03

Uh oh!

Conversation

lapla-cogito commented May 27, 2026

Uh oh!

Uh oh!

lapla-cogito commented May 27, 2026

Uh oh!

marxin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lapla-cogito commented May 27, 2026

Uh oh!

marxin commented May 28, 2026

Uh oh!

Uh oh!

davidlattimore left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lapla-cogito Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marxin commented Jun 1, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lapla-cogito commented Jun 11, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lapla-cogito Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marxin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lapla-cogito commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

lapla-cogito Jun 11, 2026 •

edited

Loading

lapla-cogito Jun 11, 2026 •

edited

Loading