4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -5,3 +5,7 @@
## 2026-03-05 - Hot Loop Allocations in Token Sampling
**Learning:** Allocating memory in the hot path of LLM token generation (e.g., `logits.to_vec()` or creating `HashMap`s per token) significantly degrades performance due to repeated allocation overhead of vocabulary-sized vectors (often 128K+ elements). Additionally, mathematically equivalent iterative multiplication (`logit *= inv_penalty`) can replace `HashMap` counting and `.powi(count)`, completely eliminating O(N) memory allocations per token.
**Action:** When working on generation loops, use buffer pooling (e.g. storing a `Vec` in the generator state and using `std::mem::take` to bypass borrow checker limitations) and avoid `HashMap` allocations for simple counting if an iterative scalar approach is mathematically equivalent.

+## 2026-03-05 - Repetition Penalty Power Calculation Optimization
+**Learning:** Calculating `powi` unconditionally before an `if/else` block can add overhead when a branch does not need that value. Division in a hot loop (e.g., `logit /= penalty`) is also comparatively expensive; it can be replaced by computing the inverse once outside the loop (`inv_count_penalty = 1.0 / penalty`) and multiplying by `inv_count_penalty.powi(count)` inside the branch that needs it. The two forms are mathematically equivalent but can round differently in `f32`.
+**Action:** For math inside conditional branches in hot loops, hoist loop-invariant values out of the loop, compute branch-specific values lazily inside the branch that uses them, and prefer multiplying by a precomputed inverse over repeated division.
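
The buffer-pooling action from the "Hot Loop Allocations in Token Sampling" note can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the crate: `Generator`, `scratch`, `sample_greedy`, `scale`, and `scratch_capacity` are hypothetical names chosen for the example. It shows a `Vec` kept in the generator state and moved out with `std::mem::take`, so that a `&self` method can be called while the buffer is being mutated.

```rust
/// Hypothetical generator state; names are illustrative only and are not
/// taken from the bitnet-sampling crate.
pub struct Generator {
    /// Scratch buffer reused across tokens so the hot loop does not allocate.
    scratch: Vec<f32>,
}

impl Generator {
    pub fn new() -> Self {
        Self { scratch: Vec::new() }
    }

    /// Returns the index of the largest temperature-scaled logit.
    pub fn sample_greedy(&mut self, logits: &[f32], temperature: f32) -> usize {
        // Move the buffer out of `self`. `take` leaves an empty `Vec` behind,
        // which does not allocate, and frees `self` to be borrowed again.
        let mut buf = std::mem::take(&mut self.scratch);
        buf.clear();
        // After the first call this reuses the existing capacity.
        buf.extend_from_slice(logits);

        // Passing `&mut self.scratch` here instead would be rejected by the
        // borrow checker, because `scale` also borrows `self`.
        self.scale(&mut buf, temperature);

        let best = buf
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .unwrap_or(0);

        // Hand the buffer back so the next token reuses the same allocation.
        self.scratch = buf;
        best
    }

    /// Scales every value by the inverse temperature, computed once.
    fn scale(&self, buf: &mut [f32], temperature: f32) {
        let inv_t = 1.0 / temperature;
        for x in buf.iter_mut() {
            *x *= inv_t;
        }
    }

    /// Exposed only so the reuse of the allocation can be observed.
    pub fn scratch_capacity(&self) -> usize {
        self.scratch.capacity()
    }
}
```

Calling `sample_greedy` twice with inputs of the same length should leave `scratch_capacity()` unchanged after the second call, which is the property the note is after: the vocabulary-sized buffer is allocated once rather than once per token.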
10 changes: 8 additions & 2 deletions crates/bitnet-sampling/src/strategies.rs
@@ -264,6 +264,9 @@
     ///
     /// `token_counts` is a slice of `(token_id, occurrence_count)` pairs.
     pub fn apply(&self, logits: &mut [f32], token_counts: &[(u32, usize)]) {
+        // ⚑ Bolt: Hoist the inverse calculation entirely outside the loop to avoid redundant operations
+        let inv_count_penalty = 1.0 / self.count_penalty;
+
         for &(token_id, count) in token_counts {
             let idx = token_id as usize;
             if idx >= logits.len() || count == 0 {
@@ -279,11 +282,14 @@
             // Count penalty: multiplicative
             if self.count_penalty.to_bits() != 1.0f32.to_bits() {
                 let count = i32::try_from(count).unwrap_or(i32::MAX);
-                let penalty = self.count_penalty.powi(count);
                 if logits[idx] > 0.0 {
-                    logits[idx] /= penalty;
+                    // ⚑ Bolt: Lazily compute inverse power inside branch and multiply to safely eliminate division overhead
+                    let inv_penalty = inv_count_penalty.powi(count);
+                    logits[idx] *= inv_penalty;
                 } else {
+                    // ⚑ Bolt: Lazily compute power inside branch to avoid unnecessary work when this branch is not taken
+                    let penalty = self.count_penalty.powi(count);
                     logits[idx] *= penalty;
                 }
             }
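
To make the effect of this change easier to check in isolation, the two forms of the count penalty can be pulled out into standalone functions. The sketch below is an illustration only: `penalize_div`, `penalize_mul`, and `rel_diff` are hypothetical helpers, not functions from `strategies.rs`, and the surrounding bounds and `count == 0` checks are omitted.

```rust
/// Pre-change form: one `powi` up front, then divide (positive logit)
/// or multiply (non-positive logit).
pub fn penalize_div(logit: f32, penalty: f32, count: i32) -> f32 {
    let p = penalty.powi(count);
    if logit > 0.0 {
        logit / p
    } else {
        logit * p
    }
}

/// Post-change form: the caller passes `inv_penalty = 1.0 / penalty`,
/// computed once outside the loop; each branch computes only the power
/// it needs and always multiplies.
pub fn penalize_mul(logit: f32, penalty: f32, inv_penalty: f32, count: i32) -> f32 {
    if logit > 0.0 {
        logit * inv_penalty.powi(count)
    } else {
        logit * penalty.powi(count)
    }
}

/// Relative difference between two values, guarded against division by zero.
pub fn rel_diff(a: f32, b: f32) -> f32 {
    let scale = a.abs().max(b.abs()).max(f32::MIN_POSITIVE);
    (a - b).abs() / scale
}
```

For a penalty such as `1.1` and small counts, `rel_diff(penalize_div(..), penalize_mul(..))` should stay within a few `f32` rounding errors. The two results are not guaranteed to be bit-identical, because `(1.0 / p).powi(n)` and `1.0 / p.powi(n)` round at different points.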
@@ -6,6 +6,6 @@ expression: logits
     1.0,
     0.15026294,
     3.0,
-    2.909091,
+    2.9090908,
     5.0,
 ]
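
The snapshot change above is the size one would expect from swapping a division for a multiplication by a reciprocal: parsed as `f32`, `2.909091` and `2.9090908` are adjacent representable values. A small helper for measuring that distance might look like the sketch below. `ulp_distance` and `ordered` are hypothetical names, and the helper assumes finite, non-NaN inputs.

```rust
/// Maps an `f32` bit pattern to an integer that increases monotonically
/// with the float's value, so adjacent floats map to adjacent integers.
/// Both `0.0` and `-0.0` map to 0.
fn ordered(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    if bits < 0 {
        // Negative floats are stored as sign + magnitude; flip them so that
        // larger magnitudes become more negative integers.
        (i32::MIN as i64) - (bits as i64)
    } else {
        bits as i64
    }
}

/// Number of representable `f32` steps between `a` and `b`.
/// Assumes both inputs are finite and not NaN.
pub fn ulp_distance(a: f32, b: f32) -> u64 {
    (ordered(a) - ordered(b)).unsigned_abs()
}
```

A tolerance expressed in these units (for example, accepting results within a handful of steps) is one way to test numerically reordered code without pinning exact decimal output; whether that suits this crate's snapshot tests is a judgment call for the maintainers.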