Skip to content

Delay updateAll() & only update subset immediately#815

Open
lejeunerenard wants to merge 14 commits into
mainfrom
min-delay-update-all
Open

Delay updateAll() & only update subset immediately#815
lejeunerenard wants to merge 14 commits into
mainfrom
min-delay-update-all

Conversation

@lejeunerenard

Copy link
Copy Markdown
Contributor

Update all is too aggressive at the moment and tries to bump any pending ranges etc every time. This is meant to cover edgecases but that can be done via a delay instead.

If the number of peers is greater than or equal to MIN_PEER_UPDATE_ALL (currently 10), only a subset of the peers are scanned immediately. A full scan is schedule later with a delay based on the number of peers. Math.sqrt(this.peers.length) * 100ms

updateAll() now takes an optional limit arg so the code is reused between the subset & complete scans.

All calls to updateAll() were refactor to queueUpdateAll() except in core.truncate() as that is assumed to be strongly propagated.

If the number of peers is greater than or equal to `MIN_PEER_UPDATE_ALL`
(currently `10`), only a subset of the peers are scanned immediately.
A full scan is schedule later with a delay based on the number of peers.
`Math.sqrt(this.peers.length) * 100ms`

`updateAll()` now takes an optional `limit` arg so the code is reused
between the subset & complete scans.

All calls to `updateAll()` were refactor to `queueUpdateAll()` except in
`core.truncate()` as that is assumed to be strongly propagated.
@lejeunerenard lejeunerenard requested a review from a team May 19, 2026 21:54
Comment thread lib/replicator.js Outdated
this._notDownloadingTimer = null

this._updateAllBump = null
this._updateAllBound = () => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a bind instead on an instance method

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to set _updateAllBump to null so it would reschedule. would normally put in updateAll but wanted to reuse it for both subset and full scan.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats ok, just define this as _something and then bind that here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually much easier if you add to updateAll no?

if (this._updateBump !== null) {
clearTimeout(this._updateBump)
this._updateBump = null
}

then my below comment on the clear is irrelevant and you can rmeove the clearTimeotu down there also from the else

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted so the bound version is just a .bind() and moved the clear to updateAll(). It checks if a full scan before clearing since the method is used for the subset as well.

Comment thread lib/replicator.js Outdated

queueUpdateAll() {
if (this.peers.length >= MIN_PEER_UPDATE_ALL) {
const MIN_DELAY_UPDATE_ALL = Math.sqrt(this.peers.length) * 100

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normal case, its not a constant. I think you should define a function isntead that just makes this static

getDelay(peers) {
if (peers < 10) return 100
if (peers < 50) return 200
if (peers < 100) return 300
if (peers < 400) return 400
return 1000
}

or whatever we want

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still want it to be square root? I like that it doesn't grow linear just in case.

Comment thread lib/replicator.js Outdated
const LAST_BLOCKS = 256

const MAX_RANGES = 64
const MIN_PEER_UPDATE_ALL = 10

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should prob set a bit higher, like

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

30-40 i think

Comment thread lib/replicator.js Outdated
if (this._updateAllBump !== null) return //skip if already scheduled
this._updateAllBump = setTimeout(this._updateAllBound, this.getUpdateAllDelay(this.peers))
} else {
clearTimeout(this._updateAllBump)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to set bump to null post this btw, otherwise live lock per above

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, you are cancelling that call here tho

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, you're right sorry. i have fixed per the other comment.

Comment thread lib/replicator.js

getUpdateAllDelay(peers) {
return Math.sqrt(peers.length) * 100
return Math.min(3000, Math.max(100, (peers.length - MIN_PEER_UPDATE_ALL) * 5))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok if MIN_PEER_UPDATE is relatively non small, like 40-50

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated MIN_PEER_UPDATE_ALL to 40. Previously was 30.

@lejeunerenard

Copy link
Copy Markdown
Contributor Author

Merged main in as it had a conflict.

@marcus-pousette-hp

marcus-pousette-hp commented May 25, 2026

Copy link
Copy Markdown
Contributor

I got curious about this PR,
I think this can leave _updateAllBump stuck non-null. If the delayed full scan fires while _applyingReorg is active, updateAll() returns before clearing the timer handle. After that, queueUpdateAll() keeps doing only the limited scan and never schedules another full scan

@marcus-pousette-hp

Copy link
Copy Markdown
Contributor

The regression I am analysing:

test('delayed updateAll clears timer after reorg skip', async function (t) {
  const Replicator = require('../lib/replicator')
  const r = new Replicator({})

  t.teardown(() => {
    if (r._updateAllBump) clearTimeout(r._updateAllBump)
  })

  r.peers = Array.from({ length: 40 }, () => ({}))
  r.getUpdateAllDelay = () => 1

  // Simulate the delayed full scan firing while a reorg is active
  r._applyingReorg = {}

  r.queueUpdateAll()
  await new Promise(resolve => setTimeout(resolve, 10))

  t.is(r._updateAllBump, null, 'timer handle should be cleared after firing')
})

@marcus-pousette-hp

Copy link
Copy Markdown
Contributor

Sanity check, I am investigating side effects that

core.upgrade({ wait: true }) 

can resolve early false because in the case if there is 41 peers, and we do

queueUpdateAll() calls updateAll(40) which call _checkUpgradeIfAvailable() in the end, and from that we can end up calling u.resolve(false)

If I understand it from previous behaviour, this is different, and unexpected. Since we should not resolve early false we we acutally have not waited for the full update.

I could try to produce a regression test

Comment thread lib/replicator.js
@marcus-pousette-hp

Copy link
Copy Markdown
Contributor

Sanity check, I am investigating side effects that

core.upgrade({ wait: true }) 

can resolve early false because in the case if there is 41 peers, and we do

queueUpdateAll() calls updateAll(40) which call _checkUpgradeIfAvailable() in the end, and from that we can end up calling u.resolve(false)

If I understand it from previous behaviour, this is different, and unexpected. Since we should not resolve early false we we acutally have not waited for the full update.

I could try to produce a regression test

It becomes a bit messy but, this seems to catch the regression (Co-authored with AI)

test('wait update does not resolve false before delayed full scan with many peers', async function (t) {
  const writer = await create(t)
  const blocks = []
  for (let i = 0; i < 100; i++) blocks.push('block-' + i)
  await writer.append(blocks)

  const reader = await create(t, writer.key, { eagerUpgrade: false })

  // Bootstrap reader to the stale length, then disconnect so only the explicit
  // update below can observe the later writer append.
  let streams = replicate(writer, reader, t, { teardown: false })
  await reader.update({ wait: true, force: true })
  await reader.download({ start: 0, end: 100 }).done()
  await unreplicate(streams)

  const stalePeers = []
  for (let i = 0; i < 40; i++) {
    const peer = await create(t, writer.key)

    streams = replicate(writer, peer, t, { teardown: false })
    await peer.update({ wait: true, force: true })
    await peer.download({ start: 0, end: 100 }).done()
    await unreplicate(streams)

    stalePeers.push(peer)
  }

  await writer.append('new-block')

  // Keep real non-primary work pending. Without this, the skipped writer can
  // remain in the first 3 peers that _checkUpgradeIfAvailable() samples.
  await reader.clear(0, 100)
  const range = reader.download({ start: 0, end: 100, linear: true })

  // These live streams can use normal test teardown.
  replicate(writer, reader, t)
  for (const peer of stalePeers) replicate(peer, reader, t)

  while (reader.replicator.peers.length !== 41) {
    await new Promise((resolve) => setImmediate(resolve))
  }

  const oldRandom = Math.random
  let update = null

  try {
    // RandomIterator otherwise makes whether the writer is skipped probabilistic.
    Math.random = () => 0

    update = reader.update({ wait: true, force: true })

    let settled = false
    let result = null
    update.then((value) => {
      settled = true
      result = value
    })

    await new Promise((resolve) => setImmediate(resolve))

    t.is(settled, false, `partial scan should not resolve wait update as ${result}`)
  } finally {
    Math.random = oldRandom

    // In the fixed case, update is intentionally still pending at the assertion.
    // Cancel it so the test does not leave an active upgrade ref behind.
    if (update !== null) {
      reader.core.replicator.clearRequests(reader.activeRequests, null)
      await update.catch(() => {})
    }

    range.destroy()

    // When the cleanup issue in PR is fixed this can be removed
    if (reader.replicator._updateAllBump) {
      clearTimeout(reader.replicator._updateAllBump)
      reader.replicator._updateAllBump = null
    }
  }
})

@lejeunerenard

Copy link
Copy Markdown
Contributor Author

@marcus-pousette-hp thanks for checking this. This tests also doesn't pass on main. This is because of the limit MAX_PEERS_UPGRADE which constrains how many peers we check when looking for an upgrade. In the test case, it checks the max (3) and determines there is no upgrade available atm so the core is updated.

The docs are a bit confusing as they say:

Use core.findingPeers() or { wait: true } to make await core.update() blocking.

Which does sound like it would be fully blocking by exhaustively checking all peers, but instead is more of a 'blocking, if available' where 'if available' is a check against a random subset of peers.

@marcus-pousette-hp

Copy link
Copy Markdown
Contributor

@marcus-pousette-hp thanks for checking this. This tests also doesn't pass on main. This is because of the limit MAX_PEERS_UPGRADE which constrains how many peers we check when looking for an upgrade. In the test case, it checks the max (3) and determines there is no upgrade available atm so the core is updated.

The docs are a bit confusing as they say:

Use core.findingPeers() or { wait: true } to make await core.update() blocking.

Which does sound like it would be fully blocking by exhaustively checking all peers, but instead is more of a 'blocking, if available' where 'if available' is a check against a random subset of peers.

Then I am happy with the PR! I guess this is either a documentation bug, or a behaivour bug that would need a follow up in another issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants