Skip to content

feat(bigtable): add ClientConfigurationManager + EnableSession config#19986

Open
sushanb wants to merge 1 commit into
googleapis:mainfrom
sushanb:client-config-manager
Open

feat(bigtable): add ClientConfigurationManager + EnableSession config#19986
sushanb wants to merge 1 commit into
googleapis:mainfrom
sushanb:client-config-manager

Conversation

@sushanb

@sushanb sushanb commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add ClientConfigurationManager in bigtable/internal/transport: polls GetClientConfiguration on a fixed interval, parses the response into a typed clientConfig, and fans changes out to registered listeners. Supports RPC retry, validity-window fallback, and Close() shutdown that waits for in-flight polls.
  • Test suite covering construction, polling, listener fan-out, transient-error retry, and Close semantics.
  • Wire it into Client behind a new ClientConfig.EnableSession flag (off by default). When set, NewClientWithConfig constructs the manager using the data-plane BigtableClient and the instance-scoped request metadata, then starts it; Close stops the manager first so it cannot fire listener callbacks against state that is about to be torn down.

Future session-pool PRs will key off the configuration surface this manager exposes. For now it only polls.

Part of the series of small PRs replacing #19980.

Test plan

  • go build ./... in bigtable/
  • go vet ./... in bigtable/
  • go test -short ./internal/transport/... . in bigtable/ — all pass
  • CI green

Introduce the ClientConfigurationManager, which polls the service via
GetClientConfiguration on a fixed interval and fans the parsed config
out to registered listeners. Includes the manager (Start / Close /
listener registration / RPC retry / validity-window fallback) and a
test suite covering construction, polling, listener fan-out, retry on
transient errors, and Close shutdown semantics.

Wire it into Client behind a new ClientConfig.EnableSession flag (off
by default). When set, NewClientWithConfig constructs the manager
using the data-plane BigtableClient and the instance-scoped request
metadata, then starts it; Close stops the manager first so it cannot
fire listener callbacks against state that is about to be torn down.

Future session-pool PRs will key off the configuration surface this
manager exposes. For now it only polls.

Part of the series of small PRs replacing googleapis#19980.
@sushanb sushanb requested review from a team as code owners June 16, 2026 04:53
@product-auto-label product-auto-label Bot added the api: bigtable Issues related to the Bigtable API. label Jun 16, 2026
@sushanb sushanb requested review from mutianf and nimf June 16, 2026 04:55

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a ClientConfigurationManager to dynamically poll and manage client configurations from the Bigtable control plane. The review feedback identifies several critical issues: the background polling loop is bound to a potentially short-lived context in Start which could prematurely terminate polling; a race condition in addListener can allow stale configurations to overwrite newer ones; an integer overflow in the exponential backoff calculation (1<<i) can cause a runtime panic in rand.Intn; and the fallback mechanism fails to reset the validity window (m.validUntil), resulting in redundant fallback notifications during prolonged outages.

Comment on lines +183 to +197
func (m *ClientConfigurationManager) Start(ctx context.Context) {
btopt.Debugf(m.logger, "bigtable: starting client configuration manager for instance %q, app profile %q", m.instanceName, m.appProfileId)
// We need a context for the initial poll.
m.pollsWG.Add(1)
go func() {
defer m.pollsWG.Done()
pollCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
m.poll(pollCtx)
}()

// Start background polling
go m.pollingLoop(ctx)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The background polling loop and the initial poll are currently bound to the lifecycle of the context passed to Start(ctx). In typical usage, the context passed to client initialization (NewClientWithConfig) is short-lived and canceled shortly after client creation. If this context is canceled, the background polling loop will terminate immediately, and the initial poll may be cut short.

To ensure the configuration manager continues to poll for the lifetime of the client, the background polling loop should be run with context.Background(). Similarly, the initial poll should use context.Background() with a timeout to avoid premature cancellation.

Suggested change
func (m *ClientConfigurationManager) Start(ctx context.Context) {
btopt.Debugf(m.logger, "bigtable: starting client configuration manager for instance %q, app profile %q", m.instanceName, m.appProfileId)
// We need a context for the initial poll.
m.pollsWG.Add(1)
go func() {
defer m.pollsWG.Done()
pollCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
m.poll(pollCtx)
}()
// Start background polling
go m.pollingLoop(ctx)
}
// Start begins the polling process.
func (m *ClientConfigurationManager) Start(ctx context.Context) {
btopt.Debugf(m.logger, "bigtable: starting client configuration manager for instance %q, app profile %q", m.instanceName, m.appProfileId)
// We need a context for the initial poll.
m.pollsWG.Add(1)
go func() {
defer m.pollsWG.Done()
pollCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
m.poll(pollCtx)
}()
// Start background polling
go m.pollingLoop(context.Background())
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong. ctx is from Newclient so it is long lived.

Comment on lines +234 to +255
func (m *ClientConfigurationManager) addListener(listener configListener) func() {
m.mu.Lock()
id := m.nextListenerID
m.nextListenerID++
if m.listeners == nil {
m.listeners = make(map[int]configListener)
}
m.listeners[id] = listener

cfg := m.currentConfig.Clone()
seq := m.configSeq
m.mu.Unlock()

btopt.Debugf(m.logger, "bigtable: adding configuration listener (id: %d)", id)
listener(cfg, seq)

return func() {
m.mu.Lock()
delete(m.listeners, id)
m.mu.Unlock()
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a subtle race condition in addListener where a concurrent poll can complete and notify the newly added listener with a newer configuration sequence number before the initial notification listener(cfg, seq) is executed. This results in the listener receiving the older configuration sequence after the newer one, causing it to silently overwrite the newer configuration with the stale one.

While the test TestManagerNotifyListeners_Race implements a sequence check to handle this, the actual production listeners registered via AddSessionPoolListener and AddSessionLoadListener do not have any sequence checks and are vulnerable to this race.

To prevent this, we can wrap the listener in addListener to enforce monotonic sequence numbers using atomic.Int64.

func (m *ClientConfigurationManager) addListener(listener configListener) func() {
	m.mu.Lock()
	id := m.nextListenerID
	m.nextListenerID++

	var lastSeq atomic.Int64
	lastSeq.Store(-1)

	wrappedListener := func(cfg clientConfig, seq int64) {
		for {
			current := lastSeq.Load()
			if seq <= current {
				return
			}
			if lastSeq.CompareAndSwap(current, seq) {
				listener(cfg, seq)
				return
			}
		}
	}

	if m.listeners == nil {
		m.listeners = make(map[int]configListener)
	}
	m.listeners[id] = wrappedListener

	cfg := m.currentConfig.Clone()
	seq := m.configSeq
	m.mu.Unlock()

	btopt.Debugf(m.logger, "bigtable: adding configuration listener (id: %d)", id)
	wrappedListener(cfg, seq)

	return func() {
		m.mu.Lock()
		delete(m.listeners, id)
		m.mu.Unlock()
	}
}

Comment on lines +355 to +362
if i < maxRetries {
delay := time.Duration(rand.Intn(1<<i)) * time.Second
select {
case <-ctx.Done():
return
case <-time.After(delay):
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If maxRetries is configured to a large value (or if the server returns an extreme value), 1<<i can overflow the signed integer type, resulting in a negative value or zero. Passing a non-positive value to rand.Intn will cause a runtime panic, crashing the client.

Additionally, without a cap, the backoff delay can grow excessively large (e.g., hours or days). Capping the backoff limit at a reasonable maximum (e.g., 30 seconds) prevents both integer overflow panics and excessively long retry delays.

Suggested change
if i < maxRetries {
delay := time.Duration(rand.Intn(1<<i)) * time.Second
select {
case <-ctx.Done():
return
case <-time.After(delay):
}
}
if i < maxRetries {
backoffLimit := 1 << i
if backoffLimit > 30 {
backoffLimit = 30
}
delay := time.Duration(rand.Intn(backoffLimit)) * time.Second
select {
case <-ctx.Done():
return
case <-time.After(delay):
}
}

Comment on lines +371 to +382
// Fall back to default configuration if validity window has expired
if time.Now().After(m.validUntil) {
btopt.Debugf(m.logger, "bigtable: client configuration validity window expired, falling back to default config")
m.currentConfig = m.defaultConfig
m.configSeq++
seq = m.configSeq
listeners = make([]configListener, 0, len(m.listeners))
for _, l := range m.listeners {
listeners = append(listeners, l)
}
cfgToNotify = m.currentConfig
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When the client configuration validity window expires and the poll fails, the manager falls back to the default configuration. However, because m.validUntil is not reset or updated during this fallback, every subsequent failed poll (which occurs on every polling interval) will repeatedly trigger the fallback block. This results in redundant sequence number increments and unnecessary listener notifications with the same default configuration during a prolonged outage.

Resetting m.validUntil to a far-future time (similar to initialization) upon fallback prevents these redundant notifications.

Suggested change
// Fall back to default configuration if validity window has expired
if time.Now().After(m.validUntil) {
btopt.Debugf(m.logger, "bigtable: client configuration validity window expired, falling back to default config")
m.currentConfig = m.defaultConfig
m.configSeq++
seq = m.configSeq
listeners = make([]configListener, 0, len(m.listeners))
for _, l := range m.listeners {
listeners = append(listeners, l)
}
cfgToNotify = m.currentConfig
}
// Fall back to default configuration if validity window has expired
if time.Now().After(m.validUntil) {
btopt.Debugf(m.logger, "bigtable: client configuration validity window expired, falling back to default config")
m.currentConfig = m.defaultConfig
m.configSeq++
seq = m.configSeq
m.validUntil = time.Now().Add(time.Hour * 24 * 365 * 100)
listeners = make([]configListener, 0, len(m.listeners))
for _, l := range m.listeners {
listeners = append(listeners, l)
}
cfgToNotify = m.currentConfig
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigtable Issues related to the Bigtable API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant