fix(vmwarevsphere): use WaitForIP instead of WaitForNetIP to fix hang with multiple NICs#367

Open
scheidet wants to merge 1 commit into rancher:master from scheidet:fix/vsphere-multi-nic-waitforip-hang

Conversation

@scheidet

Problem

When a node pool is configured with two network interfaces, machine creation hangs indefinitely at:

Waiting for VMware Tools to come online...

Root cause

GetIP() calls vm.WaitForNetIP(ctx, false) (govmomi), which monitors the guest.net property and only returns when all NICs have reported an IP address:

for _, ips := range macs {
    if len(ips) == 0 {
        return false // keeps waiting
    }
}

This hangs when the VM has multiple network adapters but only one of them is on a network with a DHCP server. That is a common setup, where a secondary NIC carries storage or internal traffic and gets no DHCP lease. VMware Tools reports the primary IP in vCenter, but WaitForNetIP never completes because the second NIC never receives an IP.

Fix

Replace WaitForNetIP with WaitForIP, which monitors guest.ipAddress (the primary IPv4 address reported by VMware Tools) and returns as soon as one IPv4 address is available, regardless of how many NICs the VM has.
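
The difference between the two wait conditions can be sketched with a small standard-library model. This is only an illustration, not govmomi code: guestNet, allNICsHaveIP, and primaryIPReady are hypothetical names, and the map is a simplified stand-in for the guest.net data (MAC address to reported IPs).

```go
package main

// guestNet is a simplified, hypothetical stand-in for the per-NIC data in
// guest.net: MAC address -> IP addresses reported for that NIC.
type guestNet map[string][]string

// allNICsHaveIP models the WaitForNetIP-style condition quoted in the
// root cause: it is satisfied only when every NIC has at least one address.
func allNICsHaveIP(macs guestNet) bool {
	for _, ips := range macs {
		if len(ips) == 0 {
			return false // a NIC without an IP keeps the caller waiting
		}
	}
	return true
}

// primaryIPReady models the WaitForIP-style condition: it is satisfied as
// soon as the primary address (guest.ipAddress) is non-empty, no matter
// how many NICs the VM has.
func primaryIPReady(primaryIP string) bool {
	return primaryIP != ""
}
```

For a VM whose second NIC has no DHCP lease, allNICsHaveIP stays false on every property update, while primaryIPReady becomes true as soon as VMware Tools reports the first address.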

This also removes a len(ips) >= 0 check that was always true (a slice length is never negative) and therefore did not guard against an index-out-of-range panic on an empty slice.
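
To illustrate why the len(ips) >= 0 check offers no protection, here is a minimal sketch of a guarded lookup. The removed code is not shown in this description, so firstIP is a hypothetical helper and does not reproduce the driver's actual logic.

```go
package main

import "errors"

// firstIP is a hypothetical helper: it returns the first address in ips,
// or an error when the slice is empty.
//
// The guard has to be len(ips) > 0. A len(ips) >= 0 test holds for every
// slice, including nil, so it would let ips[0] panic on empty input.
func firstIP(ips []string) (string, error) {
	if len(ips) > 0 {
		return ips[0], nil
	}
	return "", errors.New("no IP address reported")
}
```

With WaitForIP the slice lookup goes away entirely, since the call returns a single address string; the sketch only shows what a correct guard would have looked like.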

Test

  • Node pool with 1 network: works before and after
  • Node pool with 2 networks (only one with DHCP): hangs before fix, completes after fix

…ng with multiple NICs

WaitForNetIP blocks until ALL NICs report an IP address via guest.net,
which causes an indefinite hang when a VM has multiple network adapters
but only one of them has a DHCP server. This is a common setup where
a secondary NIC is used for internal/storage traffic with no DHCP.

WaitForIP watches guest.ipAddress (the primary IP reported by VMware
Tools) and completes as soon as any IPv4 address is available, making
it the correct choice for retrieving the machine's reachable IP.

Also removes the len(ips) >= 0 check that was always true and could
lead to an index-out-of-bounds on an empty slice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@scheidet
Author

Another bug here: when you try to delete a machine, the deletion has a job dependency in a queue, so you can neither proceed nor go back, and you end up stuck.

@scheidet scheidet closed this Apr 16, 2026
@scheidet scheidet reopened this Apr 16, 2026