Bug 1182 - VM failing to boot with pve-qemu-kvm >=2.7.0-3 on certain systems
Summary: VM failing to boot with pve-qemu-kvm >=2.7.0-3 on certain systems
Status: RESOLVED FIXED
Alias: None
Product: pve
Classification: Unclassified
Component: Qemu
Version: unspecified
Hardware: x86_64 (AMD64) Linux
Importance: --- normal
Assignee: Fabian Grünbichler
URL:
Depends on:
Blocks:
 
Reported: 2016-10-25 14:15 CEST by Michael Prokop
Modified: 2017-01-18 15:44 CET
3 users

See Also:


Description Michael Prokop 2016-10-25 14:15:13 CEST
We're using PXE boot for automated installation of VMs with a config looking like:

  bios: seabios
  boot: cn
  bootdisk: virtio0
  cores: 1
  cpu: host
  memory: 2048
  name: foobar
  net0: virtio=3A:B4:5F:22:B4:3F,bridge=vmbr2
  numa: 0 
  ostype: l26
  smbios1: uuid=bb440b56-4b69-44ac-b54b-2a85f7faa1ab
  sockets: 1
  unused0: buildtmpfs:173/vm-173-disk-1.vmdk
  vga: std

(The VM's disk shows up as unused because I detached it, to rule out the disk as the cause.)

This worked perfectly fine up to and including pve-qemu-kvm 2.6.2-2, but it fails with versions 2.7.0-3 and 2.7.0-4.

We get:

  SeaBIOS (version rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org)
  Machine UUID [....]
  Booting from Hard Disk...

  Booting from ROM...
  iPXE (PCI 00:12.0) starting execution...

and it hangs there indefinitely, instead of showing the working output:

  SeaBIOS (version rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org)
  Machine UUID [....]
  Booting from Hard Disk...

  Booting from ROM...
  iPXE (PCI 00:12.0) starting execution...ok
  iPXE initialising devices...ok

  iPXE 1.0.0+ [....]
  [...]

I stumbled upon https://bugzilla.proxmox.com/show_bug.cgi?id=1181 ("BIOS: running inside vmware, booting virtio devices does not work anymore"), which looks similar, though we aren't running inside VMware but on physical bare-metal servers. Also, pve-qemu-kvm >=2.7.0-3 seems to work fine on some of our other machines, which are set up similarly to the failing system.

The system where it's *failing* is:

  root@builder1 ~ # lscpu | grep 'Model name'
  Model name:            Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
  root@builder1 ~ # lshw -C net | grep product
         product: NetXtreme II BCM5708S Gigabit Ethernet
         product: NetXtreme II BCM5708S Gigabit Ethernet

while the same config/setup works on this machine:

  root@builder2 ~ # lscpu | grep 'Model name'
  Model name:            Intel(R) Xeon(R) CPU           L5639  @ 2.13GHz
  root@builder2 ~ # lshw -C net | grep product
         product: NetXtreme II BCM5709S Gigabit Ethernet
         product: NetXtreme II BCM5709S Gigabit Ethernet

(I'm unsure whether this is really related to the underlying hardware, but just in case.)

Downgrading to pve-qemu-kvm 2.6.2-2 works around this issue for us.
Any ideas on how to continue debugging, or what the culprit might be?
Comment 1 Fabian Grünbichler 2016-10-25 14:59:01 CEST
could you try booting from a virtio disk and removing the network boot?

https://bugs.launchpad.net/qemu/+bug/1623276 seems to describe your symptoms pretty exactly.

if booting from a virtio disk also fails, could you try changing the controller to sata?

thanks in advance!
Comment 2 Michael Prokop 2016-10-25 15:04:13 CEST
Thanks for the suggestion, will try that.

In the meantime, I tried to find out whether it might be specific to the qemu version (recording detailed steps just in case):

  # uname -r -v
  4.4.21-1-pve #1 SMP Thu Oct 20 14:56:39 CEST 2016
  # pveversion 
  pve-manager/4.3-7/db02a4de (running kernel: 4.4.21-1-pve)

  # debootstrap jessie ./jessie_chroot http://debian.inode.at/debian/
  # chroot ./jessie_chroot /bin/bash
  # apt-get install build-essential git debhelper autotools-dev libpci-dev quilt texinfo texi2html libgnutls28-dev libsdl1.2-dev check libaio-dev uuid-dev librbd-dev libiscsi-dev libspice-protocol-dev libspice-server-dev libusbredirparser-dev glusterfs-common libusb-1.0-0-dev xfslibs-dev libnuma-dev libjemalloc-dev libjpeg-dev
  # git clone git://git.qemu.org/qemu.git
  # cd qemu
  # ./configure --with-confsuffix="/kvm" --target-list=x86_64-softmmu --prefix=/usr --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var --disable-xen --enable-gnutls --enable-sdl --enable-uuid --enable-linux-aio --enable-rbd --enable-libiscsi --disable-smartcard --audio-drv-list="alsa" --enable-spice --enable-usb-redir --enable-glusterfs --enable-libusb --disable-gtk --enable-xfsctl --enable-numa --disable-strip --enable-jemalloc --disable-libnfs --disable-fdt
  # make # and then exit chroot

  # apt-get install libiscsi2 # wasn't installed on my host system
  # ./jessie_chroot/root/qemu/x86_64-softmmu/qemu-system-x86_64 $VM_OPTIONS

Using the VM_OPTIONS from the `qm showcmd $VM_ID` output (dropping the unsupported -id $VMID option, and adding '-enable-kvm' to work around the "CPU model 'host' requires KVM" failure):

   # ./jessie_chroot/root/qemu/x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-server/173.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/173.pid -daemonize -smbios 'type=1,uuid=bb440b56-4b69-44ac-b54b-2a85f7faa1ab' -name foobar -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/173.vnc,x509,password -cpu host,+kvm_pv_unhalt,+kvm_pv_eoi -m 2048 -k en-us -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -netdev 'type=tap,id=net0,ifname=tap173i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=3A:B4:5F:22:B4:3F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=200' -enable-kvm
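The manual edit described above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: it assumes `qm showcmd` prints a single line whose first token is the kvm wrapper (e.g. `/usr/bin/kvm`) followed by `-id <vmid>`. The function name `to_vanilla_cmd` and the `QEMU_BIN` variable are hypothetical.

```shell
# Hypothetical: path of the self-built binary (can be overridden from the environment).
QEMU_BIN=${QEMU_BIN:-./jessie_chroot/root/qemu/x86_64-softmmu/qemu-system-x86_64}

# Read one `qm showcmd` line on stdin and print it rewritten for the vanilla build:
#  1. replace the leading kvm wrapper (first token ending in "kvm") with $QEMU_BIN,
#  2. drop the PVE-specific "-id <vmid>" option that vanilla qemu rejects,
#  3. append -enable-kvm so that "-cpu host" keeps working.
to_vanilla_cmd() {
  sed -e "s|^[^ ]*kvm |$QEMU_BIN |" \
      -e 's/ -id [0-9][0-9]*//' \
      -e 's/$/ -enable-kvm/'
}
```

Since the generated arguments contain single quotes, the result would need to go through `eval`, e.g. `eval "$(qm showcmd 173 | to_vanilla_cmd)"`.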

My qemu git checkout is at:

  # git log --oneline -1
  4429532 tests: Restore check-qdict unit test
  # git describe
  v2.7.0-1343-g4429532

The .bin and .rom files inside /usr/share/kvm/ that the qemu process seems to use (I straced it) also match what's present in my qemu.git checkout (md5sums):

  a1627577ad0d5e69f9609c23b2a54182  /usr/share/kvm/bios-256k.bin
  b8cec9572e408a3259914f9aba8664cb  /usr/share/kvm/kvmvapic.bin
  7ddb3370d30d8abe53431e2310b23aae  /usr/share/kvm/vgabios-stdvga.bin
  10bd6625271d30edd40182b13c4924cc  /usr/share/kvm/efi-virtio.rom

Now testing with your suggestions.
Comment 3 Fabian Grünbichler 2016-10-25 15:26:44 CEST
(In reply to Michael Prokop from comment #2)

thanks for the quick and detailed response - very much appreciated.

in case the disk-only boot does not work with virtio disks, it would be very helpful if you could test this setup with the "vanilla" qemu as well!
Comment 4 Michael Prokop 2016-10-25 16:20:40 CEST
> could you try booting from a virtio disk and removing the network boot?

Yes, this looks better: http://michael-prokop.at/screeni/screenshot.2016-10-25T16:18:07.png
(The disk is empty, but AFAICT that's fine as-is since it's clearly not stopping/hanging)

> https://bugs.launchpad.net/qemu/+bug/1623276 seems to describe your symptoms pretty exactly..

Thanks for the pointer!

> if booting from a virtio disk also fails, could you try changing the controller to sata?

Booting from a SATA disk behaves the same as booting from the virtio disk, so it looks like it's really just a network boot/PXE problem.
Comment 5 Michael Prokop 2016-10-25 16:23:51 CEST
Following up with the information from https://bugs.launchpad.net/qemu/+bug/1623276: if I drop the KVM options (-enable-kvm and, as a consequence, -cpu host,+kvm_pv_unhalt,+kvm_pv_eoi), it indeed boots fine into iPXE with:

./jessie_chroot/root/qemu/x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-server/173.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/173.pid -daemonize -smbios 'type=1,uuid=bb440b56-4b69-44ac-b54b-2a85f7faa1ab' -name foobar -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/173.vnc,x509,password -m 2048 -k en-us -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -netdev 'type=tap,id=net0,ifname=tap173i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=3A:B4:5F:22:B4:3F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=200'
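The change made for this test (removing `-enable-kvm` and, because the `host` CPU model requires KVM, the `-cpu` option too) can be expressed as a small filter. A sketch, assuming the `-cpu` value contains no spaces, as in the commands in this report; `drop_kvm_opts` is a hypothetical name. Without `-enable-kvm`, QEMU runs the guest under TCG software emulation, which is what makes this a KVM vs. no-KVM comparison.

```shell
# Read a qemu command line on stdin and print it without the KVM-specific options.
drop_kvm_opts() {
  sed -e 's/ -enable-kvm//g' \
      -e 's/ -cpu [^ ]*//'
}
```

For example, with the comment 2 command line saved in a hypothetical file `cmd.txt`: `eval "$(drop_kvm_opts < cmd.txt)"`.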
Comment 6 Fabian Grünbichler 2016-10-28 10:50:17 CEST
so it seems like rebuilt PXE ROMs should hit the qemu repositories soon; we will build a test package as soon as they are finalized.
Comment 7 Fabian Grünbichler 2016-11-03 10:40:02 CET
test packages with rebuilt PXE ROMs for SeaBIOS are available at http://download2.proxmox.com/temp/pve/pve-qemu-kvm_2.7.0-5~test1_amd64.deb and http://download2.proxmox.com/temp/pve/pve-qemu-kvm-dbg_2.7.0-5~test1_amd64.deb

if those work for you, I'll see about rebuilding the UEFI ROMs as well (unless qemu has released their own rebuilt binaries first)
Comment 8 Adrian 2016-11-03 16:57:46 CET
I downloaded and installed the packages; it's basically the same, it stops at:

Booting from ROM...
iPXE (PCI 00:12.0) starting execution...

Hope it helps.
Comment 9 Michael Prokop 2016-11-03 22:34:06 CET
Same here: pve-qemu-kvm 2.7.0-5~test1 doesn't work for us either.
Comment 10 Fabian Grünbichler 2016-11-09 11:55:23 CET
Updated test packages based on our newest 2.7 package (not yet available in the regular repositories, so probably not for production systems ;)) are available at:

http://download2.proxmox.com/temp/pve/pve-qemu-kvm_2.7.0-7~test1_amd64.deb
http://download2.proxmox.com/temp/pve/pve-qemu-kvm-dbg_2.7.0-7~test1_amd64.deb

those now contain rebuilt EFI PXE ROMs, which are apparently also used in the SeaBIOS case ;)
Comment 11 Fabian Grünbichler 2016-11-10 11:12:10 CET
one user reported on the forums that the -7 packages fix PXE booting for them. @Adrian, Michael: can either of you confirm? If so, I'll try to push the fixes to our regular packages.
Comment 12 Adrian 2016-11-10 19:01:15 CET
With the new package, PXE boots normally and the installation runs without problems.
Comment 13 Adrian 2016-11-17 10:47:38 CET
Any idea when this fix will be pushed to the repos?
Comment 14 Fabian Grünbichler 2016-11-17 11:27:30 CET
(In reply to Adrian from comment #13)
> Any idea when this fix will be pushed to the repos?

it's already on pvetest, so it will probably move to pve-no-subscription soon (the fixed version is 2.7.0-8)
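To check whether a host already runs a package at or above the fixed version 2.7.0-8, a sketch like the following could help. It uses GNU `sort -V` as an approximation of Debian version ordering (an assumption; on a PVE host `dpkg --compare-versions` is the authoritative check), and `version_ge` is a hypothetical name.

```shell
# Succeed if version $1 >= version $2 under GNU sort -V ordering.
# sort -V only approximates Debian's rules; prefer `dpkg --compare-versions` on a real host.
version_ge() {
  # The smaller of the two sorts first; if that is $2 (or both are equal), $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example: `version_ge "$(dpkg-query -W -f='${Version}' pve-qemu-kvm)" 2.7.0-8 && echo "fix included"`.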