Once a VM starts to generate IO load, the host spawns a seemingly unlimited number of threads, far more than the same VM does under plain libvirt+kvm. See https://forum.proxmox.com/threads/test-luks-softraid-10-f2-slow-vm-performance.34395/ for the original report. I was able to reproduce this on our live production cluster (14 boxes) without LUKS and with hardware RAID instead, so this bug has nothing to do with software RAID at all.
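For reference, the kind of in-guest write test that triggers this looks roughly like the following (target path and sizes are only examples; any large write of zeros with a big block size shows the effect):

  dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=direct

While this runs, the thread count of the kvm process on the host (visible in htop/top) explodes.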
Created attachment 221 [details] Example Video
This bug also affects the Proxmox 5.0 beta and 4.4 enterprise (we are an enterprise customer).
Could you please post the complete command lines for the VM under Proxmox (you can get this with 'qm showcmd ID') and for the VM started with libvirt? Without comparing these, we cannot know whether the behaviour comes from upstream QEMU or from one of our patches (unlikely, since AFAIK we do not touch the general IO code of QEMU). Could you also test whether the command line from Proxmox shows the same behaviour when started with upstream QEMU?
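For example (replace 102 with the ID of your VM):

  qm showcmd 102

For the libvirt side, the generated QEMU command line can usually be taken from /var/log/libvirt/qemu/<domain>.log, or from the process list while the VM is running.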
Also, please test with a proper benchmark tool (something like fio), because I suspect you are triggering some strange interaction between O_DIRECT, a huge block size and /dev/zero.
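Something along these lines would be a more realistic test (parameters are only a suggestion; adjust size and iodepth to your disk):

  fio --name=writetest --filename=/root/fio.test --rw=write --bs=1M --size=4G --ioengine=libaio --direct=1 --iodepth=16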
Regarding your O_DIRECT remark: this has nothing to do with it. The same situation occurs when reading from a file and writing to a file, e.g. if=/root/testfile of=/root/testfile2. /dev/zero also has nothing to do with it.

Concerning libvirt: it is just a "virsh start <ID>", but the VM was created like so:

virt-install -n 100 --hvm -r 8192 --cpu=host --vcpus 2 --cpuset=0-1 --os-type=linux --os-variant=generic --disk /dev/raid10f2/100,bus=virtio,sparse=false,cache=none,io=native -w bridge=vmbr0,model=virtio -l http://ftp.de.debian.org/debian/dists/jessie/main/installer-amd64/ --autostart --nographics --console pty,target_type=serial --extra-args 'console=ttyS0,115200n8 serial'

----

Concerning Proxmox: created using "VM Create"; here is the startup command line:

/usr/bin/kvm -id 102 -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=49ad3aa7-f988-4041-8ab3-3e19cef1f7fb' -name MONITORING-PRD-001 -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga cirrus -vnc unix:/var/run/qemu-server/102.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -object 'memory-backend-ram,id=ram-node0,size=8192M' -numa 'node,nodeid=0,cpus=0-3,memdev=ram-node0' -k de -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:fc4a93ed73c' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/vm/vm-102-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=96:C8:2B:17:F3:9E,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
Update: It seems that if the source file contains only zeros (it doesn't matter whether it is /dev/zero or a file filled with zeros), the host goes nuts. With fio the described behaviour does not occur.
(In reply to Florian Strankowski from comment #6)
> Update: It seems that if the source file contains only zeros (it doesn't
> matter whether it is /dev/zero or a file filled with zeros), the host goes
> nuts. With fio the described behaviour does not occur.

Yes, it seems the problem occurs when the source contains only zeros and the block size is huge. My findings so far (tested on LVM):

defaults:
  source    | bs | count | O_DIRECT    | behaviour
  urandom   | 1M | 1024  | O_DIRECT    | OK
  file      | 1M | 1024  | O_DIRECT    | OK
  /dev/zero | 1M | 1024  | O_DIRECT    | OK
  zero file | 1M | 1024  | O_DIRECT    | OK
  /dev/zero | 1G | 1     | O_DIRECT    | NOT OK
  zero file | 1G | 1     | O_DIRECT    | NOT OK
  zero file | 1G | 1     | no O_DIRECT | NOT OK
  rand file | 1G | 1     | O_DIRECT    | OK
  rand file | 1G | 1     | no O_DIRECT | OK

discard on:
  urandom   | 1M | 1024  | O_DIRECT    | OK
  rand file | 1M | 1024  | O_DIRECT    | OK
  /dev/zero | 1M | 1024  | O_DIRECT    | OK
  zero file | 1M | 1024  | O_DIRECT    | OK
  /dev/zero | 1G | 1     | O_DIRECT    | NOT OK
  zero file | 1G | 1     | O_DIRECT    | NOT OK
  zero file | 1G | 1     | no O_DIRECT | NOT OK
  rand file | 1G | 1     | O_DIRECT    | OK
  rand file | 1G | 1     | no O_DIRECT | OK

detect_zeroes off:
  urandom   | 1M | 1024  | O_DIRECT    | OK
  rand file | 1M | 1024  | O_DIRECT    | OK
  /dev/zero | 1M | 1024  | O_DIRECT    | OK
  zero file | 1M | 1024  | O_DIRECT    | OK
  /dev/zero | 1G | 1     | O_DIRECT    | OK
  zero file | 1G | 1     | O_DIRECT    | OK
  zero file | 1G | 1     | no O_DIRECT | OK
  rand file | 1G | 1     | O_DIRECT    | OK
  rand file | 1G | 1     | no O_DIRECT | OK

We will investigate further, but this behaviour should not occur under normal workloads. Or is there a program that writes massive amounts of zeros with a huge block size?
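For anyone who wants to repeat the matrix, the individual runs are plain dd invocations inside the guest, roughly like this (paths are examples; "rand file" and "zero file" are files pre-filled from /dev/urandom and /dev/zero):

  # prepare the source files once
  dd if=/dev/urandom of=/root/rand.img bs=1M count=1024
  dd if=/dev/zero    of=/root/zero.img bs=1M count=1024
  # a row that stays OK: zero source, 1M blocks, O_DIRECT
  dd if=/dev/zero of=/root/out bs=1M count=1024 oflag=direct
  # a row that goes NOT OK: zero source, one huge 1G block, O_DIRECT
  dd if=/root/zero.img of=/root/out bs=1G count=1 oflag=direct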
> (In reply to Florian Strankowski from comment #6)
> > Update: It seems that if the source file contains only zeros (it doesn't
> > matter whether it is /dev/zero or a file filled with zeros), the host goes
> > nuts. With fio the described behaviour does not occur.
>
> Yes, it seems the problem occurs when the source contains only zeros and
> the block size is huge. My findings so far (tested on LVM):
>
> [...]
>
> We will investigate further, but this behaviour should not occur under
> normal workloads. Or is there a program that writes massive amounts of
> zeros with a huge block size?

Actually, no scenario comes to my mind that would trigger this behaviour right off the bat. Since we are in the process of migrating from XEN to Proxmox, this came to my attention because I usually test hard drive speeds with the method shown above (dd if=/dev/zero ...). I expected this behaviour to be related to detect_zeroes but did not want to provide false information here, so thank you, Dominik, for providing this information :-)

Furthermore, this does not occur (again tested with libvirt+kvm) on stock Jessie (which provides QEMU emulator version 2.1.2). We definitely have to make sure that this only affects the handling of zero files/zero input; otherwise we cannot go into production with this cluster, because this bug might be a complete showstopper in any production environment (if an application writes a large amount of zeros for whatever reason we do not yet know).
> We definitely have to make sure that this only affects the handling of zero
> files/zero input; otherwise we cannot go into production with this cluster,
> because this bug might be a complete showstopper in any production
> environment (if an application writes a large amount of zeros for whatever
> reason we do not yet know).

Again, this only seems to happen when writing large amounts of zeros in HUGE blocks (i.e. using a very large buffer in memory); I do not think any sane application would do this. Using smaller blocks like 1M does not trigger it (for as yet unknown reasons).
Created attachment 222 [details] QEMU detect_zeroes bug
Thanks for making this clear. I have now bumped one of my Jessie hosts up to Stretch to keep digging deeper:

Jessie provides libvirt 1.2.6, which does not yet have detect_zeroes implemented, so I went to Stretch, which ships libvirt 3.0.0 and supports zero detection.

I have to offer a small apology here: upstream QEMU shows exactly the same bug. Do you guys want me to report it to them?

Attached: screenshot.
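A stripped-down upstream command line should be enough to reproduce it; something along these lines (disk path, memory and device options are placeholders, the relevant part is the drive options cache=none,aio=native,detect-zeroes=on):

  qemu-system-x86_64 -enable-kvm -m 4096 -smp 2 \
    -drive file=/dev/raid10f2/100,if=none,id=drive0,format=raw,cache=none,aio=native,detect-zeroes=on \
    -device virtio-blk-pci,drive=drive0 \
    -nographic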
One last thing I have tested (all with O_DIRECT, cache=none, io=native and detect_zeroes=on), host with 2 CPUs (VM pinned to 0-1):

  bs   | count | io-threads
  512K | 2048  | 2
  1M   | 1024  | 2
  2M   | 512   | 4
  4M   | 256   | 6
  8M   | 128   | 10
  16M  | 64    | 18
  32M  | 32    | uncountable
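The io-thread column is simply the peak number of additional threads of the VM process on the host while dd is running; one way to watch the total thread count grow during the test (assuming a single qemu-system-x86_64 process on the host):

  watch -n1 "ls /proc/$(pidof qemu-system-x86_64)/task | wc -l"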
(In reply to Florian Strankowski from comment #11)
> Thanks for making this clear. I have now bumped one of my Jessie hosts up to
> Stretch to keep digging deeper:
>
> Jessie provides libvirt 1.2.6, which does not yet have detect_zeroes
> implemented, so I went to Stretch, which ships libvirt 3.0.0 and supports
> zero detection.
>
> I have to offer a small apology here: upstream QEMU shows exactly the same
> bug. Do you guys want me to report it to them?
>
> Attached: screenshot.

Yes, please report the bug upstream.
Reference for further processing: https://bugs.launchpad.net/qemu/+bug/1687653
Since QEMU closed the bug as invalid, and since I can confirm (after further tests) that at most 64 threads are started (which only handle IO requests and wait for them to finish), I would like to close the bug here as well. Or do you have further comments?
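For completeness: the cap of 64 presumably corresponds to the worker limit of QEMU's internal thread pool. If you have a QEMU source checkout handy, it can be located with something like the following (the file path may differ between versions):

  grep -rn "max_threads" util/thread-pool.c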