Bug 1368 - KVM starts an unlimited number of threads on I/O load (VM-side)
Summary: KVM starts an unlimited number of threads on I/O load (VM-side)
Status: RESOLVED INVALID
Alias: None
Product: pve
Classification: Unclassified
Component: Qemu
Version: 4
Hardware: PC Linux
Importance: Normal normal
Assignee: Bugs
URL: https://bugs.launchpad.net/qemu/+bug/...
Depends on:
Blocks:
 
Reported: 2017-04-30 10:54 CEST by Florian Strankowski
Modified: 2017-05-30 11:23 CEST
2 users

See Also:


Attachments
Example Video (85 bytes, text/plain)
2017-04-30 10:55 CEST, Florian Strankowski
QEMU detect_zeroes bug (207.00 KB, image/png)
2017-05-02 15:33 CEST, Florian Strankowski

Description Florian Strankowski 2017-04-30 10:54:22 CEST
Once a VM starts to generate I/O load, the host spawns a seemingly unlimited number of KVM threads, compared to a usual libvirt+kvm setup.

https://forum.proxmox.com/threads/test-luks-softraid-10-f2-slow-vm-performance.34395/

I was able to reproduce this on our live production cluster (14 boxes) without LUKS and with hardware RAID instead, so this bug has nothing to do with software RAID at all.
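A minimal reproduction sketch along the lines of the forum thread (paths and sizes are illustrative; <ID> is the VM's Proxmox ID):

# inside the VM: write a large run of zeros in one go
dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=direct
# on the host: count the threads of the VM's kvm process while dd runs
ls /proc/$(cat /var/run/qemu-server/<ID>.pid)/task | wc -l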
Comment 1 Florian Strankowski 2017-04-30 10:55:07 CEST
Created attachment 221 [details]
Example Video
Comment 2 Florian Strankowski 2017-04-30 10:56:46 CEST
This bug also affects Proxmox 5.0 beta and 4.4 enterprise (since we are an enterprise customer).
Comment 3 Dominik Csapak 2017-05-02 11:35:50 CEST
could you please post the complete command lines for the vm under proxmox
(you can get this with 'qm showcmd ID') and for the vm started with libvirt?

without comparing these, we cannot know if the behaviour comes from upstream qemu, or from one of our patches (unlikely, since AFAIK we do not touch the general io code of qemu)

also, could you test whether the command line from proxmox shows the same behaviour when started with upstream qemu?
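for reference, a sketch of how both command lines could be captured (using the IDs that appear later in this report):

# proxmox side: print the full kvm command line for VM 102
qm showcmd 102
# libvirt side: dump the domain definition; the qemu command line actually
# used at startup is also logged in /var/log/libvirt/qemu/<name>.log
virsh dumpxml 100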
Comment 4 Fabian Grünbichler 2017-05-02 12:43:17 CEST
also please test using a sane benchmark tool (something like fio), because I suspect that you are triggering some strange interaction with the combination of O_DIRECT, huge block size and /dev/zero
Comment 5 Florian Strankowski 2017-05-02 13:26:19 CEST
Regarding the O_DIRECT flag: this has nothing to do with it. The same situation occurs when reading from one file and writing to another, like if=/root/testfile of=/root/testfile2. /dev/zero also has nothing to do with it.
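Spelled out, that file-to-file test is roughly (a sketch; bs/count are illustrative, and the huge block size is what turns out to matter below):

dd if=/root/testfile of=/root/testfile2 bs=1G count=1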


Concerning libvirt:

It's just a "virsh start <ID>", but the VM was created like so:

virt-install -n 100 --hvm -r 8192 --cpu=host --vcpus 2 --cpuset=0-1 --os-type=linux --os-variant=generic --disk /dev/raid10f2/100,bus=virtio,sparse=false,cache=none,io=native -w bridge=vmbr0,model=virtio -l http://ftp.de.debian.org/debian/dists/jessie/main/installer-amd64/ --autostart --nographics --console pty,target_type=serial --extra-args 'console=ttyS0,115200n8 serial'


----

Concerning Proxmox:

Created using "VM Create"; here is the startup command line:

/usr/bin/kvm -id 102 -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=49ad3aa7-f988-4041-8ab3-3e19cef1f7fb' -name MONITORING-PRD-001 -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga cirrus -vnc unix:/var/run/qemu-server/102.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -object 'memory-backend-ram,id=ram-node0,size=8192M' -numa 'node,nodeid=0,cpus=0-3,memdev=ram-node0' -k de -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:fc4a93ed73c' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/vm/vm-102-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=96:C8:2B:17:F3:9E,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'
Comment 6 Florian Strankowski 2017-05-02 14:13:14 CEST
Update: it seems that if the source contains only zeros (doesn't matter whether it's /dev/zero or a file containing zeros), the host goes nuts. With fio the described behavior does not occur.
Comment 7 Dominik Csapak 2017-05-02 14:22:37 CEST
(In reply to Florian Strankowski from comment #6)
> Update: it seems that if the source contains only zeros (doesn't matter
> whether it's /dev/zero or a file containing zeros), the host goes nuts.
> With fio the described behavior does not occur.

yes, it seems the problem occurs if the source contains only zeros and the block size is huge. my findings so far (tested on LVM; detect_zeroes on unless noted otherwise):

source    , bs   , count     ,    O_DIRECT : behaviour

urandom   , bs 1M, count 1024,    O_DIRECT: OK
file      , bs 1M, count 1024,    O_DIRECT: OK
/dev/zero , bs 1M, count 1024,    O_DIRECT: OK
zero file , bs 1M, count 1024,    O_DIRECT: OK
/dev/zero , bs 1G, count    1,    O_DIRECT: NOT OK
zero file , bs 1G, count    1,    O_DIRECT: NOT OK
zero file , bs 1G, count    1, no O_DIRECT: NOT OK
rand file , bs 1G, count    1,    O_DIRECT: OK
rand file , bs 1G, count    1, no O_DIRECT: OK

discard on:

urandom   , bs 1M, count 1024,    O_DIRECT: OK
rand file , bs 1M, count 1024,    O_DIRECT: OK
/dev/zero , bs 1M, count 1024,    O_DIRECT: OK
zero file , bs 1M, count 1024,    O_DIRECT: OK
/dev/zero , bs 1G, count    1,    O_DIRECT: NOT OK
zero file , bs 1G, count    1,    O_DIRECT: NOT OK
zero file , bs 1G, count    1, no O_DIRECT: NOT OK
rand file , bs 1G, count    1,    O_DIRECT: OK
rand file , bs 1G, count    1, no O_DIRECT: OK

detect_zeroes off:

urandom   , bs 1M, count 1024,    O_DIRECT: OK
rand file , bs 1M, count 1024,    O_DIRECT: OK
/dev/zero , bs 1M, count 1024,    O_DIRECT: OK
zero file , bs 1M, count 1024,    O_DIRECT: OK
/dev/zero , bs 1G, count    1,    O_DIRECT: OK
zero file , bs 1G, count    1,    O_DIRECT: OK
zero file , bs 1G, count    1, no O_DIRECT: OK
rand file , bs 1G, count    1,    O_DIRECT: OK
rand file , bs 1G, count    1, no O_DIRECT: OK

we will investigate further, but this behavior should not occur under normal workloads. or is there a program which writes massive amounts of zeros with a huge block size?
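for reference, the rows above presumably correspond to dd invocations of this shape (a sketch; the target path is a placeholder for the tested LV):

# e.g. "zero file, bs 1G, count 1, O_DIRECT":
dd if=/root/zerofile of=/dev/<vg>/<lv> bs=1G count=1 oflag=direct
# e.g. "/dev/zero, bs 1M, count 1024, no O_DIRECT":
dd if=/dev/zero of=/dev/<vg>/<lv> bs=1M count=1024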
Comment 8 Florian Strankowski 2017-05-02 14:39:00 CEST
> (In reply to Florian Strankowski from comment #6)
> > Update: it seems that if the source contains only zeros (doesn't matter
> > whether it's /dev/zero or a file containing zeros), the host goes nuts.
> > With fio the described behavior does not occur.
> 
> yes, it seems the problem occurs if the source contains only zeros and the
> block size is huge. my findings so far (tested on LVM)
>
> [...]
> 
> we will investigate further, but this behavior should not occur under normal
> workloads. or is there a program which writes massive amounts of zeros with
> a huge block size?

Actually, no scenario comes to mind that would trigger this behavior right off the bat. Since we're in the process of migrating from Xen to Proxmox, this came to my attention because I usually test hard drive speeds with the method shown above (dd if=/dev/zero ..).

I expected this behavior to be related to detect_zeroes but did not want to provide false information here, so thank you, Dominik, for providing this information :-) Furthermore, this does not occur (again tested with libvirt+kvm) on stock Jessie (which provides QEMU emulator version 2.1.2).

We definitely have to make sure that this only affects the handling of zero files/zero input; otherwise we can't go into production with this cluster, since this bug could be a complete showstopper in any production environment (if an application writes a large amount of zeros for whatever reason).
Comment 9 Dominik Csapak 2017-05-02 14:53:56 CEST
> We definitely have to make sure that this only affects the handling of zero
> files/zero input; otherwise we can't go into production with this cluster,
> since this bug could be a complete showstopper in any production
> environment (if an application writes a large amount of zeros for whatever
> reason).

again, this only seems to happen when writing large amounts of zeros in HUGE blocks (using a large buffer in memory); i do not think any sane application would do this

using smaller blocks like 1M does not trigger it (for as yet unknown reasons)
Comment 10 Florian Strankowski 2017-05-02 15:33:54 CEST
Created attachment 222 [details]
QEMU detect_zeroes bug
Comment 11 Florian Strankowski 2017-05-02 15:34:06 CEST
Thanks for making this clear. I've now bumped one of my Jessie hosts up to Stretch to keep digging deeper:

Jessie provides libvirt 1.2.6, which does not yet implement detect_zeroes, so I went to Stretch, which comes with libvirt 3.0.0 and supports zero detection.

I owe you a 'little' apology here: upstream QEMU shows exactly the same bug. Do you guys want me to report it to them?

Attached: Screenshot.
Comment 12 Florian Strankowski 2017-05-02 15:45:26 CEST
One last thing I've tested (all with O_DIRECT, cache=none, io=native and detect_zeroes=on):

Host: 2 CPUs (pinned to 0-1)

bs   -    count   -    io-threads

512K -    2048    -    2
1M   -    1024    -    2
2M   -     512    -    4
4M   -     256    -    6
8M   -     128    -    10
16M  -      64    -    18
32M  -      32    -    uncountable
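For reference, each row presumably corresponds to a run of this shape (a sketch; the target path is illustrative):

# inside the VM, e.g. the 32M row:
dd if=/dev/zero of=/root/testfile bs=32M count=32 oflag=direct
# on the host: watch the per-thread view of the VM's kvm process
top -H -p $(cat /var/run/qemu-server/<ID>.pid)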
Comment 13 Dominik Csapak 2017-05-02 16:38:10 CEST
(In reply to Florian Strankowski from comment #11)
> Thanks for making this clear. I've now bumped one of my Jessie hosts up to
> Stretch to keep digging deeper:
> 
> Jessie provides libvirt 1.2.6, which does not yet implement detect_zeroes,
> so I went to Stretch, which comes with libvirt 3.0.0 and supports zero
> detection.
> 
> I owe you a 'little' apology here: upstream QEMU shows exactly the same
> bug. Do you guys want me to report it to them?
> 
> Attached: Screenshot.

yes, please report the bug upstream
Comment 14 Florian Strankowski 2017-05-02 17:53:32 CEST
Reference for further processing:

https://bugs.launchpad.net/qemu/+bug/1687653
Comment 15 Dominik Csapak 2017-05-05 13:38:44 CEST
since qemu closed the bug as invalid, and i can (after further tests) confirm that at most 64 threads are started (which merely wait for io requests to finish), i would like to close the bug here as well. or do you have further comments?
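for context: this matches qemu's worker thread pool for aio, which by default caps at 64 worker threads and lets idle workers exit after a timeout (see util/thread-pool.c upstream). the cap can be observed on the host, e.g. (a sketch, using the pidfile from the command line above):

watch -n1 'ls /proc/$(cat /var/run/qemu-server/102.pid)/task | wc -l'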