The combination of many Ceph OSDs (150) and multiple VM disks (6) can easily run into the default soft limit of 1024 open files. I've reproduced this with 115 OSDs and 24 VM disks. I/O is necessary for the connections to the OSDs to be established; it was produced with the following command (disks /dev/sdb to /dev/sdy):

`for dev in sd{b..y}; do dd if=/dev/urandom of=/dev/$dev bs=1 & done`

KRBD is not affected, since the connections are made on the host, not inside the QEMU process.
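To see how close a running VM's QEMU process is to the limit, counting the entries in /proc/<pid>/fd and comparing against `prlimit` works; a minimal sketch (VMID 100 and the PID file path are example assumptions):

```
# Sketch: compare the open file descriptors of VM 100's QEMU process with its
# nofile limits (VMID and PID file path are example assumptions).
pid=$(cat /var/run/qemu-server/100.pid)
echo "open fds: $(ls /proc/$pid/fd | wc -l)"
prlimit --pid "$pid" --nofile
```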
The customer that ran into this issue has worked around it by increasing the `DefaultLimitNOFILE` in /etc/systemd/system.conf and by increasing the limit in /etc/security/limits.d/ for root.
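For reference, a rough sketch of that workaround; the value 65536 is only an example, not the customer's actual setting:

```
# /etc/systemd/system.conf -- default for all systemd units; needs
# "systemctl daemon-reexec" (or a reboot) plus a service restart to apply.
[Manager]
DefaultLimitNOFILE=65536

# /etc/security/limits.d/99-nofile.conf -- PAM sessions for root; the file
# name and the value are examples.
root  soft  nofile  65536
root  hard  nofile  65536
```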
Related Ceph issue: https://tracker.ceph.com/issues/17573

When the QEMU process hits the soft limit, the VM syslog will start reporting hung tasks related to I/O, for example (reproduced by setting an artificially low limit using `prlimit --nofile`):
> Apr 24 10:42:15 freeze-test kernel: INFO: task mkfs.ext4:5244 blocked for more than 120 seconds.
> Apr 24 10:42:15 freeze-test kernel: Tainted: P O 5.15.102-1-pve #1
> Apr 24 10:42:15 freeze-test kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 24 10:42:15 freeze-test kernel: task:mkfs.ext4 state:D stack: 0 pid: 5244 ppid: 2021 flags:0x00004000
> Apr 24 10:42:15 freeze-test kernel: Call Trace:
> Apr 24 10:42:15 freeze-test kernel: <TASK>
> Apr 24 10:42:15 freeze-test kernel: __schedule+0x34e/0x1740
> Apr 24 10:42:15 freeze-test kernel: ? blk_mq_sched_insert_requests+0x7a/0xf0
> ...
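A sketch of that reproduction on an already-running VM (the value 256 and the PID file path are assumptions):

```
# Sketch: impose an artificially low nofile limit on a running QEMU process,
# then generate guest I/O so that new OSD connections are needed.
pid=$(cat /var/run/qemu-server/100.pid)
prlimit --pid "$pid" --nofile=256:256
```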
Another instance, not related to Ceph, where many vNICs with many queues hit the limit: https://forum.proxmox.com/threads/131601/post-578366
Hi, I have triggered this bug in production this week, migrating a big customer VM with 5-6 disks (librbd) to a new, bigger Ceph cluster (with 100 OSDs). This has been super painful to debug, with random storage access timeouts once the limit was reached after some time.

I just sent a small patch to increase the value for the QEMU process at VM start: https://lists.proxmox.com/pipermail/pve-devel/2023-December/060982.html
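The actual patch is against qemu-server, but the general idea can be sketched in shell: raise the soft nofile limit to the hard limit in the process that is about to exec QEMU (this is only an illustration, not the patch itself):

```
# Illustration only: raise the soft nofile limit of the current shell (and any
# children it execs, e.g. the KVM command line) up to the hard limit.
ulimit -n "$(ulimit -Hn)"
ulimit -Sn   # print the new soft limit
```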
Patch using a different approach, i.e. increasing the limit from QEMU itself: https://lists.proxmox.com/pipermail/pve-devel/2023-December/061043.html
Nice! I'll try to test it tomorrow.
Looks like we are hitting a similar issue. Hoping this patch gets merged in sometime soon.
> Jan 23 15:29:39 QEMU[284972]: kvm: virtio_bus_set_host_notifier: unable to init event notifier: Too many open files (-24)
> Jan 23 15:29:39 QEMU[284972]: virtio-blk failed to set host notifier (-24)
> Jan 23 15:29:39 QEMU[284972]: kvm: virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower).
> Jan 23 15:31:34 pmxcfs[2985]: [dcdb] notice: data verification successful

The VM crashes hard when we hit the above.