Bug 4507 - Increase NOFILE soft limit of QEMU process
Summary: Increase NOFILE soft limit of QEMU process
Status: PATCH AVAILABLE
Alias: None
Product: pve
Classification: Unclassified
Component: Backend
Version: 7
Hardware: PC Linux
Importance: --- enhancement
Assignee: Bugs
URL:
Depends on:
Blocks:
 
Reported: 2023-01-30 12:58 CET by Mira Limbeck
Modified: 2024-01-24 18:26 CET
CC List: 5 users

See Also:


Attachments

Description Mira Limbeck 2023-01-30 12:58:15 CET
The combination of many Ceph OSDs (150) and multiple VM disks (6) can easily run into the default NOFILE soft limit (1024).

I've reproduced this with 115 OSDs and 24 VM disks. I/O is necessary for connections to the OSDs to be established.
The I/O was produced by the following command (disks /dev/sdb to /dev/sdy):
for dev in sd{b..y}; do dd if=/dev/urandom of=/dev/$dev bs=1 & done

KRBD is not affected since the connections are made on the host, not inside the QEMU process.
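
For reference, a minimal way to check how close a running VM's QEMU process is to its limit (illustrative only; VMID 100 and the usual PVE pidfile location are assumptions, not taken from this report):

pid=$(cat /var/run/qemu-server/100.pid)
# current soft/hard NOFILE limits of the QEMU process
grep 'Max open files' /proc/$pid/limits
# number of file descriptors currently open
ls /proc/$pid/fd | wc -l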
Comment 1 Mira Limbeck 2023-01-30 14:37:43 CET
The customer that ran into this issue has worked around it by increasing the `DefaultLimitNOFILE` in /etc/systemd/system.conf and by increasing the limit in /etc/security/limits.d/ for root.
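
As an illustration of that workaround (the actual values used by the customer are not stated in the report; 65536 and the file name under limits.d are placeholders):

# in /etc/systemd/system.conf, under the [Manager] section:
DefaultLimitNOFILE=65536

# in e.g. /etc/security/limits.d/nofile.conf:
root soft nofile 65536
root hard nofile 65536

The raised limits only apply to processes started after they are in place.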
Comment 2 Friedrich Weber 2023-04-25 12:13:10 CEST
Related Ceph issue: https://tracker.ceph.com/issues/17573

When the QEMU process hits the soft limit, the VM's syslog will start reporting hung tasks related to I/O, for example (reproduced by setting an artificially low limit using `prlimit --nofile`; a sketch of such a command follows the log excerpt):

> Apr 24 10:42:15 freeze-test kernel: INFO: task mkfs.ext4:5244 blocked for more than 120 seconds.
> Apr 24 10:42:15 freeze-test kernel:       Tainted: P           O      5.15.102-1-pve #1
> Apr 24 10:42:15 freeze-test kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Apr 24 10:42:15 freeze-test kernel: task:mkfs.ext4       state:D stack:    0 pid: 5244 ppid:  2021 flags:0x00004000
> Apr 24 10:42:15 freeze-test kernel: Call Trace:
> Apr 24 10:42:15 freeze-test kernel:  <TASK>
> Apr 24 10:42:15 freeze-test kernel:  __schedule+0x34e/0x1740
> Apr 24 10:42:15 freeze-test kernel:  ? blk_mq_sched_insert_requests+0x7a/0xf0
> ...
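
A minimal sketch of that kind of reproduction (the PID lookup, VMID 100 and the 64:1024 values are placeholders, not the exact command used above):

# artificially lower the NOFILE limits of a running QEMU process
prlimit --pid "$(cat /var/run/qemu-server/100.pid)" --nofile=64:1024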
Comment 3 Fiona Ebner 2023-08-07 11:25:24 CEST
Another instance, not related to Ceph, where having many vNICs with many queues hits the limit: https://forum.proxmox.com/threads/131601/post-578366
Comment 4 Alexandre Derumier 2023-12-10 15:59:22 CET
Hi,
I triggered this bug in production this week while migrating a big customer VM with 5-6 disks (librbd) to a new, bigger Ceph cluster (with 100 OSDs).

This has been super painful to debug, with random storage access timeouts once the limit was reached after some time.

I just sent a small patch to increase the value for the QEMU process at VM start:

https://lists.proxmox.com/pipermail/pve-devel/2023-December/060982.html
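
In shell terms, the idea is roughly the following (illustrative sketch only, not the actual patch; the kvm invocation is a placeholder):

# raise the soft NOFILE limit to the hard limit in the process that is
# about to exec the QEMU binary; the child inherits the raised limit
ulimit -Sn "$(ulimit -Hn)"
exec /usr/bin/kvm ...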
Comment 5 Fiona Ebner 2023-12-12 14:50:23 CET
Patch using a different approach, i.e. increasing the limit from QEMU itself: https://lists.proxmox.com/pipermail/pve-devel/2023-December/061043.html
Comment 6 Alexandre Derumier 2023-12-12 17:40:19 CET
nice !

I'll try to test it tomorrow.
Comment 7 Adam 2024-01-24 18:26:45 CET
Looks like we are hitting a similar issue. Hoping this patch gets merged sometime soon.

Jan 23 15:29:39 QEMU[284972]: kvm: virtio_bus_set_host_notifier: unable to init event notifier: Too many open files (-24)
Jan 23 15:29:39 QEMU[284972]: virtio-blk failed to set host notifier (-24)
Jan 23 15:29:39 QEMU[284972]: kvm: virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower).
Jan 23 15:31:34 pmxcfs[2985]: [dcdb] notice: data verification successful

The VM crashes hard when we hit the above.