There is a bug with Chelsio NICs that causes the following error:

kvm: -device vfio-pci,host=0000:83:00.7,id=hostpci1.7,bus=pci.0,addr=0x11.7: vfio 0000:83:00.7: hardware reports invalid configuration, MSIX PBA outside of specified BAR

The bug is caused by vendors misconfiguring the MSI-X PBA (Pending Bit Array) of their devices, and was fixed in later versions of QEMU. I know a catch-all fix was implemented in recent versions of QEMU, and that patches were applied to hotfix it in earlier versions. I encountered this bug with a Chelsio T4 device, and I believe those patches only cover T5 and newer.

Here is an email thread with a patch for this situation:
https://patchwork.ozlabs.org/project/qemu-devel/patch/1435777545-32152-1-git-send-email-glaupre@chelsio.com/

I'd appreciate it if anyone could tell me the best way to fix this on my system. I assume the solution is either to build QEMU with this patch applied or to update the version of QEMU in my Proxmox installation, but I don't know which route is better.
The patch you mention is already included in our QEMU builds, but as you correctly said, it's only implemented for T5 devices. You'd have to patch QEMU yourself if you want this to work, or ask the upstream QEMU maintainers to include a fix (or even better: provide them with the fix :) ). In any case, a full 'lspci -nnkvv' output for your device (and any virtual functions thereof) would help.

I've attached a QEMU patch for you to try. It has "0xNNNN" instead of the actual device ID of your T4, so change that before applying the patch. No guarantee this works at all, here be dragons, and if it breaks everything you're on your own, but I believe it's simple enough to work, provided the hardware quirk is the same on T4 as on T5.

You can find our QEMU downstream at https://git.proxmox.com/?p=pve-qemu.git;a=summary. If you put the patch in debian/patches/pve and add the file name to debian/patches/series, you should be able to build pve-qemu with it. Check out our developer documentation (https://pve.proxmox.com/wiki/Developer_Documentation) as well.
Created attachment 614: experimental T4 patch, change 0xNNNN to device ID
Created attachment 615: Full output of lspci -nnkvv
Created attachment 616: Output of lspci -nnkvv with Chelsio devices only
Thank you so much for your reply! I have attached the lspci output you requested. I think the most recent version of QEMU actually has a fix for all devices that give this error, as there were reports of some HBA cards also causing it.

I would like to try applying your patch, but for several days now my builds of pve-qemu have been failing on a missing dependency called libproxmox-backup-qemu0-dev. I have seen other people on the forums mention that it exists in the repository, but every time I git clone pve-qemu.git and attempt to build, I get the same error. I thought mk-build-deps would take care of it, but even that stops on the same missing dependency, and apt install can't find it either. Could you tell me where I can find this dependency?
You need to configure our PBS repository to get the library:

# echo "deb http://download.proxmox.com/debian/pbs buster pbstest" >> /etc/apt/sources.list.d/pbs.list
# apt update
# apt install libproxmox-backup-qemu0-dev

> I think the most recent version of qemu actually has a fix for all devices that give this error, as there were reports of some HBA cards also causing it.

Hm, not sure about that; the patch I added is against our 5.1 build from the repo. That said, 5.1 is newer than what's currently rolled out, so you could also try building the repo version without any patches and see if that fixes it. That would be nice, since 5.1 will be rolled out soon-ish anyway :)
I managed to get the package installed. Apparently my sources.list was set to jessie instead of buster. Fixing this let me download the package, but make still fails, now with different errors. Progress! I'll attach the errors, but I understand if helping me fix this is outside of what you're willing to help with. As a side note, the machine I'm configuring is not deployed, has no deployment deadline, and has no data stored on it at all, so I'm willing to make just about any change to it that you think might help or that you'd like to test.
Created attachment 618: New errors
Hm, it appears your linker isn't finding the library. Try installing the 'libproxmox-backup-qemu0' package as well, although that should already have been pulled in as a dependency of the -dev package. Make sure /usr/lib/libproxmox_backup_qemu.so.0 exists. If you use "make deb", it might also be necessary to run the build as root.
I ran into problems building it with the patch applied. I know how to correct those errors, but I decided to check whether I could build without the patches, and found that the build fails for other reasons too. I have attached the new output.

Just so that I understand it correctly: does the value stored in PCI_VENDOR_ID_CHELSIO equal 1425? Since I have two of the same Chelsio NIC installed, does that mean I have to insert both 8100 and 8300 as device IDs for my two cards in the patch, and have the if statement compare each of them against vdev->device_id the same way you did? Or should I be able to do it with a single device ID?
Created attachment 620: New errors given by make after installing libproxmox-backup-qemu0
There's no relevant error in the output you posted? You should now have two files, 'pve-qemu-kvm_5.1.0-1_amd64.deb' and 'pve-qemu-kvm-dbg_5.1.0-1_amd64.deb', in the repository root, which you can install with 'apt install ./*.deb' or similar. If not, you might need a 'make clean' before the 'make deb'.

> Just so that I understand it correctly, does the value that
> PCI_VENDOR_ID_CHELSIO stores equal 1425? Since I have two of the same
> Chelsio NIC installed, would that mean that I have to insert both 8100 and
> 8300 as my device IDs for my two cards in the patch, and have it evaluate
> whether they are equal to the value at vdev->device_id for the if statement
> the same way you did? Or should I just be able to do it with a single device
> ID?

# rg "PCI_VENDOR_ID_CHELSIO" include/hw/pci/pci_ids.h
219:#define PCI_VENDOR_ID_CHELSIO 0x1425

Yes (lspci prints the IDs in hex, so its 1425 is the 0x1425 above). And also yes: if you need two different device IDs, you need to add more clauses to the 'or', e.g.:

((vdev->device_id & 0xff00) == 0x5800 ||
 (vdev->device_id & 0xff00) == 0x8100 ||
 (vdev->device_id & 0xff00) == 0x8300)) {
You were right. I thought that warnings being treated as errors would stop the build, and I completely missed where it said it had built the .deb packages. I got it built and installed this time, but I still get the same error when I try to boot a VM with the Chelsio cards. I have opened a bug report with the upstream QEMU devs.
https://bugs.launchpad.net/qemu/+bug/1894869

Here's the discussion with the upstream devs. The problem turned out to be on Chelsio's side: either the .7 function of these cards should never have been exposed to the OS in the first place, or SR-IOV is needed to actually correct the parameters of this function. Unfortunately, it looks like SR-IOV can no longer be enabled on these cards. Thank you for your help.