Bug 3002 - Error when trying to "Move disk" to newly installed NVMe disk: storage migration failed
Summary: Error when trying to "Move disk" to newly installed NVMe disk: storage migration failed
Status: RESOLVED FIXED
Alias: None
Product: pve
Classification: Unclassified
Component: Storage
Version: 6
Hardware: PC Linux
Importance: --- bug
Assignee: Stefan Reiter
URL:
Depends on:
Blocks:
 
Reported: 2020-09-11 16:42 CEST by me
Modified: 2021-04-30 08:33 CEST
CC List: 4 users

See Also:


Description me 2020-09-11 16:42:00 CEST
When attempting to move a VM hard disk to a newly installed NVMe storage I get this error logged:

create full clone of drive scsi0 (local-lvm:vm-101-disk-0)
  Logical volume "vm-101-disk-0" created.
transferred: 0 bytes remaining: 42949672960 bytes total: 42949672960 bytes progression: 0.00 %
transferred: 429496729 bytes remaining: 42520176231 bytes total: 42949672960 bytes progression: 1.00 %
transferred: 858993459 bytes remaining: 42090679501 bytes total: 42949672960 bytes progression: 2.00 %
transferred: 1288490188 bytes remaining: 41661182772 bytes total: 42949672960 bytes progression: 3.00 %
transferred: 1717986918 bytes remaining: 41231686042 bytes total: 42949672960 bytes progression: 4.00 %
transferred: 2147483648 bytes remaining: 40802189312 bytes total: 42949672960 bytes progression: 5.00 %
qemu-img: error while writing at byte 2187329024: Device or resource busy
  Logical volume "vm-101-disk-0" successfully removed
TASK ERROR: storage migration failed: copy failed: command '/usr/bin/qemu-img convert -p -n -f raw -O raw /dev/pve/vm-101-disk-0 /dev/new250nvme/vm-101-disk-0' failed: exit code 1
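
If it helps with debugging: the failing copy can presumably be reproduced outside the migration task by re-running the same qemu-img command the task used. The paths and the 40G size below are taken from the log above; the target LV has to be recreated first, since the task removed it again after the failure.

> lvcreate -n vm-101-disk-0 -L 40G new250nvme  # recreate the 40 GiB target LV the failed task removed
> qemu-img convert -p -n -f raw -O raw /dev/pve/vm-101-disk-0 /dev/new250nvme/vm-101-disk-0  # same convert call as in the task log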
Comment 1 Dominic Jäger 2020-09-14 08:57:44 CEST
Could you please post the following?
> cat /etc/pve/storage.cfg
> lsblk
Comment 2 me 2020-09-14 10:52:49 CEST
Thanks for your response. Here's the info you requested:

root@h110m:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,iso,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

nfs: remote
        export /mnt/mypool/h110m
        path /mnt/pve/remote
        server 192.168.1.5
        content vztmpl,iso,backup
        maxfiles 4

lvm: local-480-ssd
        vgname new480ssd
        content images,rootdir
        shared 0

lvm: local-250-nvme
        vgname new250nvme
        content rootdir,images
        shared 0

root@h110m:~# lsblk
NAME                           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                              8:0    0 111.8G  0 disk
├─sda1                           8:1    0  1007K  0 part
├─sda2                           8:2    0   512M  0 part /boot/efi
└─sda3                           8:3    0 111.3G  0 part
  ├─pve-swap                   253:1    0     8G  0 lvm  [SWAP]
  ├─pve-root                   253:2    0  27.8G  0 lvm  /
  ├─pve-data_tmeta             253:3    0     1G  0 lvm
  │ └─pve-data-tpool           253:5    0  59.7G  0 lvm
  │   ├─pve-data               253:6    0  59.7G  0 lvm
  │   ├─pve-vm--100--disk--0   253:7    0     8G  0 lvm
  │   ├─pve-vm--101--disk--0   253:8    0    40G  0 lvm
  │   └─pve-vm--102--disk--0   253:9    0     8G  0 lvm
  └─pve-data_tdata             253:4    0  59.7G  0 lvm
    └─pve-data-tpool           253:5    0  59.7G  0 lvm
      ├─pve-data               253:6    0  59.7G  0 lvm
      ├─pve-vm--100--disk--0   253:7    0     8G  0 lvm
      ├─pve-vm--101--disk--0   253:8    0    40G  0 lvm
      └─pve-vm--102--disk--0   253:9    0     8G  0 lvm
sdb                              8:16   0 447.1G  0 disk
└─sdb1                           8:17   0 447.1G  0 part
  └─new480ssd-vm--101--disk--0 253:0    0   444G  0 lvm
nvme0n1                        259:0    0 232.9G  0 disk
└─nvme0n1p1                    259:1    0 232.9G  0 part
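
Not asked for, but in case it is useful: the mapping of the new VG onto the NVMe partition can be double-checked with the usual LVM tools, e.g.

> pvs -o pv_name,vg_name /dev/nvme0n1p1  # should list the PV as belonging to new250nvme
> vgs new250nvme  # VG size and free space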
Comment 3 Richard Schütz 2020-09-14 11:25:02 CEST
We observe the same kind of error after upgrading from pve-qemu-kvm 5.0.0-13 to 5.1.0-1 in a test environment. Stracing qemu-img shows the following call as the culprit (fd 10 is the target LV); earlier calls to fallocate() succeed:

fallocate(10, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2147483136, 1049088) = -1 EBUSY (Device or resource busy)
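
For anyone who wants to reproduce the trace, something along these lines should capture it (the qemu-img arguments are the ones from the task log above; adjust the paths to your setup):

> strace -f -e trace=fallocate qemu-img convert -p -n -f raw -O raw /dev/pve/vm-101-disk-0 /dev/new250nvme/vm-101-disk-0  # -f follows the worker threads, only fallocate() calls are shown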
Comment 4 Stefan Reiter 2020-09-14 14:30:01 CEST
Thanks for the detailed reports, I can reproduce the issue locally. This appears to be a regression in QEMU. A bit baffling, but I'm working on triaging the exact issue and finding a fix.
Comment 5 Richard Schütz 2020-09-15 08:45:51 CEST
I can confirm that the workaround from pve-qemu-kvm 5.1.0-2, which reverts to pre-zeroing, allows moving and cloning of disk images again.
Comment 6 Thomas Lamprecht 2020-09-15 16:14:40 CEST
Thanks for your feedback!

The current work-around is not 100% ideal, so I'll keep this report open until we find the real culprit.

Related active thread on the qemu-devel mailing list:
https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg04854.html
Comment 7 Thomas Lamprecht 2021-01-19 11:31:03 CET
A potential "real" fix for this[1] was pointed out to us[0]; it would be worth checking out with the next build (just noting it here to have it on record).

[0]: https://lists.nongnu.org/archive/html/qemu-block/2021-01/msg00112.html
[1]: https://lists.nongnu.org/archive/html/qemu-block/2020-11/msg00358.html
Comment 8 Thomas Lamprecht 2021-04-30 08:33:17 CEST
Taking the lack of further feedback as confirmation that this is resolved with current QEMU (>= 5.2-5).
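
For anyone verifying this on their own node before re-testing a disk move, the installed build can be checked with e.g.

> pveversion -v | grep pve-qemu-kvm  # should show pve-qemu-kvm >= 5.2-5
> dpkg-query -W -f='${Version}\n' pve-qemu-kvm  # same information straight from dpkg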