iSCSI hard disk errors
Error description
With kernel versions v3.19 and above, multipath connection losses and SCSI disk errors can occur with some iSCSI storage devices and are reported in the log files. Affected virtual machines can become inaccessible, and virtual disks stored on iSCSI devices can become corrupted.
As a result, LVM devices can no longer be read properly, and communication with the libvirt-bin process (e.g. from the command line or from the Virtual Machine Manager) times out. In the end, the affected KVM host has to be rebooted, losing all running virtual machines.
Failure cause
Most iSCSI devices report the maximum transfer size they can handle (the 'Maximum transfer length' field, given in blocks) in their Block Limits VPD page, which can be displayed with sg_vpd.
Hint: On Ubuntu, the command sg_vpd is part of the package 'sg3-utils'.
root@kvm55:~# sg_vpd -p bl /dev/sdd
Block limits VPD page (SBC):
  Write same no zero (WSNZ): 1
  Maximum compare and write length: 1 blocks
  Optimal transfer length granularity: 1 blocks
  Maximum transfer length: 4294967295 blocks
  Optimal transfer length: 4294967295 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 8388607
  Maximum unmap block descriptor count: 1
  Optimal unmap granularity: 16383
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
  Maximum write same length: 0xffffffff blocks
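The relevant field can also be extracted in a script. The following is a minimal sketch that assumes the sg_vpd output format shown above. The function names max_transfer_length and reports_transfer_limit are hypothetical. According to the SBC standard, a value of 0 means that the device reports no limit. This sketch also treats 4294967295 (0xFFFFFFFF) as "no usable limit", which is an assumption.

```shell
# Print the "Maximum transfer length" value (in blocks) from
# 'sg_vpd -p bl' output read on stdin; prints nothing if the field is absent.
max_transfer_length() {
    sed -n 's/.*Maximum transfer length: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Exit status 0 if the value looks like a real device limit.
# 0 means "no limit reported" (SBC); treating 4294967295 (0xFFFFFFFF)
# the same way is an assumption of this sketch.
reports_transfer_limit() {
    case "$1" in
        "" | 0 | 4294967295) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   value=$(sg_vpd -p bl /dev/sdd | max_transfer_length)
#   reports_transfer_limit "$value" || echo "/dev/sdd: no usable maximum transfer length"
```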
Some iSCSI devices report no maximum transfer length. Known examples include the Dell MD3200i (own experience), the Synology DS-1813+ and the QNAP TS-669. Further details on Synology devices can also be found here.
In such a case, kernels before v3.19 limit a single transfer to 512 KiB, while kernels v3.19 and above use a default of 32767 KiB. The limit in effect is shown per block device in /sys/block/<device>/queue/max_sectors_kb. Some iSCSI devices cannot handle such large transfers and generate the observed multipath and disk errors.
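To see which block devices currently allow transfers above the pre-v3.19 value, you can scan the per-device limit in /sys/block/<device>/queue/max_sectors_kb. The following is a minimal sketch. The function name list_large_transfer_devices is hypothetical, and so is its optional sysfs-root argument, which exists only so the function can be tried against a fake directory tree.

```shell
# Print "<device> <max_sectors_kb>" for every block device whose current
# maximum transfer size (KiB) exceeds a threshold.
#   $1: sysfs mount point (default /sys; hypothetical override for testing)
#   $2: threshold in KiB (default 512, the pre-v3.19 value)
list_large_transfer_devices() {
    local sysfs="${1:-/sys}" limit="${2:-512}" f dev kb
    for f in "$sysfs"/block/*/queue/max_sectors_kb; do
        [ -r "$f" ] || continue          # glob did not match, or unreadable
        kb=$(cat "$f")
        if [ "$kb" -gt "$limit" ]; then
            dev=${f#"$sysfs"/block/}     # "<device>/queue/max_sectors_kb"
            dev=${dev%%/*}               # "<device>"
            printf '%s %s\n' "$dev" "$kb"
        fi
    done
}

# Usage: list_large_transfer_devices
```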
Solution
The old limit has to be set for all affected devices. This can be done as follows:
echo 512 > /sys/block/sdx/queue/max_sectors_kb
In the example above, 'sdx' is the name of an existing iSCSI disk device. Any other existing block device name (e.g. 'dm-4') can be used instead.
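When several devices are affected, you can wrap the command in a small helper that also reads the value back. This sketch assumes the limit lives in the writable sysfs attribute queue/max_sectors_kb (queue/hw_sector_size is read-only). The function name set_max_sectors_kb and the SYSFS_ROOT override are hypothetical. The kernel rejects values larger than the device's queue/max_hw_sectors_kb, and the helper reports such a rejection as a failure.

```shell
# Write a max_sectors_kb value (KiB) to each named block device and read it
# back. Returns 1 if any write fails or the value does not stick.
# SYSFS_ROOT (default /sys) is a hypothetical override for testing.
set_max_sectors_kb() {
    local value="$1" sysfs="${SYSFS_ROOT:-/sys}" dev f rc=0
    shift
    for dev in "$@"; do
        f="$sysfs/block/$dev/queue/max_sectors_kb"
        # If the write fails, the shell prints its own error message.
        if ! printf '%s\n' "$value" > "$f"; then
            rc=1
            continue
        fi
        if [ "$(cat "$f")" != "$value" ]; then
            echo "$dev: max_sectors_kb is $(cat "$f"), expected $value" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (as root): set_max_sectors_kb 512 sdd dm-4
```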
Please note:
- According to tests in production environments, the old value has to be set for every affected block device in the storage stack, including the device-mapper (DM) devices on top of the iSCSI disks:
  - iSCSI disk devices (found with 'lsscsi')
  - LVM physical volumes on iSCSI devices (found with 'pvs' and 'dmsetup ls')
  - LVM logical volumes on iSCSI devices (found with 'lvs' and 'dmsetup ls')
- The value has to be set before the affected device is used.
- Affected block devices can also be created while a KVM host is running. In that case, the value has to be set after the device has been created and before it is used.
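Because the value must be in place on every device of the stack before use, you can run a check after creating a device. The sketch below is one possible approach, not the procedure used in the tests mentioned above. Starting from an iSCSI disk, it follows the holders directories in sysfs: /sys/block/<device>/holders lists the devices stacked directly on top, such as multipath maps and LVM logical volumes. It then reports every device whose max_sectors_kb is above a limit. The function names stacked_devices and check_stack and the SYSFS_ROOT override are hypothetical. Only whole-disk devices are followed, not partitions.

```shell
# Print a block device and, recursively, every device stacked on top of it
# (multipath maps, LVM volumes, ...) by following /sys/block/<dev>/holders.
# A device reachable via several paths may be printed more than once.
# SYSFS_ROOT (default /sys) is a hypothetical override for testing.
stacked_devices() {
    local sysfs="${SYSFS_ROOT:-/sys}" h
    echo "$1"
    for h in "$sysfs/block/$1/holders"/*; do
        [ -e "$h" ] || continue          # empty holders dir: glob unexpanded
        stacked_devices "${h##*/}"
    done
}

# Exit status 1 if the device or anything stacked on it has a max_sectors_kb
# above the limit, naming each offender on stderr.
#   $1: device name, e.g. sdd   $2: limit in KiB (default 512)
check_stack() {
    local sysfs="${SYSFS_ROOT:-/sys}" limit="${2:-512}" dev kb rc=0
    for dev in $(stacked_devices "$1" | sort -u); do
        if ! kb=$(cat "$sysfs/block/$dev/queue/max_sectors_kb" 2>/dev/null); then
            echo "$dev: cannot read max_sectors_kb" >&2
            rc=1
        elif [ "$kb" -gt "$limit" ]; then
            echo "$dev: max_sectors_kb=$kb exceeds $limit" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_stack sdd 512 || echo "do not use the device yet"
```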