Faulty mapping between iSCSI and multipath devices

Error description

In another section, the device dependencies for the KVM host 'kvm55' were already displayed:

root@kvm55:~# dmsetup ls --tree
350002ac21d622071 (252:4)
 ├─ (8:32)
 └─ (8:16)
vg-images (252:0)
 └─ (8:4)
vg3_nas-lv_nas5501 (252:5)
 └─350002ac21d632071 (252:2)
    ├─ (8:64)
    └─ (8:48)
vg3_dbserver-lv_dbserver5501 (252:3)
 └─ (8:32)
vg-lv_dbslave5501 (252:1)
 └─ (8:4) 

The device dependencies show an anomaly: the device 'vg3_dbserver-lv_dbserver5501' should be bound to the multipath device '350002ac21d622071'. Instead, it is bound directly to the disk (8:32), which represents only one of the two iSCSI paths. According to this source, the anomaly affects the stability of the device: if the iSCSI path behind disk (8:32) fails, the LVM device 'vg3_dbserver-lv_dbserver5501' fails as well, regardless of whether the second iSCSI path is still connected. The redundancy provided by the two iSCSI paths is therefore lost. Data loss due to a faulty mapping was indeed observed during administration.
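
The anomaly can also be searched for mechanically. The following is a minimal sketch, not part of the original procedure: it assumes the usual 'dmsetup table' output format (name, start, length, target type, parameters) and reports every linear mapping that sits directly on a device which is also a path of a multipath map. The function name 'find_bypassed_multipath' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `dmsetup table` output on stdin and prints every
# linear mapping (typically an LV segment) that uses a device which is also
# a path of a multipath map.
find_bypassed_multipath() {
    awk '
        $4 == "multipath" {
            # Every major:minor token in a multipath line is a path device.
            for (i = 5; i <= NF; i++)
                if ($i ~ /^[0-9]+:[0-9]+$/) is_path[$i] = 1
        }
        $4 == "linear" {
            # For linear targets, field 5 is the underlying device.
            name = $1
            sub(/:$/, "", name)
            lin_name[++n] = name
            lin_dev[n] = $5
        }
        END {
            for (i = 1; i <= n; i++)
                if (lin_dev[i] in is_path)
                    print lin_name[i] " -> " lin_dev[i]
        }
    '
}

# Usage on a real host (as root):
#   dmsetup table | find_bypassed_multipath
```

On a correctly mapped host the function should print nothing, because logical volumes then reference the multipath device (252:x) instead of one of its paths.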

Failure cause

The error is caused by a misconfiguration of LVM. Devices to which LVM devices must not bind have to be blacklisted in the LVM configuration file 'lvm.conf'. See the solution description below for details.

Solution

The problem can be solved in two steps.

First step: Adapt the LVM configuration

A filter in the LVM configuration file '/etc/lvm/lvm.conf' can prevent the binding of LVM devices to single hard disks. Only the most important lines of the configuration file are shown:

root@kvm58:/etc/lvm# cat lvm.conf
# This is an adapted configuration file for the LVM2 system.
# It contains the default settings that would be used if there was no
# plus special filter for proper connection of lvm to multipath
# distributed by puppet - do not edit locally
devices {
...
    # 1st filter: iscsi devices, 2nd,3rd filter: vg on local disks/RAID0
    filter = [ 'a/mapper/', 'a|/dev/sda4$|', 'a|/dev/md|', 'r/.*/' ]
...
}

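The decisive line of the configuration file is the one beginning with 'filter ='. The patterns are evaluated in order: device-mapper devices (which include the multipath devices), the local partition '/dev/sda4' and the '/dev/md' devices are accepted, and the final entry 'r/.*/' rejects everything else. This last entry is what prevents the direct binding of LVM devices to single SCSI disks. For details on the filter syntax, see the man page of 'lvm.conf'.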
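
To illustrate how such a filter list is read, here is a simplified emulation in shell. It is only a sketch: it assumes that the first matching pattern decides and checks a single path name, whereas LVM itself also considers the other names of a device. The function name 'filter_verdict' is hypothetical.

```shell
#!/bin/sh
# Simplified emulation of the filter shown above: the patterns are tried in
# order and the first one that matches decides (a = accept, r = reject).
filter_verdict() {
    dev=$1
    # Each line is "<action> <regex>", mirroring the entries of the filter.
    while read -r action regex; do
        if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
            [ "$action" = a ] && echo accept || echo reject
            return 0
        fi
    done <<'EOF'
a mapper
a /dev/sda4$
a /dev/md
r .*
EOF
}

# Examples:
#   filter_verdict /dev/mapper/350002ac21d622071
#   filter_verdict /dev/sdc
```

With this filter, a multipath device below '/dev/mapper' is accepted, while a single iSCSI path such as '/dev/sdc' falls through to the last pattern and is rejected.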

The filter has to be in place before any LVM devices become active. The easiest way to achieve this is to reboot the host after the reconfiguration. If this is not possible, follow the description of the second step.

Second step: Rescan and reconnect the LVM devices

When the filter is in place, these steps have to be followed to properly reconnect the LVM devices:

  1. Deactivate all processes accessing the LVM devices
  2. Set all LVM partitions and volume groups to state 'inactive'
  3. Rescan all LVM devices
  4. Set all LVM partitions and volume groups back to state 'active'
  5. Reactivate all processes accessing the LVM devices
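
The five steps above can be strung together in one script. The following is only a sketch under assumptions: the guest names are those of the example host, the 'run' wrapper and the 'DRY_RUN' variable are hypothetical, and by default the commands are merely printed so that nothing is stopped or deactivated.

```shell
#!/bin/sh
# Sketch of the five steps as one script. With DRY_RUN=1 (the default here)
# the commands are only printed instead of being executed.
DRY_RUN=${DRY_RUN:-1}
GUESTS="nas webserver01 admin vpn"   # guest names of the example host

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reconnect_lvm() {
    # Step 1: stop the processes that access the LVM devices.
    # Note: virsh shutdown only requests a shutdown; a real run has to wait
    # until the guests are actually off before continuing.
    for g in $GUESTS; do run virsh shutdown "$g"; done
    run /etc/init.d/libvirt-bin stop
    # Step 2: deactivate all volume groups.
    run vgchange -an
    # Step 3: rescan physical volumes, volume groups and logical volumes.
    run pvscan
    run vgscan
    run lvscan
    # Step 4: activate the volume groups again.
    run vgchange -ay
    # Step 5: restart the processes.
    run /etc/init.d/libvirt-bin start
    for g in $GUESTS; do run virsh start "$g"; done
}

# Usage: reconnect_lvm            (dry run, prints the commands)
#        DRY_RUN=0 reconnect_lvm  (executes them)
```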

The steps are now described in more detail:

1. Deactivate processes

The details depend on the running processes. On a KVM host, at least all vhosts that access LVM partitions have to be stopped. Often the service libvirt-bin has to be stopped as well:

root@kvm55:~# virsh list
 Id    Name                           State
----------------------------------------------------
 1     nas                            running
 2     webserver01                    running
 3     admin                          running
 4     vpn                            running
root@kvm55:~# virsh shutdown nas
root@kvm55:~# virsh shutdown webserver01
root@kvm55:~# virsh shutdown admin
root@kvm55:~# virsh shutdown vpn
root@kvm55:~# /etc/init.d/libvirt-bin stop

The output of the commands is not displayed, because the commands cannot be tested on a running host without really stopping all vhosts.

If possible, shut down the vhosts properly from their own command line rather than with the command 'virsh shutdown' from the command line of the KVM host.
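
Where 'virsh shutdown' is used nevertheless, it should be kept in mind that the command only sends a shutdown request and returns immediately. A minimal sketch of a helper that waits until no guest is running any more could look like this. It assumes a virsh version that supports 'list --name --state-running'; the function name and the default timeout of 300 seconds are hypothetical.

```shell
#!/bin/sh
# Sketch: request a shutdown for every running guest, then poll until no
# guest is running any more or the timeout (in seconds) has expired.
shutdown_running_guests() {
    timeout=${1:-300}
    for guest in $(virsh list --name --state-running); do
        virsh shutdown "$guest"
    done
    while [ -n "$(virsh list --name --state-running)" ]; do
        # Give up (exit status 1) if guests are still running at timeout.
        [ "$timeout" -le 0 ] && return 1
        sleep 5
        timeout=$((timeout - 5))
    done
    return 0
}

# Usage: shutdown_running_guests 600 || echo "some guests are still running"
```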

2. Deactivate LVM devices

It is sufficient to deactivate the volume groups. The logical partitions are deactivated automatically.

root@kvm55:~# vgchange -an

Again, the output of the command is not displayed, because the command cannot be tested on a running host.
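
Whether the deactivation really succeeded can be checked before continuing, since a volume that is still open cannot be deactivated. The following sketch assumes the output of 'lvs --noheadings -o vg_name,lv_name,lv_attr', where the fifth character of the attribute string is 'a' for an active volume. The function name 'active_lvs' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "vg_name lv_name lv_attr" lines on stdin and
# prints the logical volumes that are still active.
active_lvs() {
    # Column 3 is lv_attr; its fifth character is "a" for an active volume.
    awk 'substr($3, 5, 1) == "a" { print $1 "/" $2 }'
}

# Usage on a real host (as root):
#   lvs --noheadings -o vg_name,lv_name,lv_attr | active_lvs
```

An empty result indicates that no logical volume is active any more.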

3. Scan LVM devices

The scanning is done per LVM device type (physical volume, volume group, logical volume):

root@kvm55:~# pvscan
root@kvm55:~# vgscan
root@kvm55:~# lvscan

For the output of the commands, see above.
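
After the rescan it can be checked which device each physical volume was found on. The sketch below assumes the output of 'pvs --noheadings -o pv_name,vg_name' and treats everything below '/dev/mapper/' as acceptable; further allowed devices, such as the local partition '/dev/sda4', are passed as arguments. The function name 'unexpected_pvs' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "pv_name vg_name" lines on stdin and prints the
# physical volumes that are neither below /dev/mapper/ nor listed as an
# allowed device on the command line.
unexpected_pvs() {
    while read -r pv vg; do
        case "$pv" in
            /dev/mapper/*) continue ;;
        esac
        for ok in "$@"; do
            [ "$pv" = "$ok" ] && continue 2
        done
        echo "$pv $vg"
    done
}

# Usage on a real host (as root):
#   pvs --noheadings -o pv_name,vg_name | unexpected_pvs /dev/sda4
```

Any line printed here names a physical volume that is still attached to a single disk instead of a multipath device.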

4. Activate LVM devices

First the volume groups are activated. Afterwards the logical partitions can be activated:

root@kvm55:~# vgchange -ay
root@kvm55:~# lvchange -ay /dev/vg3_dbserver/lv_dbserver5501
root@kvm55:~# lvchange -ay /dev/vg/images
root@kvm55:~# lvchange -ay /dev/vg/lv_dbslave5501

For the output of the commands, see above. Activating the logical volumes explicitly is not strictly necessary, since 'vgchange -ay' already activates the logical volumes of each volume group.

5. Activate processes

All processes that access LVM devices can now be started again. On a KVM host, the service 'libvirt-bin' can be restarted as well as the virtual hosts:

root@kvm55:~# /etc/init.d/libvirt-bin start
root@kvm55:~# virsh start nas
root@kvm55:~# virsh start webserver01
root@kvm55:~# virsh start admin
root@kvm55:~# virsh start vpn

The restart order shown here does not take possible mutual dependencies among the vhosts into account. For the output of the commands, see above.

As a result, logical partitions on iSCSI devices should be properly bound to multipath devices. This means that the partitions are now bound to both iSCSI paths:

root@kvm55:~# lsscsi --device
[4:2:0:0]    disk    LSI      MR9240-4i        2.13  /dev/sda[8:0]
[5:0:0:0]    disk    3PARdata VV               3112  /dev/sdc[8:32]
[5:0:0:1]    disk    3PARdata VV               3112  /dev/sde[8:64]
[5:0:0:254]  enclosu 3PARdata SES              3112  -       
[6:0:0:0]    disk    3PARdata VV               3112  /dev/sdb[8:16]
[6:0:0:1]    disk    3PARdata VV               3112  /dev/sdd[8:48]
[6:0:0:254]  enclosu 3PARdata SES              3112  -

root@kvm55:~# dmsetup ls --tree
vg-images (252:0)
 └─ (8:4)
vg3_nas-lv_nas5501 (252:5)
 └─350002ac21d632071 (252:2)
    ├─ (8:64)
    └─ (8:48)
vg3_dbserver-lv_dbserver5501 (252:3)
 └─350002ac21d622071 (252:4)
    ├─ (8:32)
    └─ (8:16)
vg-lv_dbslave5501 (252:1)
 └─ (8:4)

public/trouble/mapping.txt · Last modified: 2015/07/12 12:27 by wiki.tk