As the title pretty much says, I’ve been working on fixing the Root-Disk-Multipathing feature of our XenServer installations. Our XenServer hosts boot from a HA-enabled NetApp controller; however, we recently noticed that during a controller failover some, if not all, paths would go offline and never come back. If you do a cf takeover and a cf giveback in short succession, you end up with an unusable XenServer host, as the Root-Disk becomes pretty much non-responsive.
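For reference, the failure can be reproduced roughly like this (netapp1> is just a placeholder for the filer’s console prompt):

# on the NetApp filer: fail over to the partner and back again
netapp1> cf takeover
netapp1> cf giveback

# meanwhile, on the XenServer host: watch the path states
watch -n1 multipath -ll
# after the giveback, paths marked [failed][faulty] should flip back to
# [active][ready]; on our hosts they never did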
Judging from that, there don’t seem to be that many people using XenServer with Boot-from-SAN; otherwise Citrix/NetApp would have fixed this by now. Anyhow, I went digging around in our XenServers. What I had already done was adjust the /etc/multipath.conf according to a bug report (or rather TR-3732). For completeness’ sake I’ll list it here:
# Multipathing configuration for XenServer on NetApp ALUA
# enabled storage.
# TR-3732, revision 5
defaults {
        user_friendly_names     no
        queue_without_daemon    no
        flush_on_last_del       yes
}

## some vendor specific modifications
devices {
        device {
                vendor                  "NETAPP"
                product                 "LUN"
                path_grouping_policy    group_by_prio
                features                "1 queue_if_no_path"
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_alua /dev/%n"
                path_checker            directio
                failback                immediate
                hardware_handler        "0"
                rr_weight               uniform
                rr_min_io               128
        }
}
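If you want to try this yourself, reloading the maps after editing the file goes roughly like this (note that maps that are in use, like the root disk’s, only pick up the change after a reboot):

cp /etc/multipath.conf /etc/multipath.conf.bak   # keep a backup, just in case
vi /etc/multipath.conf                           # apply the settings above
multipath -F                                     # flush all unused multipath maps
multipath -v2                                    # rescan and rebuild the maps, verbosely
multipath -ll                                    # verify: all paths present, grouped by ALUA prio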
And as it turns out, this configuration is the reason why we’re having such difficulties with the Multipathing. The information in TR-3732 is a bunch of BS (no, not everything is wrong, just a single line: the getuid_callout), and with that one line broken the whole concept of Multipathing, Failover and High-Availability (yeah, I know: if you want HA, don’t use XenServer :P) falls apart.
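You can see the problem for yourself by running the getuid_callout from the config by hand against one of the root disk’s paths (sda is just an example device name here):

# run the callout exactly as multipathd would
/sbin/scsi_id -g -u -s /block/sda
# a working callout prints the LUN's WWID (something like 360a98000...);
# if nothing or an error comes back, multipathd cannot identify the
# paths, and without IDs it cannot group them into a single map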