Masamichi Fukuda - elf-systems
masamichi_fukud****@elf-s*****
Tue Mar 17 23:46:55 JST 2015
Yamauchi-san,

Good evening, this is Fukuda.

Perhaps I am specifying -x for stonith-helper the wrong way.

I removed stonith-helper, left only xen0, and started the cluster.

# crm_mon -rfA
Last updated: Tue Mar 17 23:38:53 2015
Last change: Tue Mar 17 23:30:34 2015
Stack: heartbeat
Current DC: lbv1.beta.com (38b0f200-83ea-8633-6f37-047d36cd39c6) - partition with quorum
Version: 1.1.12-e32080b
2 Nodes configured
6 Resources configured

Online: [ lbv1.beta.com lbv2.beta.com ]

Full list of resources:

Stonith1-2 (stonith:external/xen0): Stopped
Stonith2-2 (stonith:external/xen0): Stopped
 Resource Group: HAvarnish
     vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
     varnishd (lsb:varnish): Started lbv1.beta.com
 Clone Set: clone_ping [ping]
     Started: [ lbv1.beta.com lbv2.beta.com ]

Node Attributes:
* Node lbv1.beta.com:
    + default_ping_set : 100
* Node lbv2.beta.com:
    + default_ping_set : 100

Migration summary:
* Node lbv1.beta.com:
   Stonith2-2: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 23:38:34 2015'
* Node lbv2.beta.com:
   Stonith1-2: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 23:38:27 2015'

Failed actions:
    Stonith2-2_start_0 on lbv1.beta.com 'unknown error' (1): call=23, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 23:38:32 2015', queued=0ms, exec=1061ms
    Stonith1-2_start_0 on lbv2.beta.com 'unknown error' (1): call=23, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 23:38:25 2015', queued=0ms, exec=1342ms

The same failed actions appear as when stonith-helper was configured.

Thank you in advance.

That is all.

2015-03-17 22:38 <renay****@ybb*****>:

> Fukuda-san,
>
> Good evening, this is Yamauchi.
>
> By the way, if possible, removing external/stonith-helper and checking what
> happens with only external/xen0 may help isolate the problem.
>
> That is all.
>
>
> ----- Original Message -----
> > From: "renay****@ybb*****" <renay****@ybb*****>
> > To: "linux****@lists*****" <linux****@lists*****>
> > Cc:
> > Date: 2015/3/17, Tue 22:28
> > Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >
> > Fukuda-san,
> >
> > Good evening, this is Yamauchi.
> >
> > It looks like nothing has changed...
> >
> > For now, sometime tomorrow, although it will be on RHEL, with the combination of
> >
> > Heartbeat3.0.6
> > the latest Pacemaker
> >
> > I will check whether stonith-helper works with a similar configuration
> > (the resources will be Dummy, and external/xen0 will be replaced by external/ssh).
> >
> > # If we could see the output produced by the -x setting in stonith-helper,
> > # it would be easier to narrow down the problem...
> >
> > That is all.
> >
> >
> > ----- Original Message -----
> >> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >> Date: 2015/3/17, Tue 21:24
> >> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>
> >> Yamauchi-san,
> >>
> >> Good evening, this is Fukuda.
> >> Thank you for the information on the latest version.
> >>
> >> I installed it right away.
> >>
> >> This is the state after startup.
> >>
> >> The failed actions appear to be unchanged.
> >>
> >> # crm_mon -rfA
> >> Last updated: Tue Mar 17 21:03:49 2015
> >> Last change: Tue Mar 17 20:30:58 2015
> >> Stack: heartbeat
> >> Current DC: lbv1.beta.com (38b0f200-83ea-8633-6f37-047d36cd39c6) - partition with quorum
> >> Version: 1.1.12-e32080b
> >> 2 Nodes configured
> >> 8 Resources configured
> >>
> >> Online: [ lbv1.beta.com lbv2.beta.com ]
> >>
> >> Full list of resources:
> >>
> >>  Resource Group: HAvarnish
> >>      vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
> >>      varnishd (lsb:varnish): Started lbv1.beta.com
> >>  Resource Group: grpStonith1
> >>      Stonith1-1 (stonith:external/stonith-helper): Stopped
> >>      Stonith1-2 (stonith:external/xen0): Stopped
> >>  Resource Group: grpStonith2
> >>      Stonith2-1 (stonith:external/stonith-helper): Stopped
> >>      Stonith2-2 (stonith:external/xen0): Stopped
> >>  Clone Set: clone_ping [ping]
> >>      Started: [ lbv1.beta.com lbv2.beta.com ]
> >>
> >> Node Attributes:
> >> * Node lbv1.beta.com:
> >>     + default_ping_set : 100
> >> * Node lbv2.beta.com:
> >>     + default_ping_set : 100
> >>
> >> Migration summary:
> >> * Node lbv1.beta.com:
> >>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 21:03:39 2015'
> >> * Node lbv2.beta.com:
> >>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 21:03:32 2015'
> >>
> >> Failed actions:
> >>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 21:03:37 2015', queued=0ms, exec=1085ms
> >>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=18, status=Error, exit-reason='none', last-rc-change='Tue Mar 17 21:03:30 2015', queued=0ms, exec=1061ms
> >>
> >>
> >> Here are the logs.
> >>
> >> # less /var/log/ha-debug
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: Pacemaker support: yes
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: File /etc/ha.d//haresources exists.
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: This file is not used because pacemaker is enabled
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/heartbeat/ccm
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/cib
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/stonithd
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/lrmd
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/attrd
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: debug: Checking access of: /usr/local/heartbeat/libexec/pacemaker/crmd
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Core dumps could be lost if multiple dumps occur.
> >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Consider setting > > non-default value in /proc/sys/kernel/core_pattern (or equivalent) for > maximum > > supportability > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Consider setting > > /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum > supportability > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: WARN: Logging daemon > is > > disabled --enabling logging daemon is recommended > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: > > ************************** > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4235]: info: Configuration > > validated. Starting heartbeat 3.0.6 > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: heartbeat: > version > > 3.0.6 > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: Heartbeat > generation: > > 1423534116 > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: seed is > -1702799346 > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: > write > > socket priority set to IPTOS_LOWDELAY on eth1 > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: > bound > > send socket to device: eth1 > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: set > > SO_REUSEADDR > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: > bound > > receive socket to device: eth1 > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: glib: ucast: > started > > on port 694 interface eth1 to 10.0.17.133 > >> Mar 17 21:02:39 lbv1.beta.com heartbeat: [4236]: info: Local status > now set > > to: 'up' > >> Mar 17 21:02:46 lbv1.beta.com heartbeat: [4236]: info: Link > > lbv2.beta.com:eth1 up. 
> >> Mar 17 21:02:46 lbv1.beta.com heartbeat: [4236]: info: Status update > for > > node lbv2.beta.com: status up > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Comm_now_up(): > > updating status to active > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Local status > now set > > to: 'active' > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child > client > > "/usr/local/heartbeat/libexec/heartbeat/ccm" (109,113) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child > client > > "/usr/local/heartbeat/libexec/pacemaker/cib" (109,113) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child > client > > "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child > client > > "/usr/local/heartbeat/libexec/pacemaker/lrmd" (0,0) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child > client > > "/usr/local/heartbeat/libexec/pacemaker/attrd" (109,113) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Starting child > client > > "/usr/local/heartbeat/libexec/pacemaker/crmd" (109,113) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: debug: > get_delnodelist: > > delnodelist= > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4250]: info: Starting > > "/usr/local/heartbeat/libexec/pacemaker/crmd" as uid 109 gid 113 (pid > > 4250) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4246]: info: Starting > > "/usr/local/heartbeat/libexec/pacemaker/cib" as uid 109 gid 113 (pid > > 4246) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4249]: info: Starting > > "/usr/local/heartbeat/libexec/pacemaker/attrd" as uid 109 gid 113 > > (pid 4249) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4245]: info: Starting > > "/usr/local/heartbeat/libexec/heartbeat/ccm" as uid 109 gid 113 (pid > > 4245) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4248]: info: Starting > > 
"/usr/local/heartbeat/libexec/pacemaker/lrmd" as uid 0 gid 0 (pid > > 4248) > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4247]: info: Starting > > "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid > > 4247) > >> Mar 17 21:02:47 lbv1.beta.com ccm: [4245]: info: Hostname: > lbv1.beta.com > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue > length > > from heartbeat to client ccm is set to 1024 > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue > length > > from heartbeat to client attrd is set to 1024 > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue > length > > from heartbeat to client stonith-ng is set to 1024 > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: Status update > for > > node lbv2.beta.com: status active > >> Mar 17 21:02:47 lbv1.beta.com heartbeat: [4236]: info: the send queue > length > > from heartbeat to client cib is set to 1024 > >> Mar 17 21:02:51 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost > packet(s) for > > [lbv2.beta.com] [15:17] > >> Mar 17 21:02:51 lbv1.beta.com heartbeat: [4236]: info: No pkts missing > from > > lbv2.beta.com! > >> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost > packet(s) for > > [lbv2.beta.com] [19:21] > >> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: info: No pkts missing > from > > lbv2.beta.com! > >> Mar 17 21:02:52 lbv1.beta.com heartbeat: [4236]: info: the send queue > length > > from heartbeat to client crmd is set to 1024 > >> Mar 17 21:02:53 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost > packet(s) for > > [lbv2.beta.com] [24:26] > >> Mar 17 21:02:53 lbv1.beta.com heartbeat: [4236]: info: No pkts missing > from > > lbv2.beta.com! > >> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost > packet(s) for > > [lbv2.beta.com] [26:28] > >> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: info: No pkts missing > from > > lbv2.beta.com! 
> >> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: WARN: 1 lost packet(s) for [lbv2.beta.com] [30:32]
> >> Mar 17 21:02:54 lbv1.beta.com heartbeat: [4236]: info: No pkts missing from lbv2.beta.com!
> >>
> >> # less /var/log/error
> >>
> >> Mar 17 21:02:47 lbv1 attrd[4249]: error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
> >> Mar 17 21:02:48 lbv1 attrd[4249]: error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
> >> Mar 17 21:02:53 lbv1 stonith-ng[4247]: error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
> >> Mar 17 21:02:53 lbv1 stonith-ng[4247]: error: ha_msg_dispatch: Ignored incoming message. Please set_msg_callback on hbclstat
> >> Mar 17 21:03:39 lbv1 crmd[4250]: error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=31, status=4, cib-update=42, confirmed=true) Error
> >>
> >> # cat syslog|egrep 'Mar 17 21:03|Mar 17 21:02' |egrep 'heartbeat|stonith|pacemaker|error'
> >> Mar 17 21:03:24 lbv1 pengine[4253]: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-115.bz2
> >> Mar 17 21:03:27 lbv1 crmd[4250]: notice: run_graph: Transition 0 (Complete=15, Pending=0, Fired=0, Skipped=16, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-115.bz2): Stopped
> >> Mar 17 21:03:29 lbv1 pengine[4253]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-116.bz2
> >> Mar 17 21:03:34 lbv1 crmd[4250]: notice: run_graph: Transition 1 (Complete=8, Pending=0, Fired=0, Skipped=12, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-116.bz2): Stopped
> >> Mar 17 21:03:37 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
> >> Mar 17 21:03:37 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
> >> Mar 17 21:03:37 lbv1 pengine[4253]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-117.bz2
> >> Mar 17 21:03:39 lbv1 stonith-ng[4247]: notice: log_operation: Operation 'monitor' [4377] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
> >> Mar 17 21:03:39 lbv1 stonith-ng[4247]: warning: log_operation: Stonith2-1:4377 [ Performing: stonith -t external/stonith-helper -S ]
> >> Mar 17 21:03:39 lbv1 stonith-ng[4247]: warning: log_operation: Stonith2-1:4377 [ failed to exec "stonith" ]
> >> Mar 17 21:03:39 lbv1 stonith-ng[4247]: warning: log_operation: Stonith2-1:4377 [ failed: 2 ]
> >> Mar 17 21:03:39 lbv1 crmd[4250]: error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=31, status=4, cib-update=42, confirmed=true) Error
> >> Mar 17 21:03:40 lbv1 crmd[4250]: notice: run_graph: Transition 2 (Complete=12, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-117.bz2): Stopped
> >> Mar 17 21:03:42 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith2-1 on lbv1.beta.com: unknown error (1)
> >> Mar 17 21:03:42 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith2-1 on lbv1.beta.com: unknown error (1)
> >> Mar 17 21:03:42 lbv1 pengine[4253]: warning: unpack_rsc_op_failure: Processing failed op start for Stonith1-1 on lbv2.beta.com: unknown error (1)
> >> Mar 17 21:03:42 lbv1 pengine[4253]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-118.bz2
> >> Mar 17 21:03:42 lbv1 IPaddr2(vip_208)[4448]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
> >> Mar 17 21:03:47 lbv1 crmd[4250]: notice: run_graph: Transition 3 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-118.bz2): Complete
> >>
> >> Thank you in advance.
> >>
> >> That is all.
> >>
> >>
> >> 2015-03-17 18:31 <renay****@ybb*****>:
> >>
> >> Fukuda-san,
> >>>
> >>> Good evening, this is Yamauchi.
> >>>
> >>> Since no tag has been created yet, today's latest version is
> >>>
> >>> * https://github.com/ClusterLabs/pacemaker/tree/e32080b460f81486b85d08ec958582b3e72d858c
> >>>
> >>> You can download it from [Download ZIP] on the right-hand side.
> >>>
> >>> That is all.
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>> To: "renay****@ybb*****" <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>> Date: 2015/3/17, Tue 18:07
> >>>> Subject: About STONITH errors during split-brain
> >>>>
> >>>> Yamauchi-san,
> >>>>
> >>>> Hello, this is Fukuda.
> >>>>
> >>>> I looked at this page:
> >>>> https://github.com/ClusterLabs/pacemaker/tags
> >>>>
> >>>> pacemaker 1.1.12 561c4cf appears to be the latest there.
> >>>> Sorry to trouble you, but could you tell me where to find a newer version?
> >>>>
> >>>> Thank you in advance.
> >>>>
> >>>> That is all.
> >>>>
> >>>>
> >>>> On Tuesday, March 17, 2015, <renay****@ybb*****> wrote:
> >>>>
> >>>> Fukuda-san,
> >>>>>
> >>>>> Hello, this is Yamauchi.
> >>>>>
> >>>>> Yes, it is old.
> >>>>>
> >>>>> Pacemaker gained support for Heartbeat3.0.6 surprisingly recently.
> >>>>> Please install a newer one. (You will need to build from source again...)
> >>>>>
> >>>>> It is available from the upstream GitHub repository.
> >>>>> * https://github.com/ClusterLabs/pacemaker
> >>>>>
> >>>>> In some cases the latest master may produce errors; if so, I suggest
> >>>>> working back through older revisions.
> >>>>>
> >>>>> That is all.
> >>>>>
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>>>> Date: 2015/3/17, Tue 16:06
> >>>>>> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>>>>>
> >>>>>> Yamauchi-san,
> >>>>>>
> >>>>>> Hello, this is Fukuda.
> >>>>>>
> >>>>>> In an earlier email you advised me to install the latest heartbeat and pacemaker.
> >>>>>> So this time I installed heartbeat3.0.6 and pacemaker1.1.12.
> >>>>>>
> >>>>>> heartbeat configuration: Version = "3.0.6"
> >>>>>> pacemaker configuration: Version = 1.1.12 (Build: 561c4cf)
> >>>>>>
> >>>>>> Does this mean pacemaker is still too old?
> >>>>>>
> >>>>>> Sorry for the trouble, and thank you in advance.
> >>>>>>
> >>>>>> That is all.
> >>>>>>
> >>>>>>
> >>>>>> 2015-03-17 14:59 <renay****@ybb*****>:
> >>>>>>
> >>>>>> Fukuda-san,
> >>>>>>>
> >>>>>>> Hello, this is Yamauchi.
> >>>>>>>
> >>>>>>> It just occurred to me: in an earlier exchange I gave the answer below.
> >>>>>>> Does your setup satisfy it?
> >>>>>>>
> >>>>>>>>>>>>> 2) Heartbeat3.0.6 + latest Pacemaker: OK
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It appears that Heartbeat must also be the latest version, 3.0.6.
> >>>>>>>>>>>>> * http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/cceeb47a7d8f
> >>>>>>>
> >>>>>>> Judging from the crm_mon output below, the version is 1.1.12.
> >>>>>>> Combining with Heartbeat3.0.6 requires a fairly recent Pacemaker.
> >>>>>>>
> >>>>>>>> # crm_mon -rfA
> >>>>>>>>
> >>>>>>>> Last updated: Tue Mar 17 14:14:39 2015
> >>>>>>>> Last change: Tue Mar 17 14:01:43 2015
> >>>>>>>> Stack: heartbeat
> >>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >>>>>>>> Version: 1.1.12-561c4cf
> >>>>>>>
> >>>>>>> I think you probably need at least a revision that includes the following change.
> >>>>>>>
> >>>>>>> https://github.com/ClusterLabs/pacemaker/commit/f2302da063d08719d28367d8e362b8bfb0f85bf3
> >>>>>>>
> >>>>>>> That is all.
> >>>>>>>
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>>>>>> Date: 2015/3/17, Tue 14:38
> >>>>>>>> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>>>>>>>
> >>>>>>>> Yamauchi-san,
> >>>>>>>>
> >>>>>>>> Hello, this is Fukuda.
> >>>>>>>>
> >>>>>>>> Is it enough to add -x to the shebang line of stonith-helper?
> >>>>>>>> I changed the first line of stonith-helper to #!/bin/bash -x and started the cluster.
> >>>>>>>>
> >>>>>>>> crm_mon shows no change from before.
> >>>>>>>>
> >>>>>>>> # crm_mon -rfA
> >>>>>>>>
> >>>>>>>> Last updated: Tue Mar 17 14:14:39 2015
> >>>>>>>> Last change: Tue Mar 17 14:01:43 2015
> >>>>>>>> Stack: heartbeat
> >>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >>>>>>>> Version: 1.1.12-561c4cf
> >>>>>>>> 2 Nodes configured
> >>>>>>>> 8 Resources configured
> >>>>>>>>
> >>>>>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
> >>>>>>>>
> >>>>>>>> Full list of resources:
> >>>>>>>>
> >>>>>>>>  Resource Group: HAvarnish
> >>>>>>>>      vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
> >>>>>>>>      varnishd (lsb:varnish): Started lbv1.beta.com
> >>>>>>>>  Resource Group: grpStonith1
> >>>>>>>>      Stonith1-1 (stonith:external/stonith-helper): Stopped
> >>>>>>>>      Stonith1-2 (stonith:external/xen0): Stopped
> >>>>>>>>  Resource Group: grpStonith2
> >>>>>>>>      Stonith2-1 (stonith:external/stonith-helper): Stopped
> >>>>>>>>      Stonith2-2 (stonith:external/xen0): Stopped
> >>>>>>>>  Clone Set: clone_ping [ping]
> >>>>>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
> >>>>>>>>
> >>>>>>>> Node Attributes:
> >>>>>>>> * Node lbv1.beta.com:
> >>>>>>>>     + default_ping_set : 100
> >>>>>>>> * Node lbv2.beta.com:
> >>>>>>>>     + default_ping_set : 100
> >>>>>>>>
> >>>>>>>> Migration summary:
> >>>>>>>> * Node lbv2.beta.com:
> >>>>>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:16 2015'
> >>>>>>>> * Node lbv1.beta.com:
> >>>>>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 14:12:21 2015'
> >>>>>>>>
> >>>>>>>> Failed actions:
> >>>>>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 14:12:14 2015', queued=0ms, exec=1065ms
> >>>>>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=26, status=Error, last-rc-change='Tue Mar 17 14:12:19 2015', queued=0ms, exec=1081ms
> >>>>>>>>
> >>>>>>>> I looked for other logs.
> >>>>>>>>
> >>>>>>>> This is from heartbeat startup.
> >>>>>>>>
> >>>>>>>> # less /var/log/pm_logconv.out
> >>>>>>>> Mar 17 14:11:28 lbv1.beta.com info: Starting Heartbeat 3.0.6.
> >>>>>>>> Mar 17 14:11:33 lbv1.beta.com info: Link lbv2.beta.com:eth1 is up.
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "ccm" process. (pid=13264)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "lrmd" process. (pid=13267)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "attrd" process. (pid=13268)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "stonithd" process. (pid=13266)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "cib" process. (pid=13265)
> >>>>>>>> Mar 17 14:11:34 lbv1.beta.com info: Start "crmd" process. (pid=13269)
> >>>>>>>>
> >>>>>>>> # less /var/log/error
> >>>>>>>> Mar 17 14:12:20 lbv1 crmd[13269]: error: process_lrm_event: Operation Stonith2-1_start_0 (node=lbv1.beta.com, call=26, status=4, cib-update=19, confirmed=true) Error
> >>>>>>>>
> >>>>>>>> These are the syslog entries matching "stonith":
> >>>>>>>>
> >>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: Starting child client "/usr/local/heartbeat/libexec/pacemaker/stonithd" (0,0)
> >>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13266]: info: Starting "/usr/local/heartbeat/libexec/pacemaker/stonithd" as uid 0 gid 0 (pid 13266)
> >>>>>>>> Mar 17 14:11:34 lbv1 stonithd[13266]: notice: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> >>>>>>>> Mar 17 14:11:34 lbv1 heartbeat: [13255]: info: the send queue length from heartbeat to client stonithd is set to 1024
> >>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]: notice: setup_cib: Watching for stonith topology changes
> >>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]: notice: unpack_config: On loss of CCM Quorum: Ignore
> >>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]: warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
> >>>>>>>> Mar 17 14:11:40 lbv1 stonithd[13266]: warning: handle_startup_fencing: Blind faith: not fencing unseen nodes
> >>>>>>>> Mar 17 14:11:41 lbv1 stonithd[13266]: notice: stonith_device_register: Added 'Stonith2-1' to the device list (1 active devices)
> >>>>>>>> Mar 17 14:11:41 lbv1 stonithd[13266]: notice: stonith_device_register: Added 'Stonith2-2' to the device list (2 active devices)
> >>>>>>>> Mar 17 14:12:04 lbv1 stonithd[13266]: notice: xml_patch_version_check: Versions did not change in patch 0.5.0
> >>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]: notice: log_operation: Operation 'monitor' [13386] for device 'Stonith2-1' returned: -201 (Generic Pacemaker error)
> >>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ Performing: stonith -t external/stonith-helper -S ]
> >>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ failed to exec "stonith" ]
> >>>>>>>> Mar 17 14:12:20 lbv1 stonithd[13266]: warning: log_operation: Stonith2-1:13386 [ failed: 2 ]
> >>>>>>>>
> >>>>>>>> Thank you in advance.
> >>>>>>>>
> >>>>>>>> That is all.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2015-03-17 13:32 <renay****@ybb*****>:
> >>>>>>>>
> >>>>>>>> Fukuda-san,
> >>>>>>>>>
> >>>>>>>>> Hello, this is Yamauchi.
> >>>>>>>>>
> >>>>>>>>> In that case, the problem seems to lie in the start of stonith-helper.
> >>>>>>>>>
> >>>>>>>>> If you put
> >>>>>>>>>
> >>>>>>>>> #!/bin/bash -x
> >>>>>>>>>
> >>>>>>>>> at the top of stonith-helper and start the cluster, you may learn something.
> >>>>>>>>>
> >>>>>>>>> Incidentally, I believe stonith-helper's own log is also written somewhere...
> >>>>>>>>>
> >>>>>>>>> That is all.
> >>>>>>>>>
> >>>>>>>>> ----- Original Message -----
> >>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>>>>>>>> Date: 2015/3/17, Tue 12:31
> >>>>>>>>>> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>>>>>>>>>
> >>>>>>>>>> Yamauchi-san,
> >>>>>>>>>> cc: Matsushima-san
> >>>>>>>>>>
> >>>>>>>>>> Hello, this is Fukuda.
> >>>>>>>>>>
> >>>>>>>>>> xen0 is in the same directory.
> >>>>>>>>>>
> >>>>>>>>>> # pwd
> >>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external
> >>>>>>>>>>
> >>>>>>>>>> # ls
> >>>>>>>>>> drac5          ibmrsa         kdumpcheck  riloe           vmware
> >>>>>>>>>> dracmc-telnet  ibmrsa-telnet  libvirt     ssh             xen0
> >>>>>>>>>> hetzner        ipmi           nut         stonith-helper  xen0-ha
> >>>>>>>>>> hmchttp        ippower9258    rackpdu     vcenter
> >>>>>>>>>>
> >>>>>>>>>> Thank you in advance.
> >>>>>>>>>>
> >>>>>>>>>> That is all.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2015-03-17 10:53 GMT+09:00 <renay****@ybb*****>:
> >>>>>>>>>>
> >>>>>>>>>> Fukuda-san,
> >>>>>>>>>>> cc: Matsushima-san
> >>>>>>>>>>>
> >>>>>>>>>>> Hello, this is Yamauchi.
> >>>>>>>>>>>
> >>>>>>>>>>>> There was no standard output or standard error output.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Could something be wrong with stonith-helper?
> >>>>>>>>>>>> Since stonith-helper is a shell script, I did not pay much attention to how it was installed.
> >>>>>>>>>>>> stonith-helper is located here:
> >>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
> >>>>>>>>>>>
> >>>>>>>>>>> Is xen0 also in this directory?
> >>>>>>>>>>> If it is not, that is a problem, so please try copying the stonith-helper file,
> >>>>>>>>>>> with its attributes preserved, into the same directory as xen0.
> >>>>>>>>>>>
> >>>>>>>>>>> If it works after that, the pm_extras installation is at fault.
> >>>>>>>>>>>
> >>>>>>>>>>> That is all.
> >>>>>>>>>>>
> >>>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>>> From: Masamichi Fukuda - elf-systems <masamichi_fukud****@elf-s*****>
> >>>>>>>>>>>> To: 山内英生 <renay****@ybb*****>; "linux****@lists*****" <linux****@lists*****>
> >>>>>>>>>>>> Date: 2015/3/17, Tue 10:31
> >>>>>>>>>>>> Subject: Re: [Linux-ha-jp] About STONITH errors during split-brain
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yamauchi-san,
> >>>>>>>>>>>> cc: Matsushima-san
> >>>>>>>>>>>>
> >>>>>>>>>>>> Good morning, this is Fukuda.
> >>>>>>>>>>>> Thank you for the crm example.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I adapted it to my environment right away.
> >>>>>>>>>>>>
> >>>>>>>>>>>> $ cat test.crm
> >>>>>>>>>>>> ### Cluster Option ###
> >>>>>>>>>>>> property \
> >>>>>>>>>>>>     no-quorum-policy="ignore" \
> >>>>>>>>>>>>     stonith-enabled="true" \
> >>>>>>>>>>>>     startup-fencing="false" \
> >>>>>>>>>>>>     stonith-timeout="710s" \
> >>>>>>>>>>>>     crmd-transition-delay="2s"
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Resource Default ###
> >>>>>>>>>>>> rsc_defaults \
> >>>>>>>>>>>>     resource-stickiness="INFINITY" \
> >>>>>>>>>>>>     migration-threshold="1"
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Group Configuration ###
> >>>>>>>>>>>> group HAvarnish \
> >>>>>>>>>>>>     vip_208 \
> >>>>>>>>>>>>     varnishd
> >>>>>>>>>>>>
> >>>>>>>>>>>> group grpStonith1 \
> >>>>>>>>>>>>     Stonith1-1 \
> >>>>>>>>>>>>     Stonith1-2
> >>>>>>>>>>>>
> >>>>>>>>>>>> group grpStonith2 \
> >>>>>>>>>>>>     Stonith2-1 \
> >>>>>>>>>>>>     Stonith2-2
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Clone Configuration ###
> >>>>>>>>>>>> clone clone_ping \
> >>>>>>>>>>>>     ping
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Fencing Topology ###
> >>>>>>>>>>>> fencing_topology \
> >>>>>>>>>>>>     lbv1.beta.com: Stonith1-1 Stonith1-2 \
> >>>>>>>>>>>>     lbv2.beta.com: Stonith2-1 Stonith2-2
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Primitive Configuration ###
> >>>>>>>>>>>> primitive vip_208 ocf:heartbeat:IPaddr2 \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         ip="192.168.17.208" \
> >>>>>>>>>>>>         nic="eth0" \
> >>>>>>>>>>>>         cidr_netmask="24" \
> >>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="5s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive varnishd lsb:varnish \
> >>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive ping ocf:pacemaker:ping \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         name="default_ping_set" \
> >>>>>>>>>>>>         host_list="192.168.17.254" \
> >>>>>>>>>>>>         multiplier="100" \
> >>>>>>>>>>>>         dampen="1" \
> >>>>>>>>>>>>     op start interval="0s" timeout="90s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="100s" on-fail="fence"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive Stonith1-1 stonith:external/stonith-helper \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         pcmk_reboot_retries="1" \
> >>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
> >>>>>>>>>>>>         hostlist="lbv1.beta.com" \
> >>>>>>>>>>>>         dead_check_target="192.168.17.132 10.0.17.132" \
> >>>>>>>>>>>>         standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
> >>>>>>>>>>>>         run_online_check="yes" \
> >>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive Stonith1-2 stonith:external/xen0 \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
> >>>>>>>>>>>>         hostlist="lbv1.beta.com:/etc/xen/lbv1.cfg" \
> >>>>>>>>>>>>         dom0="xen0.beta.com" \
> >>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive Stonith2-1 stonith:external/stonith-helper \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         pcmk_reboot_retries="1" \
> >>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
> >>>>>>>>>>>>         hostlist="lbv2.beta.com" \
> >>>>>>>>>>>>         dead_check_target="192.168.17.133 10.0.17.133" \
> >>>>>>>>>>>>         standby_check_command="/usr/local/sbin/crm_resource -r varnishd -W | grep -q `hostname`" \
> >>>>>>>>>>>>         run_online_check="yes" \
> >>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>
> >>>>>>>>>>>> primitive Stonith2-2 stonith:external/xen0 \
> >>>>>>>>>>>>     params \
> >>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
> >>>>>>>>>>>>         hostlist="lbv2.beta.com:/etc/xen/lbv2.cfg" \
> >>>>>>>>>>>>         dom0="xen0.beta.com" \
> >>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>
> >>>>>>>>>>>> ### Resource Location ###
> >>>>>>>>>>>> location HA_location-1 HAvarnish \
> >>>>>>>>>>>>     rule 200: #uname eq lbv1.beta.com \
> >>>>>>>>>>>>     rule 100: #uname eq lbv2.beta.com
> >>>>>>>>>>>>
> >>>>>>>>>>>> location HA_location-2 HAvarnish \
> >>>>>>>>>>>>     rule -INFINITY: not_defined default_ping_set or default_ping_set lt 100
> >>>>>>>>>>>>
> >>>>>>>>>>>> location HA_location-3 grpStonith1 \
> >>>>>>>>>>>>     rule -INFINITY: #uname eq lbv1.beta.com
> >>>>>>>>>>>>
> >>>>>>>>>>>> location HA_location-4 grpStonith2 \
> >>>>>>>>>>>>     rule -INFINITY: #uname eq lbv2.beta.com
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> After loading this configuration, the messages differ from yesterday's.
> >>>>>>>>>>>> The ping messages are gone.
> >>>>>>>>>>>>
> >>>>>>>>>>>> # crm_mon -rfA
> >>>>>>>>>>>> Last updated: Tue Mar 17 10:21:28 2015
> >>>>>>>>>>>> Last change: Tue Mar 17 10:21:09 2015
> >>>>>>>>>>>> Stack: heartbeat
> >>>>>>>>>>>> Current DC: lbv2.beta.com (82ffc36f-1ad8-8686-7db0-35686465c624) - partition with quorum
> >>>>>>>>>>>> Version: 1.1.12-561c4cf
> >>>>>>>>>>>> 2 Nodes configured
> >>>>>>>>>>>> 8 Resources configured
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Online: [ lbv1.beta.com lbv2.beta.com ]
> >>>>>>>>>>>>
> >>>>>>>>>>>> Full list of resources:
> >>>>>>>>>>>>
> >>>>>>>>>>>>  Resource Group: HAvarnish
> >>>>>>>>>>>>      vip_208 (ocf::heartbeat:IPaddr2): Started lbv1.beta.com
> >>>>>>>>>>>>      varnishd (lsb:varnish): Started lbv1.beta.com
> >>>>>>>>>>>>  Resource Group: grpStonith1
> >>>>>>>>>>>>      Stonith1-1 (stonith:external/stonith-helper): Stopped
> >>>>>>>>>>>>      Stonith1-2 (stonith:external/xen0): Stopped
> >>>>>>>>>>>>  Resource Group: grpStonith2
> >>>>>>>>>>>>      Stonith2-1 (stonith:external/stonith-helper): Stopped
> >>>>>>>>>>>>      Stonith2-2 (stonith:external/xen0): Stopped
> >>>>>>>>>>>>  Clone Set: clone_ping [ping]
> >>>>>>>>>>>>      Started: [ lbv1.beta.com lbv2.beta.com ]
> >>>>>>>>>>>>
> >>>>>>>>>>>> Node Attributes:
> >>>>>>>>>>>> * Node lbv1.beta.com:
> >>>>>>>>>>>>     + default_ping_set : 100
> >>>>>>>>>>>> * Node lbv2.beta.com:
> >>>>>>>>>>>>     + default_ping_set : 100
> >>>>>>>>>>>>
> >>>>>>>>>>>> Migration summary:
> >>>>>>>>>>>> * Node lbv2.beta.com:
> >>>>>>>>>>>>    Stonith1-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
> >>>>>>>>>>>> * Node lbv1.beta.com:
> >>>>>>>>>>>>    Stonith2-1: migration-threshold=1 fail-count=1000000 last-failure='Tue Mar 17 10:21:17 2015'
> >>>>>>>>>>>>
> >>>>>>>>>>>> Failed actions:
> >>>>>>>>>>>>     Stonith1-1_start_0 on lbv2.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:15 2015', queued=0ms, exec=1082ms
> >>>>>>>>>>>>     Stonith2-1_start_0 on lbv1.beta.com 'unknown error' (1): call=31, status=Error, last-rc-change='Tue Mar 17 10:21:16 2015', queued=0ms, exec=1079ms
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Here is the log from /var/log/ha-debug.
> >>>>>>>>>>>>
> >>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Adding inet address 192.168.17.208/24 with broadcast address 192.168.17.255 to device eth0
> >>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: Bringing device eth0 up
> >>>>>>>>>>>> IPaddr2(vip_208)[7851]: 2015/03/17_10:21:22 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.17.208 eth0 192.168.17.208 auto not_used not_used
> >>>>>>>>>>>>
> >>>>>>>>>>>> There was nothing on standard output or standard error.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Could something be wrong with stonith-helper?
> >>>>>>>>>>>> Since stonith-helper is a shell script, I did not pay much attention to how it was installed.
> >>>>>>>>>>>> stonith-helper is located here:
> >>>>>>>>>>>> /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you in advance.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That is all.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2015-03-17 9:45 GMT+09:00 <renay****@ybb*****>:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Fukuda-san,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Good morning. This is Yamauchi.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Just in case, I am sending an excerpt of an example I have on hand that uses multiple stonith resources.
> >>>>>>>>>>>>> (When you actually use it, please watch the line breaks.)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The example below is a configuration for the PM 1.1 series:
> >>>>>>>>>>>>> for nodea, stonith is executed in the order prmStonith1-1, then prmStonith1-2;
> >>>>>>>>>>>>> for nodeb, stonith is executed in the order prmStonith2-1, then prmStonith2-2.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The stonith plugins themselves are helper and ssh.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (snip)
> >>>>>>>>>>>>> ### Group Configuration ###
> >>>>>>>>>>>>> group grpStonith1 \
> >>>>>>>>>>>>>     prmStonith1-1 \
> >>>>>>>>>>>>>     prmStonith1-2
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> group grpStonith2 \
> >>>>>>>>>>>>>     prmStonith2-1 \
> >>>>>>>>>>>>>     prmStonith2-2
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ### Fencing Topology ###
> >>>>>>>>>>>>> fencing_topology \
> >>>>>>>>>>>>>     nodea: prmStonith1-1 prmStonith1-2 \
> >>>>>>>>>>>>>     nodeb: prmStonith2-1 prmStonith2-2
> >>>>>>>>>>>>> (snip)
> >>>>>>>>>>>>> primitive prmStonith1-1 stonith:external/stonith-helper \
> >>>>>>>>>>>>>     params \
> >>>>>>>>>>>>>         pcmk_reboot_retries="1" \
> >>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
> >>>>>>>>>>>>>         hostlist="nodea" \
> >>>>>>>>>>>>>         dead_check_target="192.168.28.60 192.168.28.70" \
> >>>>>>>>>>>>>         standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
> >>>>>>>>>>>>>         run_online_check="yes" \
> >>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> primitive prmStonith1-2 stonith:external/ssh \
> >>>>>>>>>>>>>     params \
> >>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
> >>>>>>>>>>>>>         hostlist="nodea" \
> >>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> primitive prmStonith2-1 stonith:external/stonith-helper \
> >>>>>>>>>>>>>     params \
> >>>>>>>>>>>>>         pcmk_reboot_retries="1" \
> >>>>>>>>>>>>>         pcmk_reboot_timeout="40s" \
> >>>>>>>>>>>>>         hostlist="nodeb" \
> >>>>>>>>>>>>>         dead_check_target="192.168.28.61 192.168.28.71" \
> >>>>>>>>>>>>>         standby_check_command="/usr/sbin/crm_resource -r prmRES -W | grep -qi `hostname`" \
> >>>>>>>>>>>>>         run_online_check="yes" \
> >>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> primitive prmStonith2-2 stonith:external/ssh \
> >>>>>>>>>>>>>     params \
> >>>>>>>>>>>>>         pcmk_reboot_timeout="60s" \
> >>>>>>>>>>>>>         hostlist="nodeb" \
> >>>>>>>>>>>>>     op start interval="0s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op monitor interval="3600s" timeout="60s" on-fail="restart" \
> >>>>>>>>>>>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
> >>>>>>>>>>>>> (snip)
> >>>>>>>>>>>>> location rsc_location-grpStonith1-2 grpStonith1 \
> >>>>>>>>>>>>>     rule -INFINITY: #uname eq nodea
> >>>>>>>>>>>>> location rsc_location-grpStonith2-3 grpStonith2 \
> >>>>>>>>>>>>>     rule -INFINITY: #uname eq nodeb
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> That is all.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>>
> >>>>>>>>>>>> ELF Systems
> >>>>>>>>>>>> Masamichi Fukuda
> >>>>>>>>>>>> mail to:
masamichi_fukud****@elf-s*****
> >>>>>>>>>>>
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> Linux-ha-japan mailing list
> >>>>>>>>>>> Linux****@lists*****
> >>>>>>>>>>> http://lists.sourceforge.jp/mailman/listinfo/linux-ha-japan

--
ELF Systems
Masamichi Fukuda
mail to: *masamichi_fukud****@elf-s***** <elfsy****@gmail*****>*
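
The thread raises the question of whether stonith-helper, a shell script placed under /usr/local/heartbeat/lib/stonith/plugins/external/, is installed in a usable way. A minimal sketch of a first sanity check follows. The function name check_plugin is hypothetical and not part of any package. The sketch assumes the cluster-glue `stonith` command may or may not be on PATH, and it does not establish which plugin directory the running stonith daemon actually searches.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report whether a STONITH
# external plugin file exists and carries an execute bit.
check_plugin() {
    plugin_path="$1"
    if [ ! -e "$plugin_path" ]; then
        echo "missing: $plugin_path"
        return 1
    fi
    if [ ! -x "$plugin_path" ]; then
        echo "not executable: $plugin_path"
        return 2
    fi
    echo "ok: $plugin_path"
}

# Path quoted in the thread; "|| true" keeps the script going on failure.
check_plugin /usr/local/heartbeat/lib/stonith/plugins/external/stonith-helper || true

# If the cluster-glue "stonith" command is available, list the external
# plugin types it can see (-L lists the known STONITH types).
if command -v stonith >/dev/null 2>&1; then
    stonith -L | grep '^external/' || true
fi
```

If the file check reports "ok" but the type list (when available) lacks external/stonith-helper, that would suggest the plugin sits outside the directory the installed stonith searches. This is only a hint, not a diagnosis of the start failures shown above.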