
OS environment : Oracle Linux 6.8 (64bit)


DB environment : Oracle Database 11.2.0.4 RAC


Method : what happens if the /etc/passwd file is deleted while an Oracle 11g RAC environment is in operation?

While RAC was running, the /etc/passwd file on node 2 was wiped to see what problems occur and what log entries appear.
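
Before starting, take a backup of /etc/passwd on node 2; the recovery steps at the end of this test assume a copy named /etc/passwd_bak exists. A minimal sketch:

# cp -p /etc/passwd /etc/passwd_bak
# md5sum /etc/passwd /etc/passwd_bak

cp -p preserves the file's mode, ownership, and timestamps, and md5sum confirms the copy matches the original.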



First, create a sample user and some data

Create the user

SQL> create user imsi identified by imsi account unlock;
 
User created.
 
SQL> grant resource, connect to imsi;
 
Grant succeeded.



Create the data

SQL> CREATE TABLE MAXTEST (COLA NUMBER, COLB NUMBER, COLC NUMBER);
 
Table created.
 
SQL>
DECLARE
TYPE tbl_ins IS TABLE OF MAXTEST%ROWTYPE INDEX BY BINARY_INTEGER;
w_ins tbl_ins;
BEGIN
FOR i IN 1..1000000 LOOP 
   w_ins(i).COLA :=i;
   w_ins(i).COLB :=10;
   w_ins(i).COLC :=99;
END LOOP;
   FORALL i in 1..1000000 INSERT INTO MAXTEST VALUES w_ins(i);
   COMMIT;
END;
/
 
PL/SQL procedure successfully completed.



Check the data

SQL> select count(*) from MAXTEST;
 
  COUNT(*)
----------
   1000000
 
1 row selected.



Check RAC status

$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.OCR_VOTE.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ORADATA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ORAFRA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
               ONLINE  ONLINE       rac2                     Started             
ora.gsd
               OFFLINE OFFLINE      rac1                                         
               OFFLINE OFFLINE      rac2                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                                         
ora.cvu
      1        ONLINE  ONLINE       rac1                                         
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                         
ora.racdb.db
      1        ONLINE  ONLINE       rac1                     Open                
      2        ONLINE  ONLINE       rac2                     Open                
ora.scan1.vip
      1        ONLINE  ONLINE       rac1  

All resources are running normally.



Ping test before the change

# ping rac2
PING rac2 (192.168.137.51) 56(84) bytes of data.
64 bytes from rac2 (192.168.137.51): icmp_seq=1 ttl=64 time=0.199 ms
64 bytes from rac2 (192.168.137.51): icmp_seq=2 ttl=64 time=0.148 ms
^C
--- rac2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.148/0.173/0.199/0.028 ms
 
# ping rac2-vip
PING rac2-vip (192.168.137.53) 56(84) bytes of data.
64 bytes from rac2-vip (192.168.137.53): icmp_seq=1 ttl=64 time=0.156 ms
64 bytes from rac2-vip (192.168.137.53): icmp_seq=2 ttl=64 time=0.489 ms
^C
--- rac2-vip ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.156/0.322/0.489/0.167 ms
 
# ping rac2-priv
PING rac2-priv (10.10.10.51) 56(84) bytes of data.
64 bytes from rac2-priv (10.10.10.51): icmp_seq=1 ttl=64 time=0.181 ms
64 bytes from rac2-priv (10.10.10.51): icmp_seq=2 ttl=64 time=0.134 ms
^C
--- rac2-priv ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.134/0.157/0.181/0.026 ms

Public IP, VIP, and private IP all respond normally (pings from the other node also succeed).



Delete the /etc/passwd file on node 2 (truncate it to null)

# ls -al /etc/passwd
-rw-r--r--. 1 root root 2103 Feb 22  2018 /etc/passwd
 
# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:997:User for polkitd:/:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
libstoragemgmt:x:998:996:daemon account for libstoragemgmt:/var/run/lsm:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
colord:x:997:995:User for colord:/var/lib/colord:/sbin/nologin
saslauth:x:996:76:Saslauthd user:/run/saslauthd:/sbin/nologin
setroubleshoot:x:995:993::/var/lib/setroubleshoot:/sbin/nologin
rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
pulse:x:171:171:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
chrony:x:994:990::/var/lib/chrony:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
radvd:x:75:75:radvd user:/:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
geoclue:x:993:988:User for geoclue:/var/lib/geoclue:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
gnome-initial-setup:x:992:987::/run/gnome-initial-setup/:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
admin:x:1000:1000:admin:/home/admin:/bin/bash
oracle:x:500:500::/home/oracle:/bin/bash
 
$ date
Sun Jun 21 11:45:34 EDT 2020
 
# cp /dev/null /etc/passwd
cp: overwrite '/etc/passwd'? y
 
# cat /etc/passwd
 

The contents of /etc/passwd were truncated to null as intended.

/etc/passwd was nulled at: Sun Jun 21 11:45:34 EDT 2020
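
A quick way to confirm the effect (a sketch; sessions already logged in as oracle keep working, because their uid was resolved at login):

# getent passwd oracle     # prints nothing once /etc/passwd is empty
# su - oracle              # fails, since the user name can no longer be resolved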



Check the grid alert log

Node 1
$ cd $GRID_HOME/log/rac1
$ tail -f alertrac1.log
No change
 
Node 2
$ cd $GRID_HOME/log/rac2
$ tail -f alertrac2.log
No change

No change on either node.



Check the DB alert log

Node 1
$ tail -f /app/oracle/diag/rdbms/racdb/racdb1/trace/alert_racdb1.log
No change
 
Node 2
$ tail -f /app/oracle/diag/rdbms/racdb/racdb2/trace/alert_racdb2.log
No change

No change on either node.



Check the crsd log

Node 1
$ cd $GRID_HOME/log/rac1/crsd
$ tail -f crsd.log
No change
 
Node 2
$ cd $GRID_HOME/log/rac2/crsd
$ tail -f crsd.log
Under normal conditions
2020-06-21 11:34:53.927: [UiServer][377435904] CS(0x7f99181cb5a0)set Properties ( oracle,0x7f99242e10f0)
2020-06-21 11:34:53.943: [UiServer][379537152]{2:43819:287} Sending message to PE. ctx= 0x2cf58f0, Client PID: 2543
2020-06-21 11:34:53.943: [   CRSPE][381638400]{2:43819:287} Processing PE command id=348. Description: [Stat Resource : 0x7f9934117120]
2020-06-21 11:34:53.943: [   CRSPE][381638400]{2:43819:287} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == rac2))
2020-06-21 11:34:53.944: [UiServer][379537152]{2:43819:287} Done for ctx=0x2cf58f0
2020-06-21 11:35:53.929: [UiServer][377435904] CS(0x7f99181cb5a0)set Properties ( oracle,0x7f992402a400)
2020-06-21 11:35:53.940: [UiServer][379537152]{2:43819:288} Sending message to PE. ctx= 0x2cf58f0, Client PID: 2543
2020-06-21 11:35:53.940: [   CRSPE][381638400]{2:43819:288} Processing PE command id=349. Description: [Stat Resource : 0x7f99340de3a0]
 
After the failure
2020-06-21 11:44:53.951: [UiServer][379537152]{2:43819:300} Sending message to PE. ctx= 0x2cf5870, Client PID: 2543
2020-06-21 11:44:53.951: [   CRSPE][381638400]{2:43819:300} Processing PE command id=361. Description: [Stat Resource : 0x7f99340de3a0]
2020-06-21 11:44:53.952: [   CRSPE][381638400]{2:43819:300} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == rac2))
2020-06-21 11:44:53.952: [UiServer][379537152]{2:43819:300} Done for ctx=0x2cf5870
2020-06-21 11:45:53.946: [UiServer][377435904] CS(0x7f99181cb5a0)set Properties ( oracle,0x7f992402a400)
2020-06-21 11:45:53.956: [UiServer][379537152]{2:43819:301} Sending message to PE. ctx= 0x2cf5870, Client PID: 2543
2020-06-21 11:45:53.956: [   CRSPE][381638400]{2:43819:301} Processing PE command id=362. Description: [Stat Resource : 0x7f9934117120]
2020-06-21 11:45:53.956: [   CRSPE][381638400]{2:43819:301} Expression Filter : ((NAME == ora.scan1.vip) AND (LAST_SERVER == rac2))
2020-06-21 11:45:53.957: [UiServer][379537152]{2:43819:301} Done for ctx=0x2cf5870
2020-06-21 11:46:59.919: [    AGFW][392144640]{0:1:9} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:22642
2020-06-21 11:46:59.919: [    AGFW][392144640]{0:1:9} Verifying msg rid = ora.asm rac2 1
2020-06-21 11:46:59.919: [    AGFW][392144640]{0:1:9} Received state change for ora.asm rac2 1 [old state = ONLINE, new state = UNKNOWN]
2020-06-21 11:46:59.919: [    AGFW][392144640]{0:1:9} Received state LABEL change for ora.asm rac2 1 [old label  = Started, new label = ]
2020-06-21 11:46:59.919: [    AGFW][392144640]{0:1:9} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:3665:Ver:2]
2020-06-21 11:46:59.919: [    AGFW][392144640]{0:1:9} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:22642
2020-06-21 11:46:59.920: [   CRSPE][381638400]{0:1:9} State change received from rac2 for ora.asm rac2 1
2020-06-21 11:46:59.920: [   CRSPE][381638400]{0:1:9} Processing PE command id=363. Description: [Resource State Change (ora.asm rac2 1) : 0x7f9934102070]
2020-06-21 11:46:59.920: [   CRSPE][381638400]{0:1:9} State information for [ora.asm rac2 1] has been lost, all we know is the initial check timed out. Issuing check operations until we can operate on better data.
2020-06-21 11:46:59.920: [    AGFW][392144640]{0:1:9} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.asm rac2 1] ID 4097:3666
2020-06-21 11:46:59.920: [    AGFW][392144640]{0:1:9} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.asm rac2 1] ID 4097:3666 to the agent /app/grid/product/11.2.0/grid/bin/oraagent_oracle
2020-06-21 11:47:09.148: [ COMMCRS][377435904]Authentication OSD error, op: getpwuid_scls
 loc: authrespchk7
 info: failed for uid 500
dep: 0
2020-06-21 11:47:09.148: [UiServer][377435904] clscanswer returned error: 2
2020-06-21 11:47:14.931: [    AGFW][392144640]{0:1:9} Received the reply to the message: RESOURCE_PROBE[ora.asm rac2 1] ID 4097:3667 from the agent /app/grid/product/11.2.0/grid/bin/oraagent_oracle
2020-06-21 11:47:14.931: [    AGFW][392144640]{0:1:9} ora.asm rac2 1 received state from probe request. Old state = UNKNOWN, New state = UNKNOWN
2020-06-21 11:47:14.931: [    AGFW][392144640]{0:1:9} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_PROBE[ora.asm rac2 1] ID 4097:3666
2020-06-21 11:47:14.932: [   CRSPE][381638400]{0:1:9} State information for [ora.asm rac2 1] is still bad. Issuing another check.
2020-06-21 11:47:14.932: [    AGFW][392144640]{0:1:9} Agfw Proxy Server received the message: RESOURCE_PROBE[ora.asm rac2 1] ID 4097:3674
2020-06-21 11:47:14.932: [    AGFW][392144640]{0:1:9} Agfw Proxy Server forwarding the message: RESOURCE_PROBE[ora.asm rac2 1] ID 4097:3674 to the agent /app/grid/product/11.2.0/grid/bin/oraagent_oracle

The crsd process detects the failure.

It reports that it can no longer look up uid 500 (the user id of the oracle account).
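
The op name getpwuid_scls suggests Oracle's wrapper around the libc getpwuid(3) call, which resolves a numeric uid through /etc/passwd (or NSS). The same lookup can be reproduced from the shell (a sketch):

# getent passwd 500        # empty output and a non-zero exit code, matching "failed for uid 500"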



Check the cssd log

Node 1
$ cd $GRID_HOME/log/rac1/cssd
$ tail -f ocssd.log
No change
2020-06-21 11:44:57.708: [    CSSD][3079120640]clssnmSendingThread: sending status msg to all nodes
2020-06-21 11:44:57.708: [    CSSD][3079120640]clssnmSendingThread: sent 5 status msgs to all nodes
2020-06-21 11:45:02.718: [    CSSD][3079120640]clssnmSendingThread: sending status msg to all nodes
2020-06-21 11:45:02.719: [    CSSD][3079120640]clssnmSendingThread: sent 5 status msgs to all nodes
2020-06-21 11:45:06.746: [    CSSD][3079120640]clssnmSendingThread: sending status msg to all nodes
2020-06-21 11:45:06.746: [    CSSD][3079120640]clssnmSendingThread: sent 4 status msgs to all nodes
.
.
(repeats)
 
Node 2
$ cd $GRID_HOME/log/rac2/cssd
$ tail -f ocssd.log
No change
2020-06-21 11:44:55.708: [    CSSD][559494912]clssnmSendingThread: sending status msg to all nodes
2020-06-21 11:44:55.708: [    CSSD][559494912]clssnmSendingThread: sent 5 status msgs to all nodes
2020-06-21 11:45:00.710: [    CSSD][559494912]clssnmSendingThread: sending status msg to all nodes
2020-06-21 11:45:00.710: [    CSSD][559494912]clssnmSendingThread: sent 5 status msgs to all nodes
2020-06-21 11:45:04.730: [    CSSD][559494912]clssnmSendingThread: sending status msg to all nodes
2020-06-21 11:45:04.730: [    CSSD][559494912]clssnmSendingThread: sent 4 status msgs to all nodes
.
.
(repeats)



Check the gipcd log

Node 1
$ cd $GRID_HOME/log/rac1/gipcd
$ tail -f gipcd.log
No change
2020-06-21 11:45:47.708: [GIPCDCLT][4224059136] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000336
2020-06-21 11:45:47.709: [GIPCDMON][4219856640] gipcdMonitorCssCheck: found node rac1
2020-06-21 11:45:47.709: [GIPCDMON][4219856640] gipcdMonitorCssCheck: found node rac2
2020-06-21 11:45:47.709: [GIPCDMON][4219856640] gipcdMonitorCssCheck: updating timeout node rac2
2020-06-21 11:45:47.709: [GIPCDMON][4219856640] gipcdMonitorCssCheck: updating timeout node rac2
2020-06-21 11:45:47.709: [GIPCDMON][4219856640] gipcdMonitorFailZombieNodes: skipping live node 'rac2', time 0 ms, endp 00000000000000000000000000000880
2020-06-21 11:45:47.709: [GIPCDMON][4219856640] gipcdMonitorFailZombieNodes: skipping live node 'rac2', time 0 ms, endp 0000000000000000, 000000000000097b
2020-06-21 11:45:48.098: [GIPCDCLT][4224059136] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 00000000000007e9
2020-06-21 11:45:48.768: [GIPCDCLT][4224059136] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 00000000000003dd
2020-06-21 11:45:48.843: [GIPCDCLT][4224059136] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000119
2020-06-21 11:45:48.990: [GIPCDCLT][4224059136] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000b76
2020-06-21 11:45:50.221: [GIPCDMON][4219856640] gipcdMonitorSaveInfMetrics: inf[ 0]  ens34                - rank   99, avgms 1.052632 [ 117 / 114 / 114 ]
2020-06-21 11:45:50.715: [GIPCDCLT][4224059136] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000047e
2020-06-21 11:45:51.397: [ CLSINET][4219856640] Returning NETDATA: 1 interfaces
2020-06-21 11:45:51.397: [ CLSINET][4219856640] # 0 Interface 'ens34',ip='10.10.10.50',mac='00-50-56-33-ad-98',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
.
.
(repeats)
 
Node 2
$ cd $GRID_HOME/log/rac2/gipcd
$ tail -f gipcd.log
No change
2020-06-21 11:45:54.281: [GIPCDCLT][3850999552] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000b70
2020-06-21 11:45:56.125: [GIPCDCLT][3850999552] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000336
2020-06-21 11:45:56.125: [GIPCDMON][3846797056] gipcdMonitorCssCheck: found node rac1
2020-06-21 11:45:56.125: [GIPCDMON][3846797056] gipcdMonitorCssCheck: updating timeout node rac1
2020-06-21 11:45:56.125: [GIPCDMON][3846797056] gipcdMonitorCssCheck: updating timeout node rac1
2020-06-21 11:45:56.125: [GIPCDMON][3846797056] gipcdMonitorCssCheck: found node rac2
2020-06-21 11:45:56.125: [GIPCDMON][3846797056] gipcdMonitorFailZombieNodes: skipping live node 'rac1', time 0 ms, endp 0000000000000000, 000000000000097c
2020-06-21 11:45:56.125: [GIPCDMON][3846797056] gipcdMonitorFailZombieNodes: skipping live node 'rac1', time 0 ms, endp 0000000000000000, 0000000000000ad3
2020-06-21 11:45:56.720: [GIPCDCLT][3850999552] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000047e
2020-06-21 11:45:56.770: [GIPCDCLT][3850999552] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 00000000000003dd
2020-06-21 11:45:56.851: [GIPCDCLT][3850999552] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000195
2020-06-21 11:45:58.292: [GIPCDMON][3846797056] gipcdMonitorSaveInfMetrics: inf[ 0]  ens34                - rank   99, avgms 0.940171 [ 113 / 117 / 117 ]
2020-06-21 11:45:59.127: [GIPCDCLT][3850999552] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 00000000000007a6
2020-06-21 11:45:59.209: [ CLSINET][3846797056] Returning NETDATA: 1 interfaces
2020-06-21 11:45:59.209: [ CLSINET][3846797056] # 0 Interface 'ens34',ip='10.10.10.51',mac='00-50-56-27-3e-72',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
.
.
(repeats)



Check the gpnpd log

Node 1
$ cd $GRID_HOME/log/rac1/gpnpd
$ tail -f gpnpd.log
No change
 
Node 2
$ cd $GRID_HOME/log/rac2/gpnpd
$ tail -f gpnpd.log
No change



Check the evmd log

Node 1
$ cd $GRID_HOME/log/rac1/evmd
$ tail -f evmd.log
No change
 
Node 2
$ cd $GRID_HOME/log/rac2/evmd
$ tail -f evmd.log
No change



Check the ohasd log

Node 1
$ cd $GRID_HOME/log/rac1/ohasd
$ tail -f ohasd.log
2020-06-21 11:44:01.720: [   CRSPE][2415916800]{0:0:285} Processing PE command id=297. Description: [Stat Resource : 0x7f929c0b6af0]
2020-06-21 11:44:01.720: [UiServer][2413815552]{0:0:285} Done for ctx=0x7f92901d4440
2020-06-21 11:45:01.715: [UiServer][2411714304] CS(0x7f929c007cd0)set Properties ( oracle,0x7f92881bdfd0)
2020-06-21 11:45:01.725: [UiServer][2413815552]{0:0:286} Sending message to PE. ctx= 0x7f92901e3f20, Client PID: 2541
2020-06-21 11:45:01.725: [   CRSPE][2415916800]{0:0:286} Processing PE command id=298. Description: [Stat Resource : 0x7f929c0b6af0]
2020-06-21 11:45:01.726: [UiServer][2413815552]{0:0:286} Done for ctx=0x7f92901e3f20
2020-06-21 11:46:01.664: [UiServer][2411714304] CS(0x7f929c082710)set Properties ( oracle,0x7f92881bdfd0)
2020-06-21 11:46:01.674: [UiServer][2411714304] CS(0x7f929c087e10)set Properties ( oracle,0x7f92881a3e30)
2020-06-21 11:46:01.674: [UiServer][2411714304] CS(0x7f929c089930)set Properties ( oracle,0x7f92881b1940)
.
.
(repeats)
 
Node 2
$ cd $GRID_HOME/log/rac2/ohasd
$ tail -f ohasd.log
Under normal conditions
2020-06-21 11:34:44.890: [UiServer][2891732736] CS(0x7f469c0b2210)set Properties ( oracle,0x7f46881ca5a0)
2020-06-21 11:34:44.900: [UiServer][2893833984]{0:0:266} Sending message to PE. ctx= 0x7f469000e6d0, Client PID: 2543
2020-06-21 11:34:44.900: [   CRSPE][2895935232]{0:0:266} Processing PE command id=280. Description: [Stat Resource : 0x7f469c112de0]
2020-06-21 11:34:44.901: [UiServer][2893833984]{0:0:266} Done for ctx=0x7f469000e6d0
2020-06-21 11:35:44.885: [UiServer][2891732736] CS(0x7f469c082710)set Properties ( oracle,0x7f46881a80d0)
2020-06-21 11:35:44.895: [UiServer][2891732736] CS(0x7f469c1092f0)set Properties ( oracle,0x7f46881dad20)
2020-06-21 11:35:44.896: [UiServer][2893833984]{0:0:267} Sending message to PE. ctx= 0x7f46901133d0, Client PID: 2543
2020-06-21 11:35:44.896: [   CRSPE][2895935232]{0:0:267} Processing PE command id=281. Description: [Stat Resource : 0x7f469c10bb40]
.
.
(repeats)
 
After the failure
2020-06-21 11:45:44.882: [UiServer][2891732736] CS(0x7f469c082710)set Properties ( oracle,0x7f46881dad20)
2020-06-21 11:45:44.893: [UiServer][2893833984]{0:0:283} Sending message to PE. ctx= 0x7f4690035a30, Client PID: 2543
2020-06-21 11:45:44.893: [   CRSPE][2895935232]{0:0:283} Processing PE command id=297. Description: [Stat Resource : 0x7f469c112de0]
2020-06-21 11:45:44.894: [UiServer][2893833984]{0:0:283} Done for ctx=0x7f4690035a30
2020-06-21 11:45:44.914: [UiServer][2891732736] CS(0x7f469c1092f0)set Properties ( oracle,0x7f46881ca5a0)
2020-06-21 11:45:44.914: [UiServer][2891732736] CS(0x7f469c114fa0)set Properties ( oracle,0x7f46881a80d0)
2020-06-21 11:45:44.925: [UiServer][2891732736] CS(0x7f469c087e10)set Properties ( oracle,0x7f46881dad20)
2020-06-21 11:45:44.925: [UiServer][2893833984]{0:0:284} Sending message to PE. ctx= 0x7f46900eafe0, Client PID: 2543
2020-06-21 11:45:44.925: [UiServer][2893833984]{0:0:285} Sending message to PE. ctx= 0x7f4690090630, Client PID: 2543
2020-06-21 11:45:44.926: [   CRSPE][2895935232]{0:0:284} Processing PE command id=298. Description: [Stat Resource : 0x7f469c112de0]
2020-06-21 11:45:44.926: [   CRSPE][2895935232]{0:0:285} Processing PE command id=299. Description: [Stat Resource : 0x7f469c10bb40]
2020-06-21 11:45:44.927: [UiServer][2893833984]{0:0:284} Done for ctx=0x7f46900eafe0
2020-06-21 11:45:44.927: [UiServer][2893833984]{0:0:285} Done for ctx=0x7f4690090630
2020-06-21 11:45:44.935: [UiServer][2893833984]{0:0:286} Sending message to PE. ctx= 0x7f46900b9d90, Client PID: 2543
2020-06-21 11:45:44.936: [   CRSPE][2895935232]{0:0:286} Processing PE command id=300. Description: [Stat Resource : 0x7f469c112de0]
2020-06-21 11:45:44.936: [UiServer][2893833984]{0:0:286} Done for ctx=0x7f46900b9d90
2020-06-21 11:47:00.129: [ COMMCRS][2891732736]Authentication OSD error, op: getpwuid_scls
 loc: authrespchk7
 info: failed for uid 500
dep: 0
2020-06-21 11:47:00.129: [UiServer][2891732736] clscanswer returned error: 2
2020-06-21 11:47:15.360: [ COMMCRS][2891732736]Authentication OSD error, op: getpwuid_scls
 loc: authrespchk7
 info: failed for uid 500
dep: 0
2020-06-21 11:47:15.360: [UiServer][2891732736] clscanswer returned error: 2
2020-06-21 11:47:30.565: [ COMMCRS][2891732736]Authentication OSD error, op: getpwuid_scls
 loc: authrespchk7
 info: failed for uid 500
dep: 0
.
.
(repeats)
.
.
2020-06-21 11:49:48.028: [UiServer][2891732736] clscanswer returned error: 2
2020-06-21 11:49:57.443: [    AGFW][2906441472]{0:11:15} Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:3205
2020-06-21 11:49:57.443: [    AGFW][2906441472]{0:11:15} Verifying msg rid = ora.crsd 1 1
2020-06-21 11:49:57.443: [    AGFW][2906441472]{0:11:15} Received state change for ora.crsd 1 1 [old state = ONLINE, new state = UNKNOWN]
2020-06-21 11:49:57.443: [    AGFW][2906441472]{0:11:15} Agfw Proxy Server sending message to PE, Contents = [MIDTo:2|OpID:3|FromA:{Invalid|Node:0|Process:0|Type:0}|ToA:{Invalid|Node:-1|Process:-1|Type:-1}|MIDFrom:0|Type:4|Pri2|Id:3142:Ver:2]
2020-06-21 11:49:57.443: [    AGFW][2906441472]{0:11:15} Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:3205
2020-06-21 11:49:57.443: [   CRSPE][2895935232]{0:11:15} State change received from rac2 for ora.crsd 1 1
2020-06-21 11:49:57.443: [   CRSPE][2895935232]{0:11:15} Processing PE command id=301. Description: [Resource State Change (ora.crsd 1 1) : 0x7f469c09b340]
2020-06-21 11:49:57.443: [   CRSPE][2895935232]{0:11:15} RI [ora.crsd 1 1] new external state [INTERMEDIATE] old value: [ONLINE] on rac2 label = []
2020-06-21 11:49:57.443: [   CRSPE][2895935232]{0:11:15} Processing unplanned state change for [ora.crsd 1 1]
2020-06-21 11:49:57.443: [   CRSPE][2895935232]{0:11:15} PE Command [ Resource State Change (ora.crsd 1 1) : 0x7f469c09b340 ] has completed
2020-06-21 11:49:57.443: [    AGFW][2906441472]{0:11:15} Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:3143
2020-06-21 11:49:57.443: [    AGFW][2906441472]{0:11:15} Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:3143
2020-06-21 11:49:57.443: [    AGFW][2906441472]{0:11:15} Agfw received reply from PE for resource state change for ora.crsd 1 1
2020-06-21 11:50:03.235: [ COMMCRS][2891732736]Authentication OSD error, op: getpwuid_scls
 loc: authrespchk7
 info: failed for uid 500
dep: 0

The ohasd process detects that the state of the crsd process changed from ONLINE to UNKNOWN.
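
ora.crsd is one of the OHASD-managed "init" resources, so its state can also be listed directly with crsctl (a sketch; run it on a healthy node, since on rac2 the same authentication problem is likely to break crsctl too, as shown next):

$ crsctl stat res -t -init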



Check RAC status from node 1

$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.OCR_VOTE.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  INTERMEDIATE rac2                                         
ora.ORADATA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  INTERMEDIATE rac2                                         
ora.ORAFRA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  INTERMEDIATE rac2                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
               ONLINE  INTERMEDIATE rac2                                         
ora.gsd
               OFFLINE OFFLINE      rac1                                         
               OFFLINE OFFLINE      rac2                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                                         
ora.cvu
      1        ONLINE  ONLINE       rac1                                         
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                         
ora.racdb.db
      1        ONLINE  ONLINE       rac1                     Open                
      2        ONLINE  ONLINE       rac2                     Open                
ora.scan1.vip
      1        ONLINE  ONLINE       rac1

All ASM-related resources on node 2 (ora.asm and the diskgroups) changed to the INTERMEDIATE state.



Check RAC status from node 2

# crsctl stat res -t
terminate called after throwing an instance of 'cls::Exception'
Aborted (core dumped)

The command fails to run.
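
Even though crsctl aborts, the clusterware daemons themselves keep running. A hedged way to confirm this is plain ps; note that with /etc/passwd empty, the UID column is expected to fall back to raw numeric ids (e.g. 500), since uid-to-name lookups fail:

# ps -ef | grep -E 'crsd.bin|ocssd.bin|evmd.bin' | grep -v grep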



Check DB status

SQL> select instance_name, version, status from gv$instance;
 
INSTANCE_NAME     VERSION       STATUS
---------------- ----------------- ------------
racdb1         11.2.0.4.0       OPEN
racdb2         11.2.0.4.0       OPEN
 
2 rows selected.



Insert data from node 1

SQL> conn imsi/imsi
SQL> 
DECLARE
TYPE tbl_ins IS TABLE OF MAXTEST%ROWTYPE INDEX BY BINARY_INTEGER;
w_ins tbl_ins;
BEGIN
FOR i IN 1..10000 LOOP 
   w_ins(i).COLA :=i;
   w_ins(i).COLB :=10;
   w_ins(i).COLC :=99;
END LOOP;
   FORALL i in 1..10000 INSERT INTO MAXTEST VALUES w_ins(i);
   COMMIT;
END;
/



Check the data

SQL> select count(*) from MAXTEST;
 
  COUNT(*)
----------
   1010000
 
1 row selected.



No change in either the DB or grid alert logs during the insert.



Log switch on node 1

SQL> alter system switch logfile;
 
System altered.



Check the DB alert log

Node 1
$ tail -f /app/oracle/diag/rdbms/racdb/racdb1/trace/alert_racdb1.log
Sun Jun 21 11:50:16 2020
Thread 1 advanced to log sequence 16 (LGWR switch)
  Current log# 2 seq# 16 mem# 0: +ORADATA/racdb/onlinelog/group_2.258.970626167
  Current log# 2 seq# 16 mem# 1: +ORAFRA/racdb/onlinelog/group_2.273.970626167
 
Node 2
$ tail -f /app/oracle/diag/rdbms/racdb/racdb2/trace/alert_racdb2.log
No change

Node 1 shows the log switch; node 2 shows no change.
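
Each RAC instance writes its own redo thread, which is why a log switch on instance 1 advances only thread 1. If needed, the thread-to-group mapping can be checked with a query along these lines (a sketch, run as a privileged user):

SQL> select thread#, group#, sequence#, status from v$log order by thread#, group#;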



Because the user entries on node 2 were removed, a local sqlplus connection on node 2 is no longer possible.

To reach the node 2 DB, modify tnsnames.ora on node 1.

21
$ cat $ORACLE_HOME/network/admin/tnsnames.ora
# tnsnames.ora Network Configuration File: /app/oracle/product/11.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.
 
RACDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb)
    )
  )
 
RAC2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.137.51)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = racdb2)
    )
  )

The RAC2 entry at the bottom was added.



Check with tnsping

$ tnsping RAC2
 
TNS Ping Utility for Linux: Version 11.2.0.4.0 - Production on 21-JUN-2020 11:52:39
 
Copyright (c) 1997, 2013, Oracle.  All rights reserved.
 
Used parameter files:
 
 
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.137.51)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SID = racdb2)))
OK (0 msec)

Connects normally.
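
Editing tnsnames.ora is not strictly required: the same connect descriptor can be passed inline to sqlplus (a sketch, using the same host, port, and SID as the RAC2 alias above):

$ sqlplus sys/oracle@"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.137.51)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SID=racdb2)))" as sysdba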



Connect to the node 2 DB

$ sqlplus sys/oracle@RAC2 as sysdba
 
SQL*Plus: Release 11.2.0.4.0 Production on Sun Jun 21 11:52:54 2020
 
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
 
SQL> select instance_name, host_name from v$instance;
 
INSTANCE_NAME     HOST_NAME
---------------- ----------------------------------------------------------------
racdb2         rac2
 
1 row selected.

Connected successfully.



Check the data on node 2

SQL> conn imsi/imsi
Connected.
SQL> select count(*) from MAXTEST;
 
  COUNT(*)
----------
   1010000
 
1 row selected.

The data inserted from node 1 is visible as expected.



Insert data from node 2

SQL> conn imsi/imsi
SQL> 
DECLARE
TYPE tbl_ins IS TABLE OF MAXTEST%ROWTYPE INDEX BY BINARY_INTEGER;
w_ins tbl_ins;
BEGIN
FOR i IN 1..1000 LOOP 
   w_ins(i).COLA :=i;
   w_ins(i).COLB :=10;
   w_ins(i).COLC :=99;
END LOOP;
   FORALL i in 1..1000 INSERT INTO MAXTEST VALUES w_ins(i);
   COMMIT;
END;
/



Check the data on node 2

SQL> conn imsi/imsi
Connected.
SQL> select count(*) from MAXTEST;
 
  COUNT(*)
----------
   1011000
 
1 row selected.



No change in either the DB or grid alert logs during the insert.



Log switch on node 2

SQL> alter system switch logfile;
 
System altered.



Check the DB alert log

$ tail -f /app/oracle/diag/rdbms/racdb/racdb1/trace/alert_racdb1.log
No change
 
$ tail -f /app/oracle/diag/rdbms/racdb/racdb2/trace/alert_racdb2.log
Sun Jun 21 11:53:25 2020
Thread 2 advanced to log sequence 4 (LGWR switch)
  Current log# 4 seq# 4 mem# 0: +ORADATA/racdb/onlinelog/group_4.266.970626635
  Current log# 4 seq# 4 mem# 1: +ORAFRA/racdb/onlinelog/group_4.275.970626635

Node 2 shows the log switch; node 1 shows no change.



Ping test once more

# ping rac2
PING rac2 (192.168.137.51) 56(84) bytes of data.
64 bytes from rac2 (192.168.137.51): icmp_seq=1 ttl=64 time=0.184 ms
64 bytes from rac2 (192.168.137.51): icmp_seq=2 ttl=64 time=0.083 ms
^C
--- rac2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.083/0.133/0.184/0.051 ms
 
# ping rac2-vip
PING rac2-vip (192.168.137.53) 56(84) bytes of data.
64 bytes from rac2-vip (192.168.137.53): icmp_seq=1 ttl=64 time=0.167 ms
64 bytes from rac2-vip (192.168.137.53): icmp_seq=2 ttl=64 time=0.112 ms
^C
--- rac2-vip ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.112/0.139/0.167/0.029 ms
 
# ping rac2-priv
PING rac2-priv (10.10.10.51) 56(84) bytes of data.
64 bytes from rac2-priv (10.10.10.51): icmp_seq=1 ttl=64 time=0.143 ms
64 bytes from rac2-priv (10.10.10.51): icmp_seq=2 ttl=64 time=0.141 ms
^C
--- rac2-priv ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.141/0.142/0.143/0.001 ms

Pings from node 1 to node 2 work normally (and from node 2 to node 1 as well).



Test results

Even with /etc/passwd deleted on node 2 (all user entries removed), data can be inserted and queried on both nodes.

When checked from node 1 with crsctl stat res -t, all ASM-related resources on node 2 changed to the INTERMEDIATE state.

When /etc/passwd was wiped, the crsd and ohasd processes detected the failure.
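
Because the damage shows up only in the crsd/ohasd logs and not in the alert logs, a simple OS-level check could catch this failure earlier. A hypothetical cron-style sketch (the script name and alert channel are placeholders):

#!/bin/sh
# check_passwd.sh (hypothetical): alert when the oracle account can no longer be resolved
if ! getent passwd oracle >/dev/null 2>&1; then
    logger -p user.crit "passwd lookup for user oracle failed on $(hostname)"
fi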



Check the grid alert log

$ cd $GRID_HOME/log/rac1
$ tail -f alertrac1.log
No change
 
$ cd $GRID_HOME/log/rac2
$ tail -f alertrac2.log
2020-06-18 11:41:17.823
[ctssd(2256)]CRS-2408:The clock on host rac2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

A ctssd message did appear on node 2, but it is not a critical one.

It only says the clock was updated by CTSS to stay synchronous with the mean cluster time.



Resolution : restore the /etc/passwd file, then restart all Oracle processes

If a backup file exists, restore /etc/passwd

# cp /etc/passwd_bak /etc/passwd
# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:997:User for polkitd:/:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
libstoragemgmt:x:998:996:daemon account for libstoragemgmt:/var/run/lsm:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
colord:x:997:995:User for colord:/var/lib/colord:/sbin/nologin
saslauth:x:996:76:Saslauthd user:/run/saslauthd:/sbin/nologin
setroubleshoot:x:995:993::/var/lib/setroubleshoot:/sbin/nologin
rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
pulse:x:171:171:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
chrony:x:994:990::/var/lib/chrony:/sbin/nologin
qemu:x:107:107:qemu user:/:/sbin/nologin
radvd:x:75:75:radvd user:/:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
geoclue:x:993:988:User for geoclue:/var/lib/geoclue:/sbin/nologin
gdm:x:42:42::/var/lib/gdm:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
gnome-initial-setup:x:992:987::/run/gnome-initial-setup/:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
admin:x:1000:1000:admin:/home/admin:/bin/bash
oracle:x:500:500::/home/oracle:/bin/bash
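
If no manual backup exists, the shadow utilities usually keep the previous version of the file as /etc/passwd- whenever an account is added or changed; it may work as a fallback (a sketch, assuming the file exists and is reasonably current):

# ls -l /etc/passwd-
# cp -p /etc/passwd- /etc/passwd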



Shut down the DB on node 2

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.



Stop CRS on node 2

# crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.

Because the stack is not in a healthy state, a normal shutdown is not possible.



Force-stop CRS on node 2

# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac2'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'
CRS-2673: Attempting to stop 'ora.asm' on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'
CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.



Reboot the node 2 server (other processes may also have been affected while /etc/passwd was missing)

# reboot
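
After the reboot, the CRS stack normally starts on its own (assuming autostart is enabled via crsctl enable crs). A quick way to verify before checking resources:

# crsctl check crs
$ srvctl status database -d racdb      # run as the oracle user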



Check status after starting both CRS and the DB

# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.OCR_VOTE.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ORADATA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ORAFRA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
               ONLINE  ONLINE       rac2                     Started             
ora.gsd
               OFFLINE OFFLINE      rac1                                         
               OFFLINE OFFLINE      rac2                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                                         
ora.cvu
      1        ONLINE  ONLINE       rac1                                         
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                         
ora.racdb.db
      1        ONLINE  ONLINE       rac1                     Open                
      2        ONLINE  ONLINE       rac2                     Open                
ora.scan1.vip
      1        ONLINE  ONLINE       rac2

Everything is back in a normal ONLINE state.



References :

https://positivemh.tistory.com/183

https://positivemh.tistory.com/246