OS환경 : Oracle Linux6.8(64bit)
DB 환경 : Oracle Database 11.2.0.4
방법 : oracle smon을 kill 시키면 어떻게 될까?
공식문서에 따르면 아래와 같다.
System Monitor Process (SMON)
The system monitor process (SMON) performs recovery, if necessary, at instance startup. SMON is also responsible for cleaning up temporary segments that are no longer in use and for coalescing contiguous free extents within dictionary managed tablespaces. If any terminated transactions were skipped during instance recovery because of file-read or offline errors, SMON recovers them when the tablespace or file is brought back online. SMON checks regularly to see whether it is needed. Other processes can call SMON if they detect a need for it.
With Real Application Clusters, the SMON process of one instance can perform instance recovery for a failed CPU or instance.
구글번역
시스템 모니터 프로세스 (SMON)
시스템 모니터 처리 (SMON)은 필요한 경우 예를 시작할 때, 복구를 수행한다.또한 SMON은 더 이상 사용되지 않는 임시 세그먼트를 정리하고 사전 관리 테이블 공간 내에서 연속적인 여유 범위를 통합하는 일도 담당합니다. 파일 읽기 또는 오프라인 오류로 인해 인스턴스 복구 중에 종료 된 트랜잭션이 건너 뛴 경우 SMON은 테이블 공간이나 파일이 다시 온라인으로 전환 될 때이를 복구합니다. SMON은 그것이 필요한지 정기적으로 점검합니다. SMON이 필요성을 감지하면 다른 프로세스가 SMON을 호출 할 수 있습니다.
Real Application Clusters를 사용하면 한 인스턴스의 SMON 프로세스가 실패한 CPU 또는 인스턴스에 대한 인스턴스 복구를 수행 할 수 있습니다
https://docs.oracle.com/cd/B19306_01/server.102/b14220/process.htm
마지막 문장이 흥미로운데
rac 상태에서 node1번의 smon이 죽으면 node2의 smon이 node1를 복구시킨다고 되어있어서 테스트를 한번 해보았다.
1번노드 smon의 pid 확인
1 2 3 4 | [oracle@orcldb1 ~]$ ps -ef | grep smon root 4937 1 1 Nov26 ? 00:51:03 /oracle/app/11.2.0/grid/bin/osysmond.bin oracle 24547 1 0 17:55 ? 00:00:00 ora_smon_ORCLDB1 oracle 26173 22968 0 18:00 pts/1 00:00:00 grep smon |
1번노드 smon kill
1 | [oracle@orcldb1 ~]$ kill -9 24547 |
alert log tail -300f 걸어놓고 확인한 결과
1번노드 alert log
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 | Thu Nov 29 18:02:57 2018 PMON (ospid: 24507): terminating the instance due to error 474 Thu Nov 29 18:02:57 2018 System state dump requested by (instance=1, osid=24507 (PMON)), summary=[abnormal instance termination]. System State dumped to trace file /oracle/app/oracle/diag/rdbms/prorcldb/ORCLDB1/trace/ORCLDB1_diag_24517_20181129180257.trc Dumping diagnostic data in directory=[cdmp_20181129180257], requested by (instance=1, osid=24507 (PMON)), summary=[abnormal instance termination]. Instance terminated by PMON, pid = 24507 Thu Nov 29 18:03:08 2018 Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Initial number of CPU is 1 Number of processor cores in the system is 1 Number of processor sockets in the system is 1 Private Interface 'eth1:1' configured from GPnP for use as a private interconnect. [name='eth1:1', type=1, ip=169.254.173.134, mac=00-0c-29-af-42-2c, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62] Public Interface 'eth0' configured from GPnP for use as a public interface. [name='eth0', type=1, ip=192.168.0.120, mac=00-0c-29-af-42-22, net=192.168.0.0/24, mask=255.255.255.0, use=public/1] Public Interface 'eth0:1' configured from GPnP for use as a public interface. [name='eth0:1', type=1, ip=192.168.0.150, mac=00-0c-29-af-42-22, net=192.168.0.0/24, mask=255.255.255.0, use=public/1] Public Interface 'eth0:3' configured from GPnP for use as a public interface. [name='eth0:3', type=1, ip=192.168.0.127, mac=00-0c-29-af-42-22, net=192.168.0.0/24, mask=255.255.255.0, use=public/1] CELL communication is configured to use 0 interface(s): CELL IP affinity details: NUMA status: non-NUMA system cellaffinity.ora status: N/A CELL communication will use 1 IP group(s): Grp 0: Picked latch-free SCN scheme 3 Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters and Data Mining options. ORACLE_HOME = /oracle/app/11.2.0/db System name: Linux Node name: prorcldb1 Release: 4.1.12-37.4.1.el6uek.x86_64 Version: #2 SMP Tue May 17 07:23:38 PDT 2016 Machine: x86_64 VM name: VMWare Version: 6 Using parameter settings in server-side spfile /oracle/app/11.2.0/db/dbs/spfileORCLDB1.ora WARNING: spfile modifications are disabled because multiple spfiles were used System parameters with non-default values: processes = 150 spfile = "/oracle/app/11.2.0/db/dbs/spfileORCLDB1.ora" memory_target = 796M control_files = "/dev/raw/raw18" control_files = "/dev/raw/raw19" control_files = "/dev/raw/raw20" db_block_size = 8192 compatible = "11.2.0.0.0" log_archive_dest = "/arch/arc1" cluster_database = TRUE thread = 1 undo_tablespace = "UNDOTBS1" instance_number = 1 remote_login_passwordfile= "EXCLUSIVE" db_domain = "" remote_listener = "prorcldb-scan:1521" audit_file_dest = "/oracle/app/oracle/admin/prorcldb/adump" audit_trail = "DB" db_name = "prorcldb" open_cursors = 300 diagnostic_dest = "/oracle/app/oracle" Cluster communication is configured to use the following interface(s) for this instance 169.254.173.134 cluster interconnect IPC version:Oracle UDP/IP (generic) IPC Vendor 1 proto 2 Thu Nov 29 18:03:08 2018 PMON started with pid=2, OS id=26995 Thu Nov 29 18:03:08 2018 PSP0 started with pid=3, OS id=26997 Thu Nov 29 18:03:09 2018 VKTM started with pid=4, OS id=27002 VKTM running at (100ms) precision Thu Nov 29 18:03:09 2018 GEN0 started with pid=5, OS id=27014 Thu Nov 29 18:03:09 2018 DIAG started with pid=6, OS id=27026 Thu Nov 29 18:03:10 2018 DBRM started with pid=7, OS id=27039 Thu Nov 29 18:03:10 2018 PING started with pid=8, OS id=27052 Thu Nov 29 18:03:10 2018 ACMS started with pid=9, OS id=27055 Thu Nov 29 18:03:10 2018 DIA0 started with pid=10, OS id=27057 Thu Nov 29 18:03:10 2018 LMON started with pid=11, OS id=27059 Thu Nov 29 18:03:10 2018 LMD0 started with pid=12, OS id=27061 * Load Monitor used for high load check * New Low - High Load Threshold Range = [960 - 1280] Thu Nov 29 18:03:10 2018 LMS0 started with pid=13, OS id=27063 Thu Nov 29 18:03:10 2018 RMS0 started with pid=14, OS id=27067 Thu Nov 29 18:03:10 2018 LMHB started with pid=15, OS id=27069 Thu Nov 29 18:03:10 2018 MMAN started with pid=16, OS id=27071 Thu Nov 29 18:03:10 2018 DBW0 started with pid=17, OS id=27073 Thu Nov 29 18:03:10 2018 LGWR started with pid=18, OS id=27075 Thu Nov 29 18:03:10 2018 CKPT started with pid=19, OS id=27077 Thu Nov 29 18:03:10 2018 SMON started with pid=20, OS id=27079 Thu Nov 29 18:03:10 2018 RECO started with pid=21, OS id=27081 Thu Nov 29 18:03:10 2018 MMON started with pid=22, OS id=27083 Thu Nov 29 18:03:10 2018 MMNL started with pid=23, OS id=27085 lmon registered with NM - instance number 1 (internal mem no 0) Reconfiguration started (old inc 0, new inc 12) List of instances: 1 2 (myinst: 1) Global Resource Directory frozen * allocate domain 0, invalid = TRUE Communication channels reestablished * domain 0 valid according to instance 2 * domain 0 valid = 1 according to instance 2 Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Submitted all GCS remote-cache requests Fix write in gcs resources Reconfiguration complete Thu Nov 29 18:03:11 2018 LCK0 started with pid=24, OS id=27092 Starting background process RSMN Thu Nov 29 18:03:11 2018 RSMN started with pid=25, OS id=27106 ORACLE_BASE not set in environment. It is recommended that ORACLE_BASE be set in the environment Reusing ORACLE_BASE from an earlier startup = /oracle/app/oracle Thu Nov 29 18:03:12 2018 ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.127)(PORT=2521))' SCOPE=MEMORY SID='ORCLDB1'; ALTER DATABASE MOUNT /* db agent *//* {0:9:31} */ Successful mount of redo thread 1, with mount id 657254492 Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE) Lost write protection disabled Completed: ALTER DATABASE MOUNT /* db agent *//* {0:9:31} */ ALTER DATABASE OPEN /* db agent *//* {0:9:31} */ Picked broadcast on commit scheme to generate SCNs ARCH: STARTING ARCH PROCESSES Thu Nov 29 18:03:17 2018 ARC0 started with pid=27, OS id=27138 ARC0: Archival started ARCH: STARTING ARCH PROCESSES COMPLETE ARC0: STARTING ARCH PROCESSES Thu Nov 29 18:03:18 2018 ARC1 started with pid=28, OS id=27149 Thread 1 opened at log sequence 39 Current log# 3 seq# 39 mem# 0: /dev/raw/raw14 Successful open of redo thread 1 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set SMON: enabling cache recovery Thu Nov 29 18:03:18 2018 ARC2 started with pid=29, OS id=27151 ARC1: Archival started ARC2: Archival started ARC2: Becoming the 'no FAL' ARCH ARC2: Becoming the 'no SRL' ARCH ARC1: Becoming the heartbeat ARCH Thu Nov 29 18:03:18 2018 ARC3 started with pid=30, OS id=27153 [27118] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:714679724 end:714680004 diff:280 (2 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is KO16MSWIN949 No Resource Manager plan active minact-scn: Inst 1 is a slave inc#:12 mmon proc-id:27083 status:0x2 minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000 Starting background process GTX0 Thu Nov 29 18:03:19 2018 GTX0 started with pid=31, OS id=27196 Starting background process RCBG Thu Nov 29 18:03:19 2018 RCBG started with pid=32, OS id=27199 replication_dependency_tracking turned off (no async multimaster replication found) ARC3: Archival started ARC0: STARTING ARCH PROCESSES COMPLETE Starting background process QMNC Thu Nov 29 18:03:20 2018 QMNC started with pid=33, OS id=27202 Completed: ALTER DATABASE OPEN /* db agent *//* {0:9:31} */ |
2번노드 alert log
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | Thu Nov 29 18:02:57 2018 Dumping diagnostic data in directory=[cdmp_20181129180257], requested by (instance=1, osid=24507 (PMON)), summary=[abnormal instance termination]. Thu Nov 29 18:02:58 2018 Reconfiguration started (old inc 8, new inc 10) List of instances: 2 (myinst: 2) Global Resource Directory frozen * dead instance detected - domain 0 invalid = TRUE Communication channels reestablished Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out Thu Nov 29 18:02:58 2018 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Post SMON to start 1st pass IR Submitted all GCS remote-cache requests Thu Nov 29 18:02:58 2018 Instance recovery: looking for dead threads Post SMON to start 1st pass IR Fix write in gcs resources Beginning instance recovery of 1 threads Reconfiguration complete Started redo scan Completed redo scan read 1386 KB redo, 541 data blocks need recovery Started redo application at Thread 1: logseq 37, block 99 Recovery of Online Redo Log: Thread 1 Group 1 Seq 37 Reading mem 0 Mem# 0: /dev/raw/raw12 Completed redo application of 1.10MB Completed instance recovery at Thread 1: logseq 37, block 2871, scn 2137874 477 data blocks read, 576 data blocks written, 1386 redo k-bytes read Thread 1 advanced to log sequence 38 (thread recovery) Redo thread 1 internally disabled at seq 38 (SMON) Thu Nov 29 18:02:59 2018 Archived Log entry 53 added for thread 1 sequence 37 ID 0x2712c937 dest 1: Thu Nov 29 18:02:59 2018 minact-scn: Master considers inst:1 dead Thu Nov 29 18:03:00 2018 Thread 2 advanced to log sequence 33 (LGWR switch) Current log# 6 seq# 33 mem# 0: /dev/raw/raw17 Thu Nov 29 18:03:00 2018 ARC0: Archiving disabled thread 1 sequence 38 Archived Log entry 54 added for thread 1 sequence 38 ID 0x2712c937 dest 1: Archived Log entry 55 added for thread 2 sequence 32 ID 0x2712c937 dest 1: Thu Nov 29 18:03:11 2018 Reconfiguration started (old inc 10, new inc 12) List of instances: 1 2 (myinst: 2) Global Resource Directory frozen Communication channels reestablished Thu Nov 29 18:03:11 2018 * domain 0 valid = 1 according to instance 1 Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out Thu Nov 29 18:03:11 2018 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Submitted all GCS remote-cache requests Fix write in gcs resources Thu Nov 29 18:03:11 2018 minact-scn: Master returning as live inst:1 has inc# mismatch instinc:0 cur:12 errcnt:0 Reconfiguration complete |
결론 : smon을 kill 시키면 2번node에서 감지하고 바로 1번node를 restart 시켜 smon을 복구한다.
+
싱글db에서 smon이 죽으면 살릴수 있는 방법은 재기동밖에 없다.
vi $ORACLE_HOME/rdbms/mesg/oraus.msg 을 열어 SMON에 대한 내용을 확인해봄
Action에 Warn start instance라고 적혀있어서 그게 뭔지 Oracle forum에서 찾아보았다.
---
warm start instance
What kind of error message did You get? Depending on that we could figure out what exactly should be done in that particular case.
but warm start is actually clean shutdown and startup of oracle database. This activity which is called warm start is perfomed by SMON process upon startup of database - it takes cares about all incomplete transactions and recover them.
구글번역
웜 스타트 인스턴스
어떤 종류의 오류 메시지가 나타 났습니까? 그것에 따라 우리는 그 특별한 경우에 정확히 무엇이 이루어져야 하는지를 알아낼 수 있습니다.
하지만 따뜻한 시작은 실제로 오라클 데이터베이스의 완전한 종료 및 시작입니다. 웜 스타트 (warm start)라고 불리는이 활동은 데이터베이스 시작시 SMON 프로세스에 의해 수행됩니다. 모든 불완전한 트랜잭션에 대해 신경을 써서 복구합니다.
---
참조 :
https://docs.oracle.com/cd/B19306_01/server.102/b14220/process.htm
https://community.oracle.com/thread/891385
'ORACLE > Trouble Shooting' 카테고리의 다른 글
ORA-00600 발생 시 확인할 점 (0) | 2018.12.12 |
---|---|
ORA-21780 ENCOUNTERED WHEN GENERATING SERVER ALERT SMG-350 (0) | 2018.12.12 |
INSERT 문장에서 데이터를 바인딩 변수로 인식하는 문제 해결 (0) | 2018.11.19 |
DBMS_SQLTUNE.REPORT_SQL_MONITOR 실행시 결과가 모두 나오지 않을때 (0) | 2018.11.12 |
Resize operation completed for file# n 이란 (0) | 2018.11.08 |